DupVer

Description

Dupver is a minimalist deduplicating version control system written in Go and based on the Restic chunking library. It is most similar to the binary version control system Boar (https://bitbucket.org/mats_ekberg/boar/wiki/Home). Dupver does not track files; rather, it stores snapshots, more like a backup program. Rather than traversing directories itself, Dupver takes an (uncompressed) tar file as input. Note that only tar files are accepted as input, because Dupver relies on the tar container to provide the list of files in the snapshot and to store metadata such as file modification times and permissions. Dupver uses a centralized repository to take advantage of deduplication between working directories. This means that Dupver working directories can also be git repositories or subdirectories of git repositories. I mainly use it for version control of databases, but it can also be used for sampled data.
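
To illustrate why a tar container is a convenient input, here is a minimal Go sketch that walks the headers of a tar stream and collects the file list and metadata a snapshot would need. It is not Dupver's actual code: the names `entry`, `listTar`, and `sampleTar` are made up for this example, and the archive is built in memory only so the sketch is self-contained.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"time"
)

// entry holds the per-file metadata a snapshot could take from a tar header.
// (Hypothetical type, not part of Dupver.)
type entry struct {
	Name    string
	Size    int64
	Mode    int64
	ModTime time.Time
}

// listTar reads tar headers from r and returns one entry per archive member.
func listTar(r io.Reader) ([]entry, error) {
	var entries []entry
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return entries, nil // end of archive
		}
		if err != nil {
			return nil, err
		}
		entries = append(entries, entry{hdr.Name, hdr.Size, hdr.Mode, hdr.ModTime})
	}
}

// sampleTar builds a small in-memory archive that stands in for a real tar file.
func sampleTar() (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	tw := tar.NewWriter(buf)
	files := []struct{ name, body string }{
		{"data/a.db", "hello"},
		{"notes.txt", "dupver"},
	}
	for _, f := range files {
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     f.name,
			Mode:     0644,
			Size:     int64(len(f.body)),
			ModTime:  time.Unix(1600000000, 0),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(f.body)); err != nil {
			return nil, err
		}
	}
	return buf, tw.Close()
}

func main() {
	buf, err := sampleTar()
	if err != nil {
		panic(err)
	}
	entries, err := listTar(buf)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s  %d bytes  mode %o  modified %s\n",
			e.Name, e.Size, e.Mode, e.ModTime.UTC().Format(time.RFC3339))
	}
}
```

With a real archive, the same `listTar` function could be pointed at an opened tar file instead of the in-memory buffer.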

Project Logs

Version 2.0 released!
Kumar • 01/11/2022 at 20:20 • 0 comments

This version switches to distributed repos, which means that deduplication is no longer possible between project folders, but it simplifies implementation and folder syncing.
Update: Some thoughts on the future of version control
Kumar • 01/02/2021 at 14:06 • 0 comments
In response to an HN post, here are some thoughts on the future of version control. The full discussion is here: https://news.ycombinator.com/item?id=25535844

The state of the art for backup is deduplicating software (Borg, Restic, Duplicacy). Gripes about Git's UI choices aside, Git was designed around human-readable text files and just doesn't do large binary files well. Sure, there's Git-LFS, but it sucks. The version control system of the future will:
1. Make use of deduplication to handle large binary files
2. Natively support remotes via cloud storage
3. Not keep state in the working directory, so that projects can live in a Dropbox/OneDrive/iCloud folder without corrupting the repo
4. Be truly cross-platform with minimal POSIX dependencies. I love Linux, but I'm a practicing engineer, and the reality is that engineering software is a market where traditional Windows desktop software still rules.
Another thought I've been having for some time is whether I could have gotten away with file-level deduplication, as Boar (or Git, IIRC) does, and dropped compression. This would probably result in significant simplification, particularly for copying between repos. For most users this wouldn't impact disk space usage much, as the bulk of files already have compression built in, and the trend seems to be toward building compression into new file formats. This includes:
1. Audio/image/video files with (usually) lossy compression. This surprisingly (to me) also includes raster image editor file formats such as Paint.NET's .pdn, which wraps everything in a gzip stream.
2. MS Office documents, which are structured as a hierarchy of zipped .xml files. More recently, the same approach shows up in Matlab's .slx Simulink file format and .mlx notebook format.
The gotcha is that this is an 80% solution. There are still plenty of file formats that are uncompressed text, even newer ones such as JSON/YAML/TOML, and a number of uncompressed binary file formats such as MessagePack, though most tend to be some sort of database, such as the Geodatabase .gdb format, which is based on SQLite3, or PowerWorld's .pwb format. There is also the corner case of metadata in media files such as EXIF, which if modified would cause the whole file contents to be stored again. So I'm sticking with chunking for the time being.

This is all pretty opinionated, so feel free to prove me wrong. It wouldn't be Hackaday without internet arguments, right?
