
Dirvish [0] is worth looking at: it is lightweight and provides a good set of features (rotation, incremental backups, retention, pre/post scripts). It is a scripted wrapper around rsync [1], so you benefit from all of rsync's functionality too (remote backups, compression for limited links, metadata/xattr support, various sync criteria, etc.).

This has been a lifesaver for 20+ years, thanks to JW Schultz!

The questions and topics in the article pair really well with it.

[0] https://dirvish.org/
[1] https://rsync.samba.org/



What does dirvish do better or simpler than rsync?


It lets you configure more complicated backups more easily. You can inherit and override rules, which is handy if you need to run, for example, hundreds of backups in a similar style with small exceptions. The same goes for include/exclude patterns, which quickly get complicated with plain rsync.
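To illustrate the inherit-and-override idea, here is a minimal Python sketch. The setting names (`compress`, `exclude`, `tree`), the host names and the merge rule (scalars override, exclude patterns accumulate) are all hypothetical; dirvish's real config files have their own syntax and semantics, see dirvish.conf.

```python
from copy import deepcopy

# Hypothetical shared defaults; this is NOT dirvish's config syntax.
DEFAULTS = {"compress": False, "exclude": ["*.tmp", "/proc/", "/sys/"]}


def resolve(job, defaults=DEFAULTS):
    """Inherit every setting from defaults; scalars override, exclude patterns accumulate."""
    merged = deepcopy(defaults)  # never mutate the shared defaults
    for key, value in job.items():
        if key == "exclude":
            inherited = merged.get("exclude", [])
            # Append job-specific patterns instead of replacing the inherited ones.
            merged["exclude"] = inherited + [p for p in value if p not in inherited]
        else:
            merged[key] = value  # a job-level setting overrides the default
    return merged


# Hypothetical jobs: each one only states how it differs from the defaults.
JOBS = {
    "web01": {"tree": "/var/www"},
    "mail01": {"tree": "/var/mail", "compress": True, "exclude": ["*.lock"]},
}
RESOLVED = {name: resolve(job) for name, job in JOBS.items()}
```

With hundreds of hosts, each job entry stays one line long and only records its exception.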

It generates indices for its backups that let you search for files across all snapshots taken, which shows you which snapshots contain a given file so you can retrieve or inspect it. See dirvish-locate.
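The idea behind searching across snapshot indices can be sketched in a few lines of Python. This is not how dirvish-locate is implemented; it assumes each snapshot's file list is already available in memory (the snapshot names and paths below are made up) and uses shell-style patterns via `fnmatch`.

```python
from collections import defaultdict
from fnmatch import fnmatch


def build_index(snapshots):
    """Map each file path to the sorted list of snapshot names that contain it.

    snapshots: {snapshot_name: iterable of paths}, e.g. loaded from per-snapshot index files.
    """
    index = defaultdict(list)
    for name in sorted(snapshots):  # sorted, so each path's snapshot list is in date order
        for path in snapshots[name]:
            index[path].append(name)
    return dict(index)


def locate(index, pattern):
    """Return {path: [snapshots]} for every indexed path matching a shell-style pattern."""
    return {path: names for path, names in index.items() if fnmatch(path, pattern)}


# Hypothetical snapshot listings.
SNAPSHOTS = {
    "2024-01-01": ["etc/hosts", "home/a/notes.txt"],
    "2024-01-02": ["etc/hosts"],
}
INDEX = build_index(SNAPSHOTS)
```

Here `locate(INDEX, "*notes*")` tells you that `home/a/notes.txt` only exists in the `2024-01-01` snapshot, which is exactly the overview you want before restoring.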

It expires snapshots according to your retention strategy (encoded in rules; see dirvish.conf and dirvish-expire).
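As a rough sketch of what a retention rule boils down to, the Python below decides which dated snapshots to delete. The policy (keep everything for 7 days, keep Sunday snapshots for 8 weeks) and the parameter names are hypothetical examples, not dirvish's defaults or its expire-rule syntax.

```python
from datetime import date, timedelta


def to_expire(snapshot_dates, today, keep_days=7, keep_sundays_weeks=8):
    """Return the snapshot dates a hypothetical retention policy would delete.

    Policy (an example only):
      - keep every snapshot younger than keep_days days;
      - keep Sunday snapshots younger than keep_sundays_weeks weeks;
      - expire everything else.
    """
    expire = []
    for d in snapshot_dates:
        age = today - d
        if age < timedelta(days=keep_days):
            continue  # recent daily snapshot
        if d.weekday() == 6 and age < timedelta(weeks=keep_sundays_weeks):  # 6 == Sunday
            continue  # weekly snapshot still within its retention window
        expire.append(d)
    return sorted(expire)


today = date(2024, 3, 31)  # a Sunday
history = [today - timedelta(days=n) for n in range(60)]  # 60 daily snapshots
doomed = to_expire(history, today)
```

Out of 60 daily snapshots, this keeps the last 7 days plus the Sundays from the previous 7 weeks, and marks the remaining 46 for deletion.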

It consistently generates the long rsync command lines you would otherwise have to write by hand.
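To give an idea of what such a command line looks like, here is a hypothetical Python helper that assembles one. The rsync options used (`-a`, `-H`, `--numeric-ids`, `-A`, `-X`, `-z`, `--link-dest`, `--exclude`) are real, but this is not the exact set dirvish passes, and the host and `/bank/...` paths are made up.

```python
def rsync_command(src, dest, link_dest=None, excludes=(), compress=False, xattrs=False):
    """Assemble an rsync argument list for one snapshot (a sketch, not dirvish's exact flags)."""
    cmd = ["rsync", "-a", "-H", "--numeric-ids"]  # archive mode, keep hard links, raw uid/gid
    if xattrs:
        cmd += ["-A", "-X"]  # preserve ACLs and extended attributes
    if compress:
        cmd.append("-z")  # compress in transit, useful on slow links
    if link_dest:
        # Hard-link files that are unchanged relative to the previous snapshot.
        cmd.append(f"--link-dest={link_dest}")
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    cmd += [src, dest]
    return cmd


CMD = rsync_command(
    "host01:/etc/",                                 # hypothetical source
    "/bank/host01/2024-03-31/tree/",                # hypothetical new snapshot
    link_dest="/bank/host01/2024-03-30/tree",       # hypothetical previous snapshot
    excludes=["*.tmp"],
    compress=True,
)
```

Because it is an argument list rather than a shell string, it can be passed straight to `subprocess.run(CMD, check=True)` without quoting issues.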

In the end you get one directory per snapshot, giving a complete view of what got backed up. Unchanged files are hard-linked, which limits backup storage consumption; changed files are stored as new copies. But each snapshot contains the whole backed-up structure, so you can rsync it back at restore time (or selectively pick individual files if needed). Hence the "virtual".
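The hard-link snapshot scheme can be sketched in pure Python to show why every snapshot looks complete while unchanged files cost no extra space. This is a simplified stand-in for what rsync's `--link-dest` does, not dirvish's code: it only handles regular files and uses an rsync-style quick check (same size and mtime) to decide what counts as unchanged.

```python
import os
import shutil


def unchanged(a, b):
    """rsync-style quick check: same size and same mtime (to the second)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(src, dest, prev=None):
    """Copy the tree at src into dest; files unchanged since snapshot prev are hard-linked."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(dest, rel, name)
            p = os.path.join(prev, rel, name) if prev else None
            if p and os.path.isfile(p) and unchanged(s, p):
                os.link(p, d)  # unchanged: reuse the previous snapshot's inode, no extra space
            else:
                shutil.copy2(s, d)  # new or changed: store a full copy, keeping mtime
```

One consequence of this design: linked files share a single inode, so snapshots must be treated as read-only, since editing a file in place in one snapshot would change it in all of them.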

Furthermore: backup reporting (summary files) that can be piped into an e-mail or turned into a web page, good and simple documentation, and pre/post scripts (these turn out to be really useful for things like DB dumps before taking a backup).
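The pre/post script pattern is easy to get subtly wrong, so here is a minimal Python sketch of one reasonable ordering. The function name and the policy (skip the backup if the pre hook fails, always run the post hook once a backup was attempted) are my own assumptions, not dirvish's documented behavior; the commands themselves (a DB dump, the rsync run, cleanup) are up to you.

```python
import subprocess


def run_job(pre, backup, post):
    """Run the pre hook, then the backup only if the hook succeeded, then the post hook.

    Returns the backup's exit status, or None if the pre hook failed and the
    backup was skipped.
    """
    if subprocess.run(pre).returncode != 0:
        return None  # e.g. the DB dump failed, so don't snapshot a stale or partial dump
    try:
        return subprocess.run(backup).returncode
    finally:
        subprocess.run(post)  # cleanup runs even if starting the backup command raised
```

Each argument is a command as an argument list, e.g. a hypothetical `["pg_dump", "-f", "/var/backups/db.sql", "mydb"]` as the pre hook.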

You'll still need to take care of all other aspects of designing your backup storage (SAS controllers/backplanes/cabling, disks, RAID, LVM2, XFS, ...) and networking (10 GbE, switching, routing if needed, ...) if you need that (it works for purely local backups too). As an example, we used this successfully in animation film production, where it backed up hundreds of machines and the centralized storage for a renderfarm, about 2 PB worth (on Coraid and SuperMicro hardware). Having rsync traverse the filesystem to find changes was sometimes a challenge on enormous filesystems (even when comparing metadata only), so for those we created separate backup jobs that were fed specific file lists generated by the renderfarm processes, skipping the search for changes entirely.
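Feeding rsync a precomputed file list is done with its `--files-from` option, which reads paths relative to the source directory given on the command line. Below is a hypothetical Python helper showing how a job producing absolute output paths might turn them into such a list; the `/data` root, file names and destination are made up, and the actual renderfarm tooling is not described here.

```python
import os


def write_file_list(paths, root, list_path, dest):
    """Write paths under root as root-relative lines and return a matching rsync command."""
    root = os.path.abspath(root)
    rels = []
    for p in sorted(set(paths)):  # de-duplicate, stable order
        rel = os.path.relpath(os.path.abspath(p), root)
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            raise ValueError(f"{p} is not under {root}")  # validate before writing anything
        rels.append(rel)
    with open(list_path, "w") as fh:
        fh.writelines(rel + "\n" for rel in rels)
    # --files-from entries are interpreted relative to the source directory (root).
    return ["rsync", "-a", f"--files-from={list_path}", root + "/", dest]
```

This way rsync only stats the listed files instead of walking the whole tree.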



