15 votes

Shopping around for a new-and-improved backup solution

A few days ago, I posted this and quickly realized that the world of data backups is far richer than just sudo rsync -av --delete --exclude=Videos /home /home_bkup.

So now I'm window shopping the top Linux-supported backup solutions: borg, duplicacy, kopia, restic and--oh look--a core borg dev just dropped his own new-and-improved solution, vykar.

Restic was the first tool I started to research, and I thought I really liked it; I got as far as installing it, initializing a test repo, and creating a couple of snapshots. But restic seems to be, hmm, fussy about the source and destination paths, absolute vs relative paths, etc.

The fact that merely renaming a parent directory (or grandparent, or great-grandparent, etc) causes restic to treat every unchanged byte below that as brand new ... that's a recipe for giant, bloated repos, and it's unacceptable to me ... and hey, lookit that, borg does not do that. So now, restic is out and borg is in.

But what other pros v cons are there, that I haven't even realized need to be considered? What advantages/disadvantages do other apps offer? Which ones can I easily automate with nightly/hourly cron jobs? Which ones have their own even-better automated solutions?

Do I even want encryption? All of my drives/volumes are LUKS encrypted, and anything I would store remotely would also get encrypted before it ever left my LAN ... plus, I'm just a bit nervous about having the backups encrypted, requiring working, functional software to restore/recover data from them....

That may not seem like such a big concern, perhaps, but I am currently working my way thru decrypting a bunch of 10-15 year old TrueCrypt-ed volumes, which requires using an old, outdated version of VeraCrypt and a somewhat "cross-my-fingers" effort to find KeePass repos old enough (also outdated, KeePass 1.0 repos) to still contain the various passwords I used to encrypt those ancient volumes ... but also still use new enough master passwords that I can still get the KeePass repos unlocked.

With rsync, I can literally just go into any backup, find the specific version of the specific file(s) I want to recover, and manually copy it back to my workspace. Is anything like that option available in any of these deduplicated/encrypted solutions, even if they're not encrypted? If (eg) a borg repo is created w/o encryption, the data is still all just borg-specific blobs, right? Or can I navigate into the repo and just manually grab files?

Oh yeah ... for reference, the past 10-ish years, my backup routine has been to create a new, dated, destination folder, starting with a full backup of my /home folder (excluding things like Videos, Music, VMs, other bulky stuff that gets backed up separately/differently), and then running nightly diff backups into the same folder, while also maintaining a "one-day-older" second backup of the whole thing on a 2nd HDD ... then, every 3-6 months, zipping up the current backup folder and starting a new one.

At any rate, there you go; that's the kind of stuff I'm thinking about now, as I overhaul my 20-year-old, 20TB (but could be 2TB) backup system.

Any and all feedback, recommendations, tips are welcome. Danke.

13 comments

  1. [2]
    drmcbludgeon

    I'll preface this by saying I may have done something wrong, but I originally used Borg to back up my NAS to an external drive. The NAS mostly holds backups of my BR disks and backup images from the family PCs. It was fine until the repo approached the size of the disk. Pruning wasn't enough to trim the repo size down, so I was stuck in a loop where the backups could no longer complete and I could not shrink the repo any further. These attempts at pruning would sometimes take multiple days. There were a couple of other "gotchas" that affected this workflow that I can't fully recall at the moment, but I ultimately had to nuke the backup after a few years and start over. I went with Backrest, which is a web UI for Restic. I have had no similar problems. The repo trims itself as expected and seems to work quickly. I tested file recovery and I was able to retrieve the files without drama.

    Keep in mind I am using this to back up a NAS - my root file structures are not changing often, if ever. New directories will occasionally be added to the root, but I am not going to the root level and moving things around.

    3 votes
    1. thorondir

      For people running into similar issues:
      You probably need to run borg compact on your destination. A prune (at least in borg 1.2 and later) only removes the archives; it doesn't actually free any blocks on disk. Compact is what reclaims the space, and only for blocks that no remaining backup references anymore. I regularly reclaim hundreds of GB on a repository that's not even 1 TB. [because I do hourly backups of my database, which is why there's a lot of churn]
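
To make the prune-versus-compact distinction concrete, here is a toy model in Python. It is only a sketch of the idea, not borg's real on-disk format or API: the `ToyRepo` class, its method names, and the chunk IDs are all made up for illustration.

```python
# Toy model of "prune" vs "compact". Hypothetical names throughout;
# this is NOT how borg is implemented, only the bookkeeping idea.

class ToyRepo:
    def __init__(self):
        self.chunks = {}    # chunk id -> bytes actually stored "on disk"
        self.archives = {}  # archive name -> list of chunk ids it references

    def create(self, name, chunk_data):
        """Store an archive; chunks that already exist are not stored again."""
        for chunk_id, data in chunk_data.items():
            self.chunks.setdefault(chunk_id, data)
        self.archives[name] = list(chunk_data)

    def prune(self, keep):
        """Forget every archive not in `keep`. Stored chunks are untouched."""
        self.archives = {n: ids for n, ids in self.archives.items() if n in keep}

    def compact(self):
        """Delete chunks no remaining archive references; return bytes freed."""
        live = {cid for ids in self.archives.values() for cid in ids}
        dead = [cid for cid in self.chunks if cid not in live]
        return sum(len(self.chunks.pop(cid)) for cid in dead)

    def size(self):
        return sum(len(data) for data in self.chunks.values())


repo = ToyRepo()
repo.create("monday", {"a": b"x" * 100, "b": b"y" * 50})
repo.create("tuesday", {"a": b"x" * 100, "c": b"z" * 30})

size_before = repo.size()
repo.prune(keep={"tuesday"})
size_after_prune = repo.size()    # unchanged: prune only dropped the archive
freed = repo.compact()            # chunk "b" is now unreferenced and removed
size_after_compact = repo.size()
```

Chunk "a" survives because Tuesday's archive still references it, which is why pruning a few old archives on a heavily deduplicated repo can free much less space than expected.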

  2. creesch

    I am using restic through Backrest, which just makes it slightly more convenient with a visual overview. It all goes to a Hetzner storage box.

    Or can I navigate into the repo and just manually grab files?

    I can choose what files/folders specifically to restore/download as well from each snapshot.

    As far as the parent directory changing goes, I guess that might be an issue, but I don't do that very often, and certainly not for directories that contain large files. Movies and series are sitting on my NAS and aren't backed up, development stuff sits in git repos and doesn't need to be included. So my restic backups are fairly consistent in size.

    3 votes
  3. F13

    I used rclone for a while to back up to cloud providers, but I am currently between solutions. Rclone might not tick all your boxes but I was pretty happy with the experience. The only reason I'm not using it now is because I was relying on unlimited cloud storage and I haven't yet decided on a solution that doesn't.

    2 votes
  4. ShroudedScribe

    I've been using Plakar for a while and am pretty happy with it. I have backups run at different frequencies based on the needs of each set of data, and then back up the most frequent snapshots to a cloud S3 provider on a less frequent basis.

    It took me a while to figure out how to set it up the way I wanted, but it's been chugging along in the background now with no issues.

    2 votes
  5. goose

    I've been on the free version of Veeam for over a decade, and it has been fantastic the few times I've needed it, both for individual file restoration, and volume level restoration. My only complaint is that I have to use pre/post backup scripts to back up remote machines over sshfs, as the native options are a local filesystem, samba share, or a VB&R server (not free).

    2 votes
  6. [2]
    mxuribe

    Hi @Eric_the_Cerise
    So, first of all, for a number of years, I used to just manually copy/paste my home directory over to a samba destination drive/share on my local LAN...and then also copy that to one of those external USB drives. Slow as hell, and cumbersome, but if all things failed except for the USB drive, then it would be easy - though slow - to recuperate things. It's a pretty inefficient method, but pretty good...well, it depends on how much data, the speed of the USB ports, the LAN/network, etc. But it was solid, did not require special apps/clients, and, as noted, was quite direct to restore from.

    Then I started using rsync (with very, very similar parameters to the command you referenced above, though I exclude other stuff, not videos). After an initial copy/paste of a directory, I just keep running rsync at a regular interval, and it's worked GREAT! The delta approach that rsync uses (only sending changed stuff) is so powerful! Yes, other apps and clients also use delta transfer, but nothing so non-specialized as rsync. Are there better backup systems? I'm sure there could be. But you should consider: what is it that you are trying to improve, and what are you hoping to achieve by reviewing other approaches, apps, clients, etc.? It's possible that what you seek can be better implemented by something other than rsync...but when I reviewed my needs, rsync met them way better than anything else that I have tried. Everyone's mileage will vary. :-)

    2 votes
    1. Eric_the_Cerise

      I want "git-like" functionality. I don't just want to be able to restore the latest uncorrupted copy of my data, or to maintain "last 3 daily, last 3 weekly, last 3 monthly" snapshots, etc.

      I want to be able to reliably roll back to any particular date/snapshot/version of a file, document or project. I want to be able to pull up a version from 2 years and 8 days ago, and diff it against my current copy. Things like that.

      I don't necessarily need this level of rollback specificity with my entire /home partition -- in fact, I definitely don't. But I do want this capability with a lot of my data, such as my Calibre libraries, almost everything in my Documents directory, etc.

      So I'm looking for a backup solution that can do that, for specific projects, directories, etc, and then I'll probably just continue using rsync--pretty much just the way I've been using it--for the left-overs.

  7. 2c13b71452

    I am using restic but I evaluated borg too. I can't remember now why I went with restic; it may have been that restic is written in Go and borg in Python. Python does not feel right for something like backups. Restic is a bit fussy to set up though; I have a little script that mounts the archive, and that makes it not too bad.

    Have you looked at rsnapshot? It is written in Perl so perhaps from that point of view that's worse than Python, but it just works. For one of my backups I use rsnapshot to take a daily, weekly and monthly snapshot. It uses filesystem hard links so if you have a lot of immutable files the snapshots don't take up extra disk space. And because it is just using the filesystem it means all your files are just there to access, no software needed for recovery.
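
For anyone who hasn't seen the hard-link trick before, here is a rough Python sketch of the idea. To be clear, this is not rsnapshot's code (rsnapshot drives rsync for you); the `snapshot` function and the `daily.0` / `daily.1` folder names are just assumptions for the demo, and symlinks and permissions are ignored.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`."""
    dest.mkdir(parents=True, exist_ok=True)
    for src in source.rglob("*"):
        rel = src.relative_to(source)
        target = dest / rel
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, target)        # same bytes: share the existing inode
        else:
            shutil.copy2(src, target)   # new or changed: store a real copy


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    home = root / "home"
    home.mkdir()
    (home / "notes.txt").write_text("unchanged")
    (home / "todo.txt").write_text("v1")

    snapshot(home, root / "daily.1")                 # first, full snapshot
    (home / "todo.txt").write_text("v2")             # one file changes
    snapshot(home, root / "daily.0", previous=root / "daily.1")

    same_inode = (
        (root / "daily.0" / "notes.txt").stat().st_ino
        == (root / "daily.1" / "notes.txt").stat().st_ino
    )
    old_todo = (root / "daily.1" / "todo.txt").read_text()
    new_todo = (root / "daily.0" / "todo.txt").read_text()
```

Both snapshot folders look like complete copies when you browse them, but the unchanged file occupies disk space only once, and each version of the changed file stays readable with nothing more than a file manager.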

    1 vote
  8. FireTime

    One to possibly add to your list is duplicati. Can't speak to all of the pros and cons, but I was using it for a long time for off-site backup to a small low-power NAS that only supported FTP. The software also supports a whole bunch of other data transfer methods but runs fully as a client-only application. Deduplication is managed at the block level, so it should handle your folder change example gracefully. It will see the move as a new large file but will reuse the old blocks from the previous location, meaning it will not create new ones.

    I set up the remote NAS to connect to my network as a VPN client and act as an FTP server. Weekly, I would pause all of my Docker applications and copy app data to a staging folder. Duplicati would then process the app data as well as normal file data for changes and upload. It's not super fast, but neither is the remote NAS.

    I was happy with how simple it was to configure and run it in a Docker container. If you already have a Docker setup, I would recommend pulling down an image and poking at it, as it was fairly simple to get going quickly.

    1 vote
  9. ap0r

    I would love to add to this discussion but I just have calendar reminders to copy my "server" folder over to my "backups" HDD manually every week and upload a copy of everything in "server" except for music and movies to my cloud account. So I will read through the replies and try to further my education.

    1 vote
  10. Pistos

    duplicity has been largely set and forget for me. I have it writing to an external HD which I have in mind to disconnect and take in the event of an emergency. Automatic full vs. partial backups, and rotation/drop of old backups. I'm sure it could be set up to be part of a remote backup setup if one wanted.

    1 vote
  11. Macil

    The fact that merely renaming a parent directory (or grandparent, or great-grandparent, etc) causes restic to treat every unchanged byte below that as brand new ... that's a recipe for giant, bloated repos, and it's unacceptable to me

    I believe that Restic stores all the files in a content-addressed manner where the contents will be deduplicated even in this case. When it encounters file paths it doesn't recognize or files with an updated last-modified timestamp, it has to re-read them but it will end up recognizing that their contents already exist in the backup if they're unchanged.
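
Here is a small Python sketch of why a rename shouldn't bloat a content-addressed repo. It is a simplification and not restic's actual format: as far as I know restic splits files into chunks, while this hashes whole files, and the `backup` function and the `store` dict are invented for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def backup(source: Path, store: dict) -> tuple:
    """Return (snapshot, new_blobs). The snapshot maps relative path -> hash."""
    snap = {}
    new_blobs = 0
    for f in sorted(source.rglob("*")):
        if not f.is_file():
            continue
        data = f.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest not in store:      # only content never seen before is stored
            store[digest] = data
            new_blobs += 1
        snap[f.relative_to(source).as_posix()] = digest
    return snap, new_blobs


store = {}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    project = root / "old_name"
    project.mkdir()
    (project / "a.txt").write_text("alpha")
    (project / "b.txt").write_text("beta")

    first, new_first = backup(root, store)
    project.rename(root / "new_name")     # rename the parent directory
    second, new_second = backup(root, store)
```

The second run has to re-read both files because their paths are unfamiliar, but it stores no new blobs. Only the snapshot's path listing changes, and that is small compared to the file contents.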

    I wouldn't worry too much about the possible redundancy of your backup's encryption when using a tool like Restic. If your backups are on a filesystem that's already encrypted and you're worried about the risk of losing the Restic key, you could always save that key in plaintext right next to the backup. And if you do ever move/copy your backups off that encrypted filesystem onto an unencrypted system or a cloud host, it will be convenient to be able to move/copy the backup files as-is instead of needing to layer in another program to manage encryption.