5 Common Backup Failures and How to Avoid Them

World Backup Day (Mar 31) reminds us to double-check existing backups, as even a working script doesn't guarantee the data's availability.

World Backup Day is coming up on Mar 31, and this year I wanted to focus on existing backups rather than new ones. The reason is that even a working backup script doesn't guarantee the data is there when it's needed. Backups can fail in subtle ways, and World Backup Day is a good opportunity to double-check that everything is in order.

This article has a tl;dr checklist of what to look out for, followed by more details on each item. If you are still setting up your backups, start with this higher-level guide on backup strategy and check back once everything is up and running.

  • Backups are happening
  • Backups are happening often enough
  • Backups contain the expected data
  • Backups are independent (enough) from the source data
  • Backups are safe from modification

Checklist for Existing Backups

1. Backups are happening

This is the most common failure and the easiest to spot. You want a way to be sure your backup script still works after months or years of updates and server modifications. So it's essential to log successful runs, the absence of successful runs, and failures. Keep in mind that the absence of failures isn't the same as success: if your script doesn't run at all, it can't fail.

There are a few services that specialize in monitoring cron jobs. Unlike plain failure alerts, they also notify you when successful backups stop arriving. Borgmatic, a very useful wrapper around Borg, integrates with a number of them, as described in its monitoring documentation. Alternatively, you can ping a service endpoint at the end of a run, like this:

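# Sundays at 01:01: run the backup, then ping the monitor only if the script exited successfully
1 1 * * 0  backup.sh && curl -fsS --retry 3 http://my-monitor.com/success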

With Borgmatic, you can define error hooks that are called when failures happen:

hooks:
    # YAML does not allow tabs for indentation, so use spaces.
    on_error:
        - send-message.sh "Error with {repository}"

And report each run, including successful ones, to a monitoring service:

hooks:
    healthchecks:
        ping_url: http://my-monitor.com/success

If you use BorgBase, we also monitor for missing backups and alert you when no successful backup has arrived for too long.
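Under the hood, these monitors act as a "dead man's switch": they raise an alert when an expected ping does not arrive in time, which is how they catch a script that silently stopped running. The Python sketch below shows that core decision. The function name, the grace period and the "ok"/"alert" statuses are illustrative assumptions, not the logic of any particular service.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_status(last_success: Optional[datetime], now: datetime,
                  period: timedelta, grace: timedelta) -> str:
    """Return "ok" or "alert" the way a dead man's switch monitor might.

    last_success: time of the most recent success ping, or None if none was seen.
    period: expected interval between runs (7 days for a weekly cron job).
    grace: extra slack before alerting, e.g. for a slow or delayed run.
    """
    if last_success is None:
        # Nothing was ever reported: silence is not the same as success.
        return "alert"
    deadline = last_success + period + grace
    return "ok" if now <= deadline else "alert"


# Illustrative values: a weekly job that normally finishes on Sundays at 01:01 UTC.
now = datetime(2023, 3, 31, 12, 0, tzinfo=timezone.utc)
week, grace = timedelta(days=7), timedelta(hours=6)
print(backup_status(datetime(2023, 3, 26, 1, 1, tzinfo=timezone.utc), now, week, grace))  # ok
print(backup_status(datetime(2023, 3, 19, 1, 1, tzinfo=timezone.utc), now, week, grace))  # alert
print(backup_status(None, now, week, grace))  # alert
```

Because this check runs on the monitor's side, it still fires when the backup host is down or cron is misconfigured, which a failure hook running on the host itself cannot do.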

2. Backups are happening often enough

This is less a technical problem than an organizational one. It's related to your recovery point objective (RPO): the maximum amount of recent data, measured in time, that you can afford to lose. In a nutshell, a backup becomes less useful as it gets older, and the RPO tells you how old is too old.

Consider, for example, an office file server used by dozens of employees. If you only back it up once a day, overnight, you risk losing a whole day's work when an incident happens at the end of the workday, before the nightly run. This may not be acceptable. So you should always match your backup technique and interval to the speed at which data changes.

For our office file server, you could add a second backup run at noon, a modest step if bandwidth is limited, or set up hourly local snapshots to cover the most common risk: accidental deletions.
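To put numbers on the office example, the worst-case loss of a schedule is the longest stretch of working time between two consecutive backups. The Python sketch below computes it under simplifying assumptions chosen for illustration: work happens only from 9:00 to 17:00, the schedule repeats every day (weekends are ignored), backup times fall within a single day, and a backup captures everything the moment it starts.

```python
def work_between(start, end, work_start, work_end):
    """Working hours inside [start, end); `end` may extend into the next day."""
    total = 0.0
    for day_offset in (0, 24):  # a gap between backups spans at most two calendar days
        lo = max(start, work_start + day_offset)
        hi = min(end, work_end + day_offset)
        total += max(0.0, hi - lo)
    return total


def worst_case_loss(backup_hours, work_start=9.0, work_end=17.0):
    """Largest amount of working time (in hours) that can pass without a backup.

    backup_hours: times of day in [0, 24) at which a backup runs.
    """
    times = sorted(backup_hours)
    if not times:
        raise ValueError("at least one backup time is required")
    worst = 0.0
    for i, start in enumerate(times):
        # The gap ends at the next backup; the last one wraps to the first run of the next day.
        end = times[i + 1] if i + 1 < len(times) else times[0] + 24
        worst = max(worst, work_between(start, end, work_start, work_end))
    return worst


print(worst_case_loss([1]))              # nightly at 01:00: 8.0 hours, a whole workday
print(worst_case_loss([1, 12]))          # plus a noon run: 5.0 hours
print(worst_case_loss(list(range(24))))  # hourly snapshots: 1.0 hour
```

Comparing the result with your RPO, here the number of working hours you can afford to lose, tells you whether a schedule is good enough.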

3. Backups contain the expected data

This bit me just a few weeks ago. A backup task that runs without errors doesn't necessarily mean the data is there. In my case, the --one-file-system option excluded most folders from the backup after an update changed the server's file system layout. That option tells Borg not to cross file system boundaries, so anything that ended up on a different file system was silently skipped. When I tried to recover a file, I found only empty folders. Luckily, there were local snapshots in this case.

This error is hard to catch with automated tooling, so the best approach is to browse or mount the backup and look at a few files and folders. You can use Vorta, the desktop backup client BorgBase maintains, to browse the actual files, or mount an archive with borg mount and browse it locally. BorgBase also charts each repository's space usage over time, though this may miss issues with small folders or data that doesn't change much.
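Nothing replaces looking at the files yourself, but a small script can at least flag the failure described above, where folders exist in the archive but are empty. The Python sketch below assumes the archive is already mounted or extracted (for example with borg mount) and that you list a few directories by hand, each with a minimum number of files it should contain. The mount point, paths and thresholds shown are hypothetical.

```python
import os
from pathlib import Path


def find_suspicious_dirs(root, expected):
    """Check a mounted or extracted archive against rough expectations.

    root: directory where the archive is mounted or extracted.
    expected: maps a path relative to root to the minimum number of files
              it should contain, counted recursively.
    Returns {relative path: problem}; an empty dict means every check passed.
    """
    problems = {}
    for rel_path, min_files in expected.items():
        target = Path(root) / rel_path
        if not target.is_dir():
            problems[rel_path] = "missing"
            continue
        # os.walk yields the file names of each directory as its third element.
        count = sum(len(files) for _, _, files in os.walk(target))
        if count < min_files:
            problems[rel_path] = f"only {count} file(s), expected at least {min_files}"
    return problems


if __name__ == "__main__":
    # Hypothetical mount point and thresholds; Borg stores paths without the leading "/".
    report = find_suspicious_dirs("/mnt/borg-check", {
        "etc": 50,
        "home/alice/Documents": 100,
        "var/lib/postgresql": 10,
    })
    for path, problem in report.items():
        print(f"{path}: {problem}")
    if not report:
        print("all checks passed")
```

Setting thresholds well below the real counts avoids false alarms while still catching a directory that suddenly comes back empty.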

Another way to truly verify a backup is to restore from it when migrating to a new server.

4. Backups are independent (enough) from the source data

Backups are there to protect against unexpected events. If a single event can take out both your backup and the source data, it's not a good backup. Some examples:

  • Your laptop backup is on a USB drive in your house. The house burns down, destroying both the laptop and the USB drive.
  • Your backup is with a cloud provider and is managed through the same admin account that manages the source data. Anyone who breaks into this account can delete both the source data and the backup.
  • A hosting company is generous enough to include some free backup space with your hosting package. This is great, but you risk losing all your data if that one company unexpectedly goes out of business.

Of course, it's difficult to fully decouple your backup from the source data, hence the "enough" in the heading. Both will still be on the same planet and maybe in the same country. But there are many ways to improve the situation by putting your eggs in different baskets, e.g. different providers, different regions, or at least different drives with the same company.
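One way to apply this check systematically is to write down, for every copy of the data, what a single event could take out along with it, and then look for anything all copies share. The Python sketch below uses three example failure domains (location, provider and account). These categories and the sample copies are illustrative assumptions; a real review would likely add more, such as power, admin staff or the backup software itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    location: str  # building or data center region
    provider: str  # company (or person) that stores the data
    account: str   # credentials that can delete or overwrite it


def shared_risks(copies):
    """Return the failure domains that every copy has in common.

    Each (kind, value) pair is a single event, like a fire at one location
    or a compromised account, that would destroy all copies at once.
    An empty set means no single listed event takes everything out.
    """
    domains = [
        {("location", c.location), ("provider", c.provider), ("account", c.account)}
        for c in copies
    ]
    return set.intersection(*domains) if domains else set()


# The laptop-and-USB-drive example from the list above, plus a hypothetical off-site copy.
laptop = Copy("laptop", location="home", provider="self", account="alice")
usb = Copy("USB drive", location="home", provider="self", account="alice")
offsite = Copy("off-site repo", location="eu-region", provider="backup-host", account="backup-key")

print(sorted(shared_risks([laptop, usb])))
# [('account', 'alice'), ('location', 'home'), ('provider', 'self')]
print(shared_risks([laptop, usb, offsite]))  # set()
```

Adding the off-site copy removes every shared domain in this toy model. In practice some correlation always remains, which is exactly what the "enough" in the heading allows for.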

5. Backups are safe from modification

This final point applies to all "push"-style backups, where the source machine itself initiates the backup and therefore has write access to the backup copy. If the source is compromised, or a bug corrupts its data, that same access can be used to overwrite or delete the backup. Popular sync solutions like Dropbox fall into this category: a file is changed locally and Dropbox uploads it, potentially overwriting the remote copy. Another example is simple (S)FTP storage, where either only one copy of the data is kept or the client has permission to modify it.

Here are some simple measures you may be able to implement with your existing solution:

  • When using sync solutions as backup (please don't), enable time-based versioning, so a good copy of each file always remains. Keeping a fixed number of versions works less well, because a burst of bad changes can push every good version out.
  • When uploading via (S)FTP, take snapshots on the remote end that are out of the client's control.
  • For object storage like S3, you can sometimes deny the client DELETE permissions, enable versioning, or both.
  • BorgBase users can simply enable append-only mode, which will preserve all data until the repository is explicitly cleaned up.
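The difference between time-based and count-based versioning is easiest to see in a small simulation. The Python sketch below is a simplified model of retention, not how Dropbox or any other service implements it. A file had one good version, then a misbehaving process (ransomware, say) saved 50 corrupted versions within an hour. Count-based retention keeps the newest N versions; time-based retention keeps every version that was still current at some point during the retention window.

```python
from datetime import datetime, timedelta

# Each version is (created_at, is_good); is_good only exists for this demo.


def keep_last_n(versions, n):
    """Count-based retention: keep only the n newest versions."""
    return sorted(versions)[-n:] if n > 0 else []


def keep_window(versions, now, window):
    """Time-based retention: keep every version that was current at some
    point during the last `window`, i.e. was not replaced before now - window."""
    ordered = sorted(versions)
    kept = []
    for i, version in enumerate(ordered):
        # A version stops being current when the next one is created.
        replaced_at = ordered[i + 1][0] if i + 1 < len(ordered) else now
        if replaced_at >= now - window:
            kept.append(version)
    return kept


now = datetime(2023, 3, 31, 12, 0)
good = (datetime(2023, 3, 1, 9, 0), True)
# 50 corrupted saves, one per minute, starting on Mar 30 at 10:00.
bad = [(datetime(2023, 3, 30, 10, 0) + timedelta(minutes=m), False) for m in range(50)]
history = [good] + bad

print(any(ok for _, ok in keep_last_n(history, 10)))                      # False
print(any(ok for _, ok in keep_window(history, now, timedelta(days=30))))  # True
```

The price of time-based retention is storage: in this run it keeps all 51 versions, because every one of them was current at some point during the last 30 days.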

Conclusion


Now you know many ways a backup can go wrong and, hopefully, more ways to avoid ever getting into a bad backup situation. For this year's World Backup Day, pick up this list and go over it for each system and backup you manage. It will go a long way toward making sure your backups are there when you need them.

If your setup could use another remote copy, also check out BorgBase.com. We make managing backups simple by taking care of many potential issues and by providing great copy-and-paste setup assistants and exceptional support when you need it. We also sponsor or maintain many projects in the space, including Borg, Borgmatic, Pika Backup and Vorta.

🎁 For new users there is also a coupon for 30% off the first invoice. Just use code WBDX23 during checkout.