I did data backups badly for a long time, and it ended up biting me. In this article, I share my experience of doing it wrong, and some things you should consider when setting up your own backups.
The misery of backups done wrong
I want to tell a story of something that happened to me a few years ago. My house was broken into, and my computer was stolen. Nobody was home at the time, so my details are a bit lacking, but it looks like what happened is they broke in noisily through the front door, made a single pass through the house grabbing things of value semi-randomly, and ran out again.
This taught me a valuable lesson about backups. You see, I was doing regular backups at the time. I was using a program which made it easy to schedule backups between computers. My data was being backed up to Nina's laptop. Her data was being backed up to my desktop. Both machines tended to live within a few meters of each other, so both were stolen at the same time. The end payoff of doing those backups was zero.
Backups are often advised, and sensibly so, because losing data when something goes wrong sucks. Just adding backups, however, without stopping to think about what risks might cause your data to be lost, may result in systems like the one I had. In other words, systems which are not helpful.
Risk Management Basics
Generally, for risk management, you need to do a few things.
The first is to identify what your risks are. This includes knowing what their impact will be, and how likely they are to happen.
The second is to figure out ways of handling these risks. Since I'm writing about backups, we'll skip over the part where you might be able to prevent the risks and go straight into making sure you're prepared to handle the aftermath.
Finally, you need to weigh these two up. Not all precautions are worth your time and money to implement. You could prevent your cat pictures from being lost in the case of an asteroid destroying the Earth by launching a satellite with your backups on it, but that would probably be too expensive to be worth it.
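To make the weighing-up step concrete, here is a minimal sketch in Python. It assumes you can put rough numbers on likelihood and impact, which is often the hard part. The Risk class, the function names, and every figure in it are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    yearly_probability: float  # rough guess, between 0 and 1
    impact_cost: float         # what losing the data would cost you


def expected_yearly_loss(risk: Risk) -> float:
    """Likelihood multiplied by impact: a crude first estimate."""
    return risk.yearly_probability * risk.impact_cost


def worth_it(risk: Risk, precaution_yearly_cost: float) -> bool:
    """A precaution pays off only if it costs less than the loss it covers."""
    return precaution_yearly_cost < expected_yearly_loss(risk)


# Made-up numbers, purely for illustration.
drive_failure = Risk("hard drive failure", 0.05, 2000.0)
asteroid = Risk("asteroid destroys the Earth", 1e-9, 2000.0)

print(worth_it(drive_failure, 60.0))        # an external drive
print(worth_it(asteroid, 50_000_000.0))     # a backup satellite
```

With these invented numbers the external drive comes out as worthwhile and the satellite does not. The point is the comparison, not the precision of the guesses.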
What are your Risks?
So, what are some examples of risks you might want to think about? Broadly, we're talking about anything that might cause you to lose access to your data. I'm grouping these based on patterns of data loss, rather than other natural groupings.
1. Normal wear and tear
Hard drives get old and break. Chances are that all of the hard drives in your office won't break at the same time, but they will all break eventually. This is fairly localized in terms of the data at risk. As long as your backups aren't on the same hard drive, they're probably fine.
2. Theft, natural disasters, and other destructive forces
Sometimes, a flood, fire, or lightning bolt will come to pass and completely wreck all of your electronics in one physical location. Thankfully this is less common than normal wear and tear.
I'm classing my experience of theft here, since the pattern of data loss is the same. If your backups live in the same physical space as the data you're backing up, there's a risk of losing both to one event.
3. Malicious programs
Security is hard, and odds are high that everyone will encounter a malicious program at some point. Maybe it's a virus that wipes out your data. Maybe it's ransomware that holds your data hostage. Regardless, sometimes your data and everything else your computer currently has write access to will be unexpectedly destroyed.
If you're careful with how you set up your backups, you could have a system that only lets your computer add files and not update or remove existing files from the backups.
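As a rough sketch of that add-only idea, the Python below copies new files into a backup folder and refuses to touch anything that is already there. The function name and folder layout are my own assumptions. Note that this only illustrates the policy: for real protection the backup destination itself has to enforce it, because a malicious program with full write access would not politely use this function.

```python
import shutil
from pathlib import Path


def add_only_backup(source_dir: Path, backup_dir: Path) -> list[Path]:
    """Copy files from source_dir into backup_dir without ever overwriting
    or deleting anything already there. Returns the newly added files."""
    added = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = backup_dir / src.relative_to(source_dir)
        if dest.exists():
            continue  # existing backup files are never touched
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        added.append(dest)
    return added
```

One consequence of this simple version is that a file which changes after its first backup is skipped. A fuller system would store the changed file as a new version under a new name rather than replacing the old one.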
4. Honest mistakes
Sometimes, after working too many hours, you accidentally type a command wrong and destroy the data yourself. Maybe you thought they were different files and safe to delete. Maybe you mistyped rm and got / instead of . (the two keys are so close). If you use a computer all day, you will inevitably do something dumb with it.
None of your backups can ever be useful and be 100% safe from you making a mistake. You can, however, make sure that your backups are running on an automated schedule, and push to somewhere that you'd have to go explicitly out of your way to be able to destroy.
What are some Common Backup Strategies?
I've mentioned some of the ways that things go wrong. Now what can you do about it? Here are some strategies that people use for backups.
1. Cloud backups
There are a few services where you pay a monthly fee, install their application, and the application will automatically back up your system to their servers. This is actually a very good option, if your internet connection is good enough, since they handle most of the cases listed above.
Convenience is not without its own risks unfortunately. The biggest detractor for me with cloud backups is privacy. I have documents with personal information in them that I don't want to be accidentally leaked from the backup server. These I either encrypt, or I just don't put them on the cloud in the first place.
I personally use Backblaze on Nina's laptop to make sure that all of her many photos and artistic works in progress are safe. As a nice point, we can schedule the backups to run in the middle of the night since that makes the LTE internet usage a bit cheaper.
If you decide to try Backblaze, you can sign up using this referral link to get a one-month free trial.
The big downside for Backblaze is that it doesn't have good Linux support. Fortunately for me, most of the files I want to back up on my Linux box fit well into my second option, using version control.
2. Version control
The majority of my work is software-related. I think it's fair to say that I have a lot of source code that I would prefer not to lose. I already use Git for version control on all of these projects, and I even have a Dropbox-like syncing system I built with Git in place. I push these repositories to a remote in the cloud (usually GitHub for public projects and Bitbucket for private projects). This gives me all of the benefits of cloud backups, while fitting into a system that I'm already using.
3. Portable storage
A less convenient solution is to write your backups to a DVD, external hard drive, flash drive, or any other form of portable storage and keep it somewhere safe. This can be a good option if you want to save a lot of data with a bad internet connection. It's also a good option for private data that you don't want on someone else's servers.
Using version control can still be a good option here, since it will let you roll back to previous versions. Just be aware that this type of backup will often either be a more manual process, or will be more vulnerable to being in the same place as the originals.
4. Print it out
I didn't consider this until I started reading up on how people secure the secret keys for their Bitcoin wallets. If the amount of data you want to back up is small, and very sensitive, print it out onto paper. Some types of data can also lend themselves to printing a copy, like sheet music. For binary data, a single QR code can store almost 3 KB of data. You could split your file into 3 KB chunks and print it as a series of QR codes.
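The splitting step might look something like this sketch. The function names are hypothetical, and the chunk size is the 2,953-byte binary capacity of the largest QR code at its lowest error correction level. Turning each chunk into a printable QR image needs a third-party library, which is not shown here.

```python
# Binary capacity of a version 40 QR code at the lowest error correction level.
QR_MAX_BYTES = 2953


def split_into_chunks(data: bytes, chunk_size: int = QR_MAX_BYTES) -> list[bytes]:
    """Cut data into pieces that each fit in a single QR code."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks: list[bytes]) -> bytes:
    """Reassemble the original data from chunks scanned back in order."""
    return b"".join(chunks)
```

In practice you would probably want a smaller chunk size with a higher error correction level, since paper gets creased and smudged, and you would number the printed pages so the chunks can be scanned back in order.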
5. RAID, ZFS, and Error Correcting Codes
Finally, if you're only looking for a way to protect against hardware failure, there are solutions that involve having multiple drives at once.
There are many ways of doing this, on different levels between hardware and software, but the overarching idea is to have a system that introduces redundancy to your data and spreads it out over multiple physical hard drives. This allows your system to watch for errors, and correct them. Your data may take up twice as much disk space, but if either of the drives fails you can swap it out for another one without losing any data.
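The following toy sketch shows the idea of mirroring plus error detection, using two Python dictionaries as stand-ins for drives. It is not how RAID or ZFS are actually implemented, and the class and method names are invented for illustration. It only demonstrates that a checksum tells you which copy is good, and that a good copy can repair a bad one.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class MirroredStore:
    """Toy two-way mirror: every block is written to both 'drives'."""

    def __init__(self) -> None:
        self.drives: list[dict[str, bytes]] = [{}, {}]
        self.checksums: dict[str, str] = {}

    def write(self, key: str, block: bytes) -> None:
        self.checksums[key] = checksum(block)
        for drive in self.drives:
            drive[key] = block

    def read(self, key: str) -> bytes:
        expected = self.checksums[key]
        for drive in self.drives:
            block = drive.get(key)
            if block is not None and checksum(block) == expected:
                # Found a good copy: repair any drive whose copy is
                # missing or corrupted before returning it.
                for other in self.drives:
                    if other.get(key) != block:
                        other[key] = block
                return block
        raise IOError(f"all copies of {key!r} are damaged")
```

Replacing a failed drive corresponds to swapping one dictionary for an empty one. The next read of each block fills the new drive back in from the surviving copy.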
Some Extra Thoughts
There are a few cross-cutting concerns when it comes to backup strategies that I'd like to mention. These points apply no matter how your backup is stored.
1. Use free / open source software
By free software, I mean free as in freedom, not necessarily free in terms of cost. Depending on what you need your backup to do, you may only be revisiting the files that you're storing years later. You don't want to lose access to your backups because they're in a special zip format only used by that one company that doesn't exist anymore.
Using free software will typically mean that you can bundle a copy of the software with the backups if you think it might be hard to find later. Back when I was using TrueCrypt on my backups, I kept a copy of its executable with the backups. When TrueCrypt was discontinued, I still had access to my backups.
2. Make sure it's automated and check that it's running
One of the big problems with backups is forgetting to do them. Make the backups run on a schedule, have them happen in the background, and forget about it.
Unfortunately, if the backup starts failing, you need to know about it so that you can fix the problem. For example, Backblaze will send me an email if they don't get the backup. This can also be as simple as a reminder once a month to go do a spot check on your backups, that they are actually where they should be and can be recovered.
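A spot check can be partly automated. The sketch below assumes the backup is a plain copy of a folder that you can read directly, which will not be true for every backup tool. It compares file hashes and reports anything missing or different. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in pieces so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(source_dir: Path, backup_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative}")
        elif file_digest(copy) != file_digest(src):
            problems.append(f"differs: {relative}")
    return problems
```

A file reported as different is not necessarily a broken backup. It may simply have been edited since the last backup ran, so treat the output as a prompt to go and look rather than as a verdict.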
3. The more valuable the data, the more copies you should have
Backups are as fallible as the original that you're backing up. We're relying on it being unlikely for the original and the backup to fail at the same time.
If data is important, you can make it more unlikely that it will be lost by having multiple backups. Only one needs to survive for you to recover your data when things go wrong.
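As a back-of-the-envelope illustration, suppose every copy (the original included) has the same chance of failing in a given year, and that the failures are independent of each other. Both are assumptions of mine, and the second is exactly the one my burglary broke. Under those assumptions, the chance of losing everything shrinks quickly with each extra copy:

```python
def chance_of_total_loss(copy_failure_probability: float, copies: int) -> float:
    """Probability that every copy fails, assuming independent failures."""
    return copy_failure_probability ** copies


# A made-up 10% yearly failure chance per copy, purely for illustration.
for copies in (1, 2, 3):
    print(copies, chance_of_total_loss(0.10, copies))
```

Two copies sitting on the same desk do not fail independently, so the real benefit of an extra copy depends on keeping it somewhere the same event cannot reach.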
4. Consider privacy and encryption
If the data you're backing up is something that you don't want other people to have, then you need to be careful about how you back it up. Unfortunately, making copies of your data and spreading them out also increases the chance of somebody else managing to get it.
When I want to keep something secure, I keep it encrypted. GNU Privacy Guard (or GPG) is a free, open source, cross-platform encryption program that can take some time to learn to use, but is fairly straightforward once you understand it. Just be careful not to lose access to your decryption key, otherwise the backups may become useless.
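If you want to script the encryption step, one possible approach is to call the gpg command from Python. This sketch assumes gpg is installed and on your PATH, and it uses passphrase-based (symmetric) encryption for simplicity rather than a key pair, so here the thing you must not lose is the passphrase. The function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def gpg_symmetric_command(path: Path) -> list[str]:
    """Build a gpg command that encrypts path to path.gpg with a passphrase."""
    encrypted = path.with_name(path.name + ".gpg")
    return [
        "gpg",
        "--symmetric",            # passphrase-based encryption
        "--cipher-algo", "AES256",
        "--output", str(encrypted),
        str(path),
    ]


def encrypt_for_backup(path: Path) -> Path:
    """Run gpg (which prompts for a passphrase) and return the new file."""
    subprocess.run(gpg_symmetric_command(path), check=True)
    return path.with_name(path.name + ".gpg")
```

Calling encrypt_for_backup(Path("taxes.pdf")) would leave a taxes.pdf.gpg next to the original, and that encrypted file is the one to hand to the backup. The file name here is just an example.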
Do run backups, but don't forget to think about why
If you don't consider the risks to your data, you won't be able to put backups in place that protect you from the risks that you actually have.
Learn from my mistake. Don't just back up for the sake of the backup. Back up to manage your risks.