Bacula Restore Testing

Originally, this post was going to open with a brief "Bacula, 6 months on" section. Of course, that became much too detailed, so I split it out into its own post, which I would encourage you to read.

Restore Testing

Taking the backup is the most obvious part of doing backups. Almost everyone is aware they need to perform them, so most businesses and some individuals make sure that they have one or more backup systems and destinations.

However, it's surprising how infrequently real restore tests get performed. Often that's because it's hard to shut down a production system for verification, because the restore commands are awkward, or simply because it's hard to find enough free disk space for a complete restore.

With that said, restore testing is absolutely crucial, and it becomes even more so as your backup systems become more complex. Unfortunately, it also gets quite a bit more difficult as your systems become more complex, and as such it hasn't always been something that I've done well.

Fortunately, one of the side effects of moving to Bacula was that we needed to perform backup and restore tests while we were evaluating it, and that has resulted in some solid steps for performing restore tests.

Verify vs Restore

Frequently, you'll see "Verify" options along with backups and often even default read-after-write operations which can validate that the data written is what was expected. Those options, especially the latter, were absolutely essential in the days of tape backups, where you'd occasionally have a tape go bad during the writing process. However, these days, the need for verify-after-write is not nearly as strong.

Similarly, you'll see "Verify" operations which confirm that the data stored on your backup "volumes" matches what is currently on your machine. These are sometimes used as a substitute for actual restore testing, as they simulate the restore process. In effect, they exercise the catalog, validate the contents of the backup (sometimes) and compare them to the bytes in the file on disk (also sometimes). These are good to a point, but they're really no substitute for doing what you're going to do in an emergency.

Performing actual restores on a regular basis not only exercises every mechanism of the storage and restore process, but also tends to push you toward automating your disaster recovery scenarios and improves your familiarity with the details of the restore process.

Near In-place restore

Because of the way that Bacula is structured, a client must exist in order to be the target of a restore. In addition, it must hold the correct encryption keys. As such, the most straightforward restore test is to restore the contents of your volumes to another location on the existing client. As long as you have enough storage space, this is a pretty non-invasive restore and can be done by using the standard restore commands and adding the where= argument (or adjusting the restore parameter's where value). After running the restore, use your favorite diffing utility to compare the restored files against the originals. If everything matches, you've got a basic restore verification.
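If you don't have a favorite diffing utility handy, the comparison can be sketched in a few lines of Python. This is only a minimal sketch, not part of Bacula: the function names are made up for illustration, it only looks at regular files (no symlinks, permissions, or ownership), and it assumes that where= prefixes the full original path, so a restore of /etc with where=/tmp/bacula-restores lands in /tmp/bacula-restores/etc.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root):
    """Return the set of regular files under root, as paths relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; this sketch compares content only.
            if os.path.isfile(full) and not os.path.islink(full):
                found.add(os.path.relpath(full, root))
    return found


def compare_trees(original_root, restored_root):
    """Compare two directory trees (hypothetical helper, not a Bacula tool).

    Returns a dict with three sorted lists of relative paths:
    "missing" (only in the original), "extra" (only in the restore),
    and "changed" (present in both, but with different content).
    """
    original = list_files(original_root)
    restored = list_files(restored_root)
    changed = sorted(
        rel
        for rel in original & restored
        if file_digest(os.path.join(original_root, rel))
        != file_digest(os.path.join(restored_root, rel))
    )
    return {
        "missing": sorted(original - restored),
        "extra": sorted(restored - original),
        "changed": changed,
    }
```

Calling compare_trees("/etc", "/tmp/bacula-restores/etc") (both paths are examples) should return three empty lists for a clean restore. On a live system, expect a few "changed" or "missing" entries for files that were modified or created after the backup ran.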

Restoring to staging

A more complex scenario involves restoring to a staging device. With our current configuration of explicitly separate staging and production environments, this can be a little tricky, but it has the advantage of allowing you to do an in-situ replacement and validate the full restoration process.

Restoring to a "similar" production device would serve nearly the same purpose, but in our case, the staging systems are designed to be safe copies of the production environment, and there's very little "stream crossing" to be done.

With that said, if your staging and production environments are separate, you'll need to do the following in order to restore on a staging system:

  1. Register your staging system with your production Bacula Director by adding a Client stanza for it to bacula-dir.conf, then reload the configuration in bconsole
  2. Place the encryption key (public and private halves) on your staging server, so the data will decrypt appropriately
  3. Add the production director to your bacula-fd.conf on the staging client, so that it can request the restoration
  4. Restart the File Daemon on the staging client
  5. Use status client to make sure that you can reach the client
  6. (Belt and suspenders) I like to also disable the File Daemon on the original client to guarantee that you don't fat finger something and make yourself very unhappy
  7. Run the restore command in bconsole selecting the appropriate files, setting the client to your staging server, and setting the restoration path to / so that you write to the original location
  8. Once complete, remove the staging server from your production director's bacula-dir.conf (to avoid confusion or accidental backups in the future), and reload in bconsole. You may also want/need to delete the client in bconsole, which will make sure it's not available.
  9. Replace the configuration on the staging device as necessary
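
To make the configuration and console steps above a little more concrete, here is a rough Python sketch that renders the two stanzas (steps 1 and 3) and the bconsole commands (steps 1, 5 and 7). Treat it as an illustration rather than our actual setup: the client names, address, director name, catalog name and password are all placeholders, the helper functions are invented for this example, and the exact directives and restore keywords (restoreclient= in particular) should be checked against the documentation for your Bacula version.

```python
def client_stanza(name, address, password, catalog="MyCatalog", port=9102):
    """Render a Client resource for the production bacula-dir.conf (step 1).

    All values are placeholders; the catalog name must match your own.
    """
    return (
        "Client {\n"
        f"  Name = {name}\n"
        f"  Address = {address}\n"
        f"  FDPort = {port}\n"
        f"  Catalog = {catalog}\n"
        f'  Password = "{password}"\n'
        "}\n"
    )


def director_stanza(director_name, password):
    """Render a Director resource for the staging bacula-fd.conf (step 3)."""
    return (
        "Director {\n"
        f"  Name = {director_name}\n"
        f'  Password = "{password}"\n'
        "}\n"
    )


def bconsole_commands(source_client, staging_client, where="/"):
    """Return the bconsole commands for steps 1, 5 and 7, in order.

    The restore line deliberately omits a trailing "yes" so that bconsole
    still shows the job summary and asks for confirmation before running.
    """
    return [
        "reload",
        f"status client={staging_client}",
        f"restore client={source_client} restoreclient={staging_client} "
        f"where={where} select current all done",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(client_stanza("staging-fd", "staging.example.com", "CHANGE_ME"))
    print(director_stanza("prod-dir", "CHANGE_ME"))
    for command in bconsole_commands("prod-fd", "staging-fd"):
        print(command)
```

The script only prints text. Pasting the output into the config files and bconsole by hand keeps a human in the loop, which seems sensible for a restore that writes to /.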