Bacula pruning


After 18 months of using Bacula and sending copies of my data to the cloud (in this case, a cloud I operate at another location) using an S3-compatible storage mechanism, I noticed that a lot of backup data was sitting around on my local server. When I moved to Bacula, I chose long retention times for my core monthly full backups, which resulted in more than a few terabytes of data.

At the time I set this up (and still at the time of writing), Bacula's automatic options for truncating the local cache copies of cloud volumes (the Truncate Cache directive in the storage daemon's Cloud resource) were:

  • No (do not remove cache)
  • AfterUpload (each part removed directly after upload)
  • AtEndOfJob (each part removed at the end of the job)

None of these would work for me: I want to retain the data locally for months and give up my cached copy only when it falls outside my normal restore window, or when I need the space.

There are a number of ways to prune, depending on how much you want to get into the Bacula mindset.

Manual purge using find

It turns out that if you leave the label intact (the label being part.1 in the volume directory), you can delete any of the other parts of a cloud volume from the local cache, and Bacula will retrieve them from the cloud automatically during a restore. This lets you bypass the CacheRetention settings in bacula-dir.conf and prune manually however you like. In my case, I used find:

find . -regextype posix-egrep -type f -regex '.*/Vol-[^/]*/part\.([2-9]|[0-9][0-9]+)' -delete

Run from the top of the local cache directory, this command uses a POSIX extended regular expression to find every file that sits in a directory whose name starts with Vol- and is named part.N, where N is any number other than 1.
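Because deleting parts is destructive, it can help to review what would be removed before removing anything. The following is a minimal sketch, assuming GNU find (-regextype is a GNU extension) and volume directories named Vol-*, that wraps the same kind of match in a function that only lists files unless asked to delete them. The function names list_prunable_parts and prune_cache_parts, and the PART_REGEX variable, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: prune locally cached cloud parts while keeping part.1 (the label).
# Assumes GNU find (-regextype is a GNU extension) and volume directories
# named Vol-*. Function and variable names here are hypothetical.

# Matches part.N directly inside a Vol-* directory, where N is 2-9
# or has two or more digits (so part.1 is never matched).
PART_REGEX='.*/Vol-[^/]*/part\.([2-9]|[0-9][0-9]+)'

# list_prunable_parts DIR: print the part files that would be removed.
list_prunable_parts() {
  find "$1" -regextype posix-egrep -type f -regex "$PART_REGEX" -print
}

# prune_cache_parts DIR [--delete]
#   Dry run by default (only lists matches); pass --delete to remove them.
prune_cache_parts() {
  local cache_dir=$1 mode=${2:-}
  if [ ! -d "$cache_dir" ]; then
    echo "not a directory: $cache_dir" >&2
    return 1
  fi
  if [ "$mode" = "--delete" ]; then
    find "$cache_dir" -regextype posix-egrep -type f -regex "$PART_REGEX" -delete
  else
    list_prunable_parts "$cache_dir"
  fi
}
```

For example, after sourcing the file, prune_cache_parts /path/to/cloud/cache prints the candidate parts, and prune_cache_parts /path/to/cloud/cache --delete removes them; the path is a placeholder for your own cache directory.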

Manual pruning using bconsole

Bacula's console (bconsole) has a cloud command that can force a prune operation. The cloud prune subcommand respects the CacheRetention setting and accepts a number of parameters that specify what to prune: you can prune by storage, by pool, or even by MediaType, and there is also an AllPools parameter to prune every pool.

In my case, I used:

cloud prune AllFromPool Storage=Cloud-CT Pool=File

which breaks down to:

  • cloud: the command
  • prune: the subcommand
  • AllFromPool: prune all volumes in the pool
  • Storage=: use the named Storage definition (in this case Cloud-CT)
  • Pool=: use the named Pool (in this case File)
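Because this command is what makes the bconsole approach easy to script, here is a minimal sketch of running it from a script or cron job. It relies on bconsole reading commands from standard input; the function names and the BCONSOLE variable (which lets you point at a different binary, or at cat to see what would be sent) are hypothetical. The sketch does not handle any confirmation prompt that cloud prune might show on some Bacula versions; if yours prompts, the piped input would need to answer it.

```shell
#!/usr/bin/env bash
# Sketch only: send "cloud prune" to bconsole non-interactively.
# BCONSOLE is a hypothetical override; it defaults to the bconsole binary.
BCONSOLE=${BCONSOLE:-bconsole}

# build_cloud_prune STORAGE POOL
#   Print the bconsole command that prunes every volume in POOL on STORAGE.
build_cloud_prune() {
  printf 'cloud prune AllFromPool Storage=%s Pool=%s\n' "$1" "$2"
}

# run_cloud_prune STORAGE POOL
#   Pipe the command to bconsole, followed by "quit" to end the session.
run_cloud_prune() {
  { build_cloud_prune "$1" "$2"; echo quit; } | "$BCONSOLE"
}
```

For example, run_cloud_prune Cloud-CT File sends the same command shown above and then quits.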

For ClueTrust, we use 3 different pools in our storage:

  • File for the full backups (historical naming convention)
  • Inc-File for the daily incremental backups (from the last File backup)
  • Diff-File for the weekly differential backups (from the last File backup)

In this case, I only want to prune the full backups that fall outside the range covered by the incremental and differential backups. To that end, I've set CacheRetention appropriately in my bacula-dir.conf file, so I can trust Bacula to clear these correctly.

Automatic pruning using Bacula admin jobs

I've read that this is possible, but I haven't found the relevant documentation yet, so I can't recommend it at this point. The other two approaches work fine and are easy to script if need be.