Bacula pruning old storage


I note with some amusement that I wrote about this same subject, in much more detail, on this exact day last year.

The reason for the new message on this subject is that I'm still cleaning up some of the decisions I made when first using Bacula.

The Problem

Before I realized that I really needed three pools to make things work correctly, I was storing the full, differential, and incremental backups in the same pool. This turned out to be a bad decision, and I rectified it prior to my last note on the topic. By then, however, I'd accumulated a lot of "volumes" of data that were still in my main pool (the one I continue to use for my full backups).

If you look at volumes the way that Bacula does, they're basically tapes. As such, they're considered to take up no more space when they're full than when they're empty. This makes sense for tapes, since the tendency in a tape-based system is to rotate media physically through offsite and onsite storage and then to tape libraries or robots for reuse. For on-disk volumes, though, this poses a different problem: empty on-disk volumes and full ones do not take up the same amount of space.

In my case, I was running low on disk space on my storage volume, and when I looked, I noticed many volumes that had not been written to in quite some time and were marked as "expired". These volumes would eventually be reused, but since that pool had run for quite some time with full, differential, and incremental backups in it, I had a large number of "tapes" containing expired data waiting to be overwritten. Because the full backups take a relatively constant number of volumes, it would have taken them a few more years to overwrite the volumes once used for differential and incremental backups.

In short, I had a lot of used volumes taking up space (both in the catalog and in storage) that should have been purged.

The Solution

To stop Bacula from leaving the "old" storage lying around, I needed to delete the volumes. Keeping old volumes makes sense if you think like tape: once you've bought the tape, you might as well leave the data on it (ignoring the security concerns) until you need to reuse it. But this isn't tape, and there's a significant benefit to keeping the number of volumes right-sized for our environment.

Specifically, I wanted to remove all volumes from my Cloud-CT media pool that were more than two years old. Here the sql command in bconsole came in handy, as it let me query the catalog database directly:

select 'prune yes volume=' || volumename from media where lastwritten < '2020-10-03' and mediatype='Cloud-CT' order by volumename;

The above query produced one prune command per matching volume, and those lines could be pasted directly into bconsole to prune the volumes.

Based on some examples, and out of an abundance of caution, I decided to prune the volumes before deleting them. This was likely an unnecessary step, but it ensured that my catalog database was cleaned as well.

Once done, I was able to rerun the sql command, replacing prune with delete to delete the unnecessary volumes.
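Since the two passes differ only in the verb, the query text can be generated rather than edited by hand. The following is a minimal sketch, not part of my original procedure: volume_query is a hypothetical helper name, and it only prints the SQL so that it can be pasted into the bconsole sql prompt as before.

```shell
#!/bin/sh
# Hypothetical helper (not a Bacula command): print the query that
# generates one bconsole command per matching volume.
#   $1 = verb to emit (prune or delete)
#   $2 = cutoff date, YYYY-MM-DD
#   $3 = media type to match
volume_query() {
    verb=$1 cutoff=$2 mediatype=$3
    printf "select '%s yes volume=' || volumename from media " "$verb"
    printf "where lastwritten < '%s' and mediatype='%s' " "$cutoff" "$mediatype"
    printf "order by volumename;\n"
}

# First pass: prune. Second pass: delete.
volume_query prune  2020-10-03 Cloud-CT
volume_query delete 2020-10-03 Cloud-CT
```

The first call prints the same query shown earlier, and the second prints the delete variant. On systems with GNU date, the cutoff could probably be computed instead of typed, for example with $(date -d '2 years ago' +%F), though that option is GNU-specific.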

This cleared up all the near-side volumes, removing the storage that they consumed as well as their markers so that they would not be reused in the future.

For the far-side (cloud) copies, I opted to directly purge those using a find command:

find bacula-west/ -mtime +720 -exec rm -r \{\} \;

Here, bacula-west is the name of the storage location, and -mtime +720 matches anything last modified more than 720 days ago (roughly two years).
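Because that command deletes recursively, it is worth previewing what it would match first. The sketch below is an illustration rather than what I actually ran: it builds a throwaway directory with made-up file names (recent-part and old-part), backdates one of them, and swaps -exec rm for -print. The -mindepth 1 option is an addition that keeps the top-level directory itself out of the results. It is supported by GNU and BSD find but is not part of POSIX.

```shell
#!/bin/sh
# Build a throwaway directory so the dry run can be tried safely.
demo=$(mktemp -d)
mkdir "$demo/bacula-west"
touch "$demo/bacula-west/recent-part"
# Backdate one file well past the 720-day cutoff (timestamp: 2015-01-01 00:00).
touch -t 201501010000 "$demo/bacula-west/old-part"

# Same age test as the real command, but list the matches instead of removing them.
matches=$(find "$demo/bacula-west/" -mindepth 1 -mtime +720 -print)
echo "$matches"

# Clean up the demo directory.
rm -r "$demo"
```

Only the backdated file should be listed. Once the list looks right against the real storage location, -print can be swapped back for the -exec rm form.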

Summary

All told, if I'd known originally what a mess I was creating by using a single pool, I would have resolved that earlier, but this is how we learn.