We've heard it all before: AWS is expensive, and you need to watch out for the hidden sharp edges of their pricing model. Today I provide a small lesson in that concept.
ClueTrust has run through a number of backup methodologies over the years: originally Retrospect (back when it was still its original, independent self) writing to tape, then BRU for better multi-platform support, then retiring tape in favor of mirroring to an off-site storage system, and most recently Bacula.
BRU didn't have a Glacier module, so I wrote (and re-wrote) a series of Python scripts that handled backing up, storing metadata (because Glacier doesn't let you choose the names of your archives), and purging older archives when appropriate.
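Since Glacier identifies each upload only by an opaque ID, the metadata side boils down to keeping a local catalog. Here is a minimal sketch of that idea, assuming a SQLite file as the catalog. The table layout, function names, and the 180-day retention figure are all hypothetical and are not what our scripts actually used, and the upload and delete calls to Glacier itself are left out.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def open_catalog(path=":memory:"):
    """Open (or create) a local catalog mapping our names to Glacier archive IDs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS archives ("
        " archive_id TEXT PRIMARY KEY,"    # opaque ID handed back on upload
        " name TEXT NOT NULL,"             # the name we would have picked ourselves
        " size_bytes INTEGER NOT NULL,"
        " uploaded_at INTEGER NOT NULL)"   # Unix timestamp, seconds
    )
    return db


def record_upload(db, archive_id, name, size_bytes, uploaded_at):
    """Remember which archive ID belongs to which backup."""
    db.execute(
        "INSERT INTO archives VALUES (?, ?, ?, ?)",
        (archive_id, name, size_bytes, int(uploaded_at.timestamp())),
    )
    db.commit()


def purge_candidates(db, now, retention_days=180):
    """List (archive_id, name) pairs older than the retention window, oldest first."""
    cutoff = int((now - timedelta(days=retention_days)).timestamp())
    rows = db.execute(
        "SELECT archive_id, name FROM archives"
        " WHERE uploaded_at < ? ORDER BY uploaded_at",
        (cutoff,),
    )
    return rows.fetchall()
```

A purge run would then walk `purge_candidates(...)`, ask Glacier to delete each archive ID, and drop the matching row once the delete succeeds.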
Melting the glacier
As part of our work this year, we've been looking at various storage models for our new datastore. Since Bacula is capable of supporting S3, we looked at storing off-site data using S3-compatible servers in a couple of locations. On the open-source side, this is powered by MinIO, but we also considered using the new Backblaze S3 Compatible API.
Either way, it was clear that raw Glacier, which is what we'd been using in the past, wasn't going to make any sense for us going forward.
In the intervening years, Amazon had done a nice job of reducing the price of storage, and even of (bulk) retrieval, for Glacier, and we only have about 3TB of data sitting there right now. This costs us approximately $12/month to store. Still, that's $144/year, which at today's prices will buy you about one and a half 4TB SMR drives every year (don't get me started on SMR, especially since we use ZFS).
We're just beyond the 90-day window for deleting data from Glacier, so I took a look at what it would cost us to download the data from Glacier, archive it locally, and just delete the rest. For those of you unfamiliar with Glacier, there's a minimum 90-day retention policy; if you delete your data in fewer than 90 days, you still pay for the entire 90 days for that data.
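To make the minimum-retention rule concrete, here is a rough sketch of what deleting early costs, assuming every archive ends up billed for a full 90 days at the $0.004/GB/month storage rate quoted in this post. The function name and the 30-day month are my own simplifications, not AWS's actual billing formula.

```python
STORAGE_RATE_PER_GB_MONTH = 0.004  # USD, the storage rate quoted in this post
MIN_RETENTION_DAYS = 90
DAYS_PER_MONTH = 30                # simplification for illustration


def early_deletion_fee(size_gb, days_stored):
    """Charge for the days still owed on the 90-day minimum (0 once it has passed)."""
    remaining_days = max(0, MIN_RETENTION_DAYS - days_stored)
    return size_gb * STORAGE_RATE_PER_GB_MONTH * remaining_days / DAYS_PER_MONTH


if __name__ == "__main__":
    for days in (30, 60, 90, 120):
        print(f"1000 GB deleted after {days} days: ${early_deletion_fee(1000, days):.2f}")
```

Under these assumptions, deleting 1TB after 30 days costs the same as leaving it in place for the remaining 60, so there is nothing to gain by deleting before the 90 days are up.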
Getting you coming and going
This section title is mildly misleading: AWS doesn't charge you for uploads (except for transactions), but they do charge for storage ($0.004/GB/month right now) and for transfers out of Glacier to the internet ($0.09/GB). [1]
So, offloading my 3TB of data would cost 0.09 × 3000, or $270, plus $7.50 for the bulk retrieval fee: $277.50 in all.
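The same arithmetic, plus the break-even it implies, as a small sketch. The egress and storage rates are the ones quoted in this post; the $0.0025/GB bulk retrieval rate is inferred from the bulk retrieval fee quoted above for 3TB, and 1TB is treated as 1,000GB. The function names are just for illustration.

```python
EGRESS_PER_GB = 0.09             # USD, transfer out of Glacier to the internet
BULK_RETRIEVAL_PER_GB = 0.0025   # USD, inferred: 7.50 / 3000
STORAGE_PER_GB_MONTH = 0.004     # USD


def exit_cost(size_gb):
    """One-time cost to retrieve and download everything."""
    return size_gb * (EGRESS_PER_GB + BULK_RETRIEVAL_PER_GB)


def monthly_storage_cost(size_gb):
    """Cost of simply leaving the data where it is for a month."""
    return size_gb * STORAGE_PER_GB_MONTH


def break_even_months(size_gb):
    """Months of storage that cost as much as leaving (independent of size)."""
    return exit_cost(size_gb) / monthly_storage_cost(size_gb)


if __name__ == "__main__":
    size_gb = 3000
    print(f"exit cost:       ${exit_cost(size_gb):.2f}")
    print(f"monthly storage: ${monthly_storage_cost(size_gb):.2f}")
    print(f"break-even:      {break_even_months(size_gb):.1f} months")
```

At these rates, downloading everything only pays off if the data would otherwise sit in Glacier for roughly 23 months, whatever the amount of data involved.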
We don't have all of that data stored locally (retention policies are tricky, and Glacier guarantees a certain level of redundancy), so we will slowly delete it as it ages out of our retention policy and hope that we don't need to restore any of it (and pay the retrieval fee). So Glacier has us as a customer for a few more months, at a bill that declines from $12/month, thanks to the financial lock-in of the retrieval price.
[1] Pricing as of 2020-06-26 06:49. If you're reading this more than 6 months from now, pricing has probably gone down again.