Recovering Longhorn backups


Another chapter in learning Kubernetes the hard way: this time, Longhorn.

Probably ill-advisedly, I'm using ephemeral volumes as the backing storage for Longhorn, and I have a habit of leaving nodes in the cluster while they're being rebuilt. Generally, this isn't a problem. This weekend, though, I was a bit too cavalier about the rebuild process and didn't prune the temporary volumes each time I replaced a storage node, which resulted in all of my storage getting nuked.

In my case, even the "persistent" volumes are all just cache, so it didn't really matter. However, since I'm also backing them up to an S3-compatible storage system, it gave me an opportunity to try retrieval.
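Before relying on a restore, it helps to confirm that Longhorn can actually see what's in the backup store. A minimal sketch, assuming Longhorn runs in the default `longhorn-system` namespace and exposes its backups as the `backupvolumes.longhorn.io` and `backups.longhorn.io` custom resources; the status field paths are my reading of those CRDs and may differ between Longhorn versions (a missing field just shows up as `<none>`). The `KUBECTL` variable is only there so the commands can be dry-run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry runs: KUBECTL=echo prints each kubectl command instead of running it.
kc() { "${KUBECTL:-kubectl}" "$@"; }

# Assumption: Longhorn installed in its default namespace.
LONGHORN_NS="${LONGHORN_NS:-longhorn-system}"

# One row per backed-up volume, with its most recent backup.
list_backup_volumes() {
  kc get backupvolumes.longhorn.io -n "$LONGHORN_NS" \
    -o custom-columns=NAME:.metadata.name,LAST_BACKUP:.status.lastBackupName,LAST_AT:.status.lastBackupAt
}

# Individual backups, with the volume they came from and their state.
list_backups() {
  kc get backups.longhorn.io -n "$LONGHORN_NS" \
    -o custom-columns=NAME:.metadata.name,VOLUME:.status.volumeName,STATE:.status.state
}
```

If `list_backup_volumes` comes back empty, the backup target itself is the problem, and there's no point deleting anything yet.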

Ephemeral storage in persistent nodes

If I were to remove nodes completely and bring them up afresh, I wouldn't have these problems. However, my practice has been to cordon and drain a node, rebuild it, and let it rejoin without ever deleting it from the cluster.

This is marginally faster, but the cluster then treats the node as temporarily disconnected rather than gone. Because of this, Longhorn expects the node's volumes to still be there. The current failure mode is for them to show an error without any new space being allocated. That makes sense in terms of restoring from backups, but it's not helpful in my case, since the errored volume takes up the volume slot and prevents the other nodes from rebuilding.
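Removing the node from both Kubernetes and Longhorn before rebuilding it should avoid that stale state. A minimal sketch, assuming Longhorn runs in the default `longhorn-system` namespace; the node name passed in (e.g. `storage-1`) is hypothetical, and the replica field paths are my reading of the `replicas.longhorn.io` CRD. Longhorn may refuse to delete its node object while replicas are still scheduled there, which is why listing replicas comes first. Deleting the Longhorn node can also be done from the GUI.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry runs: KUBECTL=echo prints each kubectl command instead of running it.
kc() { "${KUBECTL:-kubectl}" "$@"; }

# Assumption: Longhorn installed in its default namespace.
LONGHORN_NS="${LONGHORN_NS:-longhorn-system}"

# Show every replica with its volume, node and state, so stale ones stand out.
list_replicas() {
  kc get replicas.longhorn.io -n "$LONGHORN_NS" \
    -o custom-columns=NAME:.metadata.name,VOLUME:.spec.volumeName,NODE:.spec.nodeID,STATE:.status.currentState
}

# Take a storage node out of both Kubernetes and Longhorn before rebuilding it.
retire_node() {
  local node="$1"
  # Evict workloads; DaemonSet pods (e.g. Longhorn's own) are left alone.
  kc drain "$node" --ignore-daemonsets --delete-emptydir-data
  # Remove the Kubernetes node object so the rebuilt machine joins as a new node.
  kc delete node "$node"
  # Remove Longhorn's own record of the node.
  kc delete nodes.longhorn.io "$node" -n "$LONGHORN_NS"
}
```

Run `list_replicas` before `retire_node`: anything still pointing at the node being replaced is exactly what I failed to prune.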

This weekend, I rebuilt all four of my storage nodes, resulting in a complete loss of data. Configuration was, of course, fine, since that lives in etcd (which I didn't screw up this time).

Restoring the PVC

As an experiment, I wanted to try restoring the backups from my S3-compatible backup storage to see whether it would work. This is the process that worked for me:

  1. Quiesce the dependent pod by scaling its deployment down to zero:

    kubectl scale deploy --replicas=0 renovate-whitesource-renovate

  2. Use the GUI to restore the backup to a new, appropriately named volume. In my case, I named it mend-restored.

  3. Wait for the restore to finish.

  4. Delete the old volume (GUI).

  5. Create a PV/PVC for the restored volume (GUI). Use the existing name for the PVC.

  6. Once the PV and PVC are available, scale the dependent pod back up:

    kubectl scale deploy --replicas=1 renovate-whitesource-renovate
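Only steps 2 and 4 really need the GUI; the rest can be scripted. A minimal sketch, assuming the deployment lives in the current namespace, `mend-restored` is the volume from step 2, and the workload's original claim is called `renovate-cache` with size `10Gi` (both hypothetical; use your own claim name and the restored volume's real size). As an alternative to the GUI action in step 5, it binds a PV to the existing Longhorn volume through the `driver.longhorn.io` CSI driver and a PVC to that PV. The restore check assumes Longhorn keeps `status.restoreRequired` true until the restore completes, and `kubectl wait --for=jsonpath` needs a reasonably recent kubectl.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry runs: KUBECTL=echo prints each kubectl command instead of running it.
kc() { "${KUBECTL:-kubectl}" "$@"; }

LONGHORN_NS="${LONGHORN_NS:-longhorn-system}"      # assumption: default namespace
DEPLOY="renovate-whitesource-renovate"
VOLUME="mend-restored"                             # restored volume from step 2
PVC_NAME="${PVC_NAME:-renovate-cache}"             # hypothetical: the existing claim name
SIZE="${SIZE:-10Gi}"                               # hypothetical: must match the volume size
STORAGE_CLASS="${STORAGE_CLASS:-longhorn-static}"  # assumption: same value on PV and PVC

# Step 1: stop the pod that mounts the volume.
quiesce() { kc scale deploy "$DEPLOY" --replicas=0; }

# Step 3: block until Longhorn reports the restore is done (assumed field).
wait_for_restore() {
  kc wait "volumes.longhorn.io/$VOLUME" -n "$LONGHORN_NS" \
    --for=jsonpath='{.status.restoreRequired}'=false --timeout=30m
}

# Step 5: a PV pointing at the Longhorn volume, and a PVC pinned to that PV.
create_pv_pvc() {
  kc apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $VOLUME
spec:
  capacity:
    storage: $SIZE
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: $STORAGE_CLASS
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: $VOLUME
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $PVC_NAME
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: $STORAGE_CLASS
  volumeName: $VOLUME
  resources:
    requests:
      storage: $SIZE
EOF
}

# Step 6: bring the workload back and wait for it to become ready.
resume() {
  kc scale deploy "$DEPLOY" --replicas=1
  kc rollout status "deploy/$DEPLOY" --timeout=5m
}
```

The functions are called in order around the GUI steps: `quiesce`, restore in the GUI, `wait_for_restore`, delete the old volume, `create_pv_pvc`, `resume`. If the old claim still exists after step 4, it has to be deleted first, since a bound claim's `volumeName` can't be changed in place.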