Server Backups using LXD

So I am working the process of server backups today.  Most people do backups wrong, and I have been guilty of that too.  You know it’s true when you accidentally delete a file, and you think ‘No worries, I’ll restore it from a backup…’; and about an hour later of opening archives and trying to extract the one file but finding some issue or other…makes you realize your backup strategy sucks.  I am thus trying to do get this right from the get-go today:
LXD makes the process easy (albeit with a few quirks).  EXPLOINSIGHTS Inc. (EI) servers are structured such that each service is running in an LXD container.  Today, there are several active, ‘production’ servers (plus several developmental servers, which are ignored in this posting):

  • Nextcloud – cloud file storage;
  • WordPress – this web-site/blog;
  • Onlyoffice – an ‘OnlyOffice’ document server;
  • Haproxy – the front-end server that routes traffic across the LAN

All of these services are running on one physical device.  They are important to EI as customers access these servers (whether they know it or not), so they need to JUST WORK.
What can I do if the single device (‘server1’) running these services just dies?  well I have battery backup, so a power glitch won’t do it.  Check.  And the modem/router are also UPS charged, so connectivity is good.  Check.  I don’t have RAID on the device but I do have new HD’s – low risk (but not great).  Half-check there.  And if the device hardware just crashes and burns just because it can…well that’s what I want to fix today:
So my way of creating functionally useful backups is to do the following, as a simple loop in a script file:

  1. For each <container name> on server1:
    1. lxc stop <container-name>
    2. lxc copy <container-name> TO server2:<container-name##>
    3. lxc restart <container-name>
  2. Next <container-name>

The ‘##’ at the end of the lxc copy command is the week-number, so I can create weekly container backups EASILY and store them on server2.  I had hoped to do this without stopping the containers, but the criu LXD add-on program (which is supposed to provide that very capability) is not performing properly on server2, so I have a brief server-outage when I run this script for each service for now.  I thus have to try to run this at “quite times”, if such a thing exists; but I can live with that for now.
I did a dry-run today: I executed the script, then I stopped two of the production containers.  I then launched the backup containers with the command:

  • lxc start <container-name##>

I then edited the LAN addresses for these services and I was operational again IN MINUTES.  The only user-experience change I noticed was my login credentials expired, but other than that it was exactly the same experience “as a user”.  Just awesome!
Such a strategy is of no use if you need 100% up-time, but this works for EI for now until I develop something better.  And to be clear, this solution is still far from perfect so it’s always going to be a work in progress:-
Residual risks include:

  1. Both servers are on same premises, so e.g. fire or theft risks are not covered;
    1. Really hard to fix this because of data residency and control requirements.
  2. This strategy requires human intervention to launch the backup servers, so there could be considerable downtime.  Starting a backup lxd container for the haproxy server will also require changes at the router (this one container receives and routes all http and https traffic except ssh/vpn connections.  The LAN router presently sends everything to this server.  A backup container will have a different LAN IP address thus router reconfig is needed);
  3. The cloud file storage container is not small – about 13GB today.  52 weeks of those will have a notable impact on storage at server2 (but storage is cheap);
  4. I still have to routinely check that the backup containers actually WORK (so practice drills are needed);
  5. I have to manually add new production containers to my script – easy to forget;
  6. I don’t like scheduled downtime for the servers…

But overall, today, I am satisfied with this approach.  The backup script will be placed in a cron file for auto-execution weekly.  I may make my script a bit more friendly by sending log files and/or email notification etc., but for now a manual check-up on backup status will suffice.