Smart Nextcloud Backup Strategy

You know I am a fan of Nextcloud. I got so used to having it for my consulting company that even though the company itself is no longer with us, the Nextcloud servers found personal utility. That shouldn’t be a surprise for anyone who has looked into Nextcloud.

The big problem for personal users (and even some small businesses) is…the backup strategy for Nextcloud.

You can certainly back up your user data files easily enough – just take a snapshot copy of the user files, copy it to e.g. a USB drive, pop it in your safe and you are done. Update it weekly or whatever, and you have a reasonable backup set for your user files. Not terrible, not perfect, but it’s fine. And certainly better than nothing.

But for Nextcloud, a user’s data files are just one small part of the server. There are apps, settings, configurations; and shares and links with passwords and expiration dates, and all kinds of ‘nice’ features that, well, also ideally need to be backed up.

If you read the online documents, they tell you to run a separate MySQL server via some complicated setup that’s really just not practical for home users. So it seems we’re doomed to just having our files backed up, and if the server goes poof, we have to rebuild it from the ground up – and then go back and add all the finishing touches: apps, user settings, Nextcloud configuration options, links, theming etc. etc.

I have previously reported how if you run your Nextcloud server as an lxc container (really simple under Ubuntu linux) then you can copy the containers from one machine to another, and it helps with some of the configurations. But at the time of writing, you had to copy an entire server – that can quickly grow into TB+ storage drives for user data. That’s a slow copy even over a LAN (and forget it for WAN). And it becomes impracticable for routine (e.g. daily) backups of a server.

BUT Now…we have LXD 4.0. And it has a new command option that fixes all of that:

~$: lxc copy container remote:container --refresh

The key word, or rather the option, is --refresh. If you set up a Nextcloud container called (e.g.) nextcloud on a local machine, and you have another, remote machine called (e.g.) BackupServer (it can be on the same LAN or even halfway around the world) then this command copies your container to the other machine without having to stop the existing machine. It copies everything about your instance – it is an exact copy:

~$: lxc copy nextcloud BackupServer:nextcloud --refresh

All you need is two machines running LXD version 4.0 or later and enough HD space for your backup copy (copies). So far so good – the same as prior releases of lxd.

And if you re-run that command, say, every day, you get your offsite copy refreshed to be an updated version of your server – and it only syncs changes, not the whole server. So your first copy might be slow, but subsequent ones will be quite fast. And there’s no need to stop your local container to copy it (the remote copy however has to be stopped).

You can cron this, so that it makes refreshed copies say daily (or more), and then makes weekly unrefreshed backups after say 7 days, then monthly and so forth. You can have BACKUP copies of OLD servers if you want or need to go back, and you can always have your live version backed up daily (or more). Note that multiple backup copies of a server image take up the same space per copy, so storage cost may become an issue (but spinning-rust HD’s are cheap, so probably not a major issue for the home enthusiast or even small business).
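As a rough illustration of that schedule, here is a minimal sketch of a script that cron could run once a day. It uses the example names from this post (nextcloud, BackupServer); the weekly naming scheme (nextcloud-week-NN) and the choice of Sunday are our own assumptions, not anything LXD requires. It prints the commands instead of running them unless you set LXC=lxc, so you can see what it would do first.

```shell
#!/bin/sh
# Sketch: daily refresh plus a dated weekly copy (names and schedule are assumptions).
CONTAINER="nextcloud"      # local container (example name from this post)
REMOTE="BackupServer"      # LXD remote (example name from this post)
LXC="${LXC:-echo lxc}"     # dry-run by default: prints commands; set LXC=lxc to run for real

# Name for the weekly archive copy, e.g. nextcloud-week-07 (hypothetical naming scheme)
weekly_name() {
    printf '%s-week-%s' "$CONTAINER" "$1"
}

# Refresh the rolling daily copy; after the first run only changes are transferred
daily_refresh() {
    $LXC copy "$CONTAINER" "$REMOTE:$CONTAINER" --refresh
}

# Make a separate, dated copy on the remote that later refreshes leave untouched
weekly_copy() {
    $LXC copy "$CONTAINER" "$REMOTE:$(weekly_name "$1")"
}

daily_refresh
# date +%u prints the ISO day of week (7 = Sunday); date +%V prints the ISO week number
if [ "$(date +%u)" = "7" ]; then
    weekly_copy "$(date +%V)"
fi
```

Saved as, say, a hypothetical /home/youruser/nextcloud-backup.sh, it could be scheduled with a crontab entry such as: 0 2 * * * LXC=lxc /home/youruser/nextcloud-backup.sh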

If your current server goes down, all you do is start the remote one, point a router to it, maybe update a DNS server (if it’s on a different public IP address) and you are back in business. Same name, links, shares, settings, users, data, files, configuration etc. It all just *works* – just as it should.

You can be up and running in less than five minutes from when you detect your primary server is down. That makes for pretty good uptime availability of your server.

We think this --refresh option is a game-changer for lxd.

Happy Backups!

Encrypting and auto-boot-decryption of an LXC zpool on Ubuntu with LUKS


So we have seen some postings online that suggested you can’t encrypt an lxd zpool, such as this GitHub posting here, which correctly explains that an encrypted zpool that doesn’t mount at startup disappears WITHOUT WARNING from your lxd configuration.

It’s not the whole picture, as it IS possible to encrypt an lxd zpool with luks (the standard full disk encryption option for Ubuntu Linux) and have it work out-of-the-box at startup, but perhaps it’s not as straightforward as everyone would like.


With that said…this post is for those who, for example, have a new clean system that they can always do-over if this tutorial does not work as advertised.  Ubuntu OS changes and so the instructions might not work on your particular system.

Firstly, we assume you have your ubuntu 16.04 installed on a luks encrypted drive (i.e. the standard ubuntu install using the “encrypt HD” option).  This of course requires you to enter a password at boot-up to decrypt your system.

We assume you have a second drive that you want to use for your linux lxd containers.  That’s how we roll our lxd.

So, to setup an encrypted zpool, select your drive to be used (we assume it’s /dev/sdd here, and we assume it’s a newly created partition that is not yet formatted – your drive might be /dev/sda, /dev/sdb or something quite different – MAKE SURE YOU GET THAT RIGHT).

Go through the normal luks procedure to encrypt the drive:

sudo cryptsetup -y -v luksFormat /dev/sdd

Enter the password and NOTE THE WARNING – this WILL destroy the drive contents.  #YOUHAVEBEENWARNED

Then open it:

sudo cryptsetup luksOpen /dev/sdd sdd_crypt

Normally, you would create your normal file system now, such as an ext4, but we don’t do that.  Instead, create your zpool (we are calling ours ‘lxdzpool’ – feel free to change that to ‘tank’ or whatever pool name you prefer):

sudo zpool create -f -o ashift=12 -O normalization=formD -O atime=off -m none -R /mnt -O compression=lz4 lxdzpool  /dev/mapper/sdd_crypt

And there you have an encrypted zpool.  Add it to lxd using the standard ‘sudo lxd init’ procedure that you need to go through to create lxc containers, then start launching your containers and voila, you are using an encrypted zpool.

So, we are not done yet.  We can’t let the OS boot up without decrypting the zpool drive, lest our containers disappear and lxd goes back to using a directory for its zpool, per the GitHub posting referred to above.  That would not be good.  So how do we make sure this is auto-decrypted at boot-up (which is needed for lxc containers to launch)?

Well, we have to create a keyfile that is used to decrypt this drive after you decrypt the main OS drive (so you do still need to decrypt your PC at bootup as usual – as above):

sudo dd if=/dev/urandom of=/root/.keyfile bs=1024 count=4
sudo chmod 0400 /root/.keyfile
sudo cryptsetup luksAddKey /dev/sdd /root/.keyfile

This creates a keyfile at /root/.keyfile.  This file is used to decrypt the zpool drive.  Just answer the prompts that these commands generate (self explanatory).

Now find out your disk’s UUID with:

sudo blkid

This should give you a list of your drives with various information.  We need the long string that comes after “UUID=…” for your drive, e.g.:

/dev/sdd: UUID="971bf7bc-43f2-4ce0-85aa-9c6437240ec5" TYPE="crypto_LUKS"

Note we need the UUID – not the PARTUUID or anything else.  It must say “UUID=…”.

Now edit /etc/crypttab as root:

sudo nano /etc/crypttab

And add an entry like this:

#Add entry to auto-unlock the encrypted drive at boot-up,
#after the main OS drive has been unlocked
sdd_crypt UUID=971bf7bc-43f2-4ce0-85aa-9c6437240ec5 /root/.keyfile luks,discard
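If it helps to see how that entry is put together, here is a small sketch that builds the line from its four fields (mapper name, source device, key file, options). The helper name crypttab_line is our own invention for illustration; the values are the ones used in this post.

```shell
#!/bin/sh
# Sketch: build the /etc/crypttab line from its four fields.
# Fields: <mapper name> <source device> <key file> <options>
crypttab_line() {
    # $1 = mapper name, $2 = LUKS UUID, $3 = key file path
    printf '%s UUID=%s %s luks,discard\n' "$1" "$2" "$3"
}

# Example with the values used in this post:
crypttab_line sdd_crypt 971bf7bc-43f2-4ce0-85aa-9c6437240ec5 /root/.keyfile

# On a real system the UUID could be read directly (needs root, so left commented):
#   uuid=$(blkid -s UUID -o value /dev/sdd)
#   crypttab_line sdd_crypt "$uuid" /root/.keyfile | sudo tee -a /etc/crypttab
```

Reading the UUID with blkid rather than retyping it avoids the classic one-character copy error, which would leave the drive locked at boot.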

And now reboot.  You should see your familiar boot-up screen for decrypting your ubuntu OS.  And once you enter the correct password, the encrypted zfs zpool drive will be automatically decrypted and will allow lxd to access it as your zpool.  Here’s an excerpt from our ‘lxc info’ output AFTER a reboot.  We highlighted the most important bit for this tutorial:

$ lxc info
storage.zfs_pool_name: lxdzpool
- id_map
- id_map_base
- resource_limits
api_status: stable
api_version: "1.0"
auth: trusted
auth_methods: []
public: false
driver: lxc
driver_version: 2.0.8
kernel: Linux
kernel_architecture: x86_64
kernel_version: 4.15.0-34-generic
server: lxd
storage: zfs

Note we are using our ‘lxdzpool’.
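Since the failure mode here is silent (the pool just vanishes from lxd), a quick check after each reboot may be worth having. The following is only a sketch under our own assumptions: the function names are made up, and it simply tests that the mapper device exists and that the pool is imported, using the names from this post.

```shell
#!/bin/sh
# Sketch: post-reboot sanity check for the LUKS-backed zpool described above.
MAPPER_NAME="sdd_crypt"   # mapper name used in /etc/crypttab in this post
POOL_NAME="lxdzpool"      # pool name used in this post

# True if the LUKS mapping was opened at boot (device node exists under /dev/mapper)
mapper_is_open() {
    [ -e "/dev/mapper/$1" ]
}

# True if the named zpool is currently imported; zpool list exits non-zero otherwise
pool_is_imported() {
    command -v zpool >/dev/null 2>&1 && zpool list -H -o name "$1" >/dev/null 2>&1
}

check_storage() {
    if ! mapper_is_open "$MAPPER_NAME"; then
        echo "WARNING: /dev/mapper/$MAPPER_NAME is missing; check /etc/crypttab"
        return 1
    fi
    if ! pool_is_imported "$POOL_NAME"; then
        echo "WARNING: zpool $POOL_NAME is not imported"
        return 1
    fi
    echo "OK: $POOL_NAME is available on $MAPPER_NAME"
}

check_storage || true
```

Run it before launching anything new in lxd; if it prints a warning, fix the decryption first rather than letting lxd fall back to something else.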

We hope this is useful.  GOOD LUCK!

Useful additional reference materials are here (or at least, they were here when we posted this article):

Encrypting a second hard drive on Ubuntu (post-install)

Setting up ZFS on LUKS


LXC Container Migration – WORKING

So we found a spare hour at a remote location and thought we could tinker a little more with lxc live migration as part of our LXD experiments.


We executed the following in a terminal as NON-ROOT users yet again:

lxc copy Nextcloud EI:Nextcloud-BAK-13-Sep-18

lxc start EI:Nextcloud-BAK-13-Sep-18

lxc list EI: | grep Nextcloud-BAK-13-Sep-18

And we got this at the terminal (a little time later…)

| Nextcloud-BAK-13-Sep-18 | RUNNING | (eth0) | | PERSISTENT | 0 |

Note that this is a 138GB file.  Not small by any standard.  It holds every single file that’s important to our business (server-side AND end-to-end encrypted of course).  That’s a big file-copy.  So even at LAN speed, this gave us enough time to make some really good coffee!

So we then modified our front-end haproxy server to redirect traffic intended for our primary cloud-file server to this lxc instance instead. (Two minor changes to a config, replacing the IP address of the current cloud with that of the new cloud).  Then we restarted our proxy server and….sharp intake of breath…
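For anyone wanting to script that address swap, here is a minimal sketch. It is not our actual haproxy config: the addresses and the ‘server’ line in the demo are made up, and the helper name swap_backend is hypothetical. It only prints the rewritten config, so nothing is changed until you choose to save the output.

```shell
#!/bin/sh
# Sketch: repoint a proxy config from the primary container's IP to the backup's.
# swap_backend OLD_IP NEW_IP FILE  -> prints the rewritten config to stdout
swap_backend() {
    # Escape dots so the old address is matched literally, not as regex wildcards
    old=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed "s/$old/$2/g" "$3"
}

# Demo on a throwaway file; addresses and the 'server' line are made up
demo=$(mktemp)
printf 'backend cloud\n    server nc1 10.0.0.10:443 check\n' > "$demo"
swap_backend 10.0.0.10 10.0.0.20 "$demo"
rm -f "$demo"
```

Note this is a plain text substitution, so an old address that is a prefix of another one (10.0.0.10 versus 10.0.0.100) would also be rewritten; check the output before reloading the proxy.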


Almost unbelievably, our entire public-facing cloud server was now running on another machine (just a few feet away as it happens).   We hoped for this, but we really did not expect a 138GB file to copy and startup first time.  #WOW

We need to test and work this instance to death to make sure it’s every bit as SOUND as our primary server, which is now back online and this backup version is just sleeping.

Note that this is a complete working copy of our entire cloud infrastructure – the Nextcloud software, every single file, all the HTTPS: certs, databases, configurations, OS – everything.  A user changes NOTHING to access this site, in fact, it’s not even possible for them to know it’s any different.

We think this is amazing, and is a great reflection of the abilities of lxc, which is why we are such big fans.

With this set-up, we could create working copies of our servers in another geo-location every, say, month, or maybe even every week (once a day is too much for a geo-remote facility – 138GB for this one server over the internet?  Yikes).

So yes, bandwidth needed IS significant, and thus you can’t flash the larger server images over the internet every day, but it does provide for a very resistant disaster-recovery situation: if our premises go up in a Tornado, we can be back online with just a few clicks from a web-browser (change DNS settings and maybe a router setting or two) and issue a few commands from an ssh terminal, connected to the backup facility.
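To put a rough number on that, here is a back-of-envelope sketch. The 20 Mbit/s uplink is purely an assumed figure for illustration (we have not stated our real one), and the sum ignores protocol overhead and compression.

```shell
#!/bin/sh
# Sketch: rough transfer time for a full container copy over a WAN link.
# transfer_hours SIZE_GB UPLINK_MBIT  -> whole hours, rounded up
transfer_hours() {
    # GB -> megabits (x 8000, decimal units), divided by link speed = seconds
    seconds=$(( $1 * 8000 / $2 ))
    echo $(( (seconds + 3599) / 3600 ))
}

transfer_hours 138 20    # 138 GB over an assumed 20 Mbit/s uplink
```

With those assumed numbers it prints 16, i.e. roughly two-thirds of a day of saturated uplink for one server, which is why a daily full copy is not realistic here.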

We will develop a proper, sensible strategy for using this technique after we have tested it extensively, but for now, we are happy it works.  It gives us another level of redundancy for our updating and backup processes.




We deploy LXC containers in quite literally ALL of our production services.  And we have several development services, ALL of which are in LXC containers.

Why is that?  Why do we use LXC?

The truthful answer is “because we are not Linux experts”.  It really is true.  Almost embarrassingly so in fact.  Truth is, the SUDO command scares us: it’s SO POWERFUL that you can even brick a device with it (we know.  We did it).

We have tried to use single machines to host services.  It takes very little resources to run a Linux server, and even today’s laptops have more than enough hardware to support even a mid-size business (and we are not even mid-size).  The problem we faced was that whenever we tried “sudo” commands in Linux Ubuntu, something at sometime would go wrong – and we were always left wondering if we had somehow created a security weakness, or some other deficiency.  Damn you, SuperUser, for giving us the ability to break a machine in so many ways.

We kept re-installing the Linux OS on the machines and re-trying until we were exhausted.  We just could not feel COMFORTABLE messing around with an OS that was already busy dealing with the pervasive malware and hacker threats, without us unwittingly screwing up the system in new and novel ways.

And that’s when the light went on.  We thought: what if we could type in commands without worrying about consequences?  A world where anything goes at the command line is…heaven…for those that don’t know everything there is to know about Linux (which of course definitely includes us!).  On that day, we (re-) discovered “virtual machines”.  And LXC is, in our view, the BEST, if you are running a linux server.

LXC allows us to create virtual machines that use fewer resources than the host; machines that run as fast as bare-metal servers (actually, we have measured them to be even FASTER!).  But more than that, LXC with its incredibly powerful “snapshot” capability allows us to play GOD at the command line, and not worry about the consequences.

Because of LXC, we explore new capabilities all the time – looking at this new opensource project, or that new capability.  And we ALWAYS ALWAYS run it in an unprivileged LXC container (even if we have to work at it) because we can then sleep at night.

We found the following blog INCREDIBLY USEFUL – it inspired us to use LXC, and it gives “us mortals” here at Exploinsights, Inc. more than enough information to get started and become courageous with a command line!  And in our case, we have never looked back.  We ONLY EVER use LXC for our production services:

LXC 1.0: Blog post series [0/10]

We thank #UBUNTU and we thank #Stéphane Graber for the excellent LXC and the excellent development/tutorials respectively.

If you have EVER struggled to use Linux.  If the command line with “sudo” scares you (as it really should).  If you want god-like forgiveness for your efforts to create linux-based services (which are BRILLIANT when done right) then do yourself a favor: check out LXC at the above links on a clean Ubuntu server install.  (And no, we don’t get paid to say that).

We use LXC to run our own Nextcloud server (a life saver in our industry).  We operate TWO web sites (each in their own container), a superb self-hosted OnlyOffice document server and a front-end web proxy that sends the traffic to the right place.  Every service is self-CONTAINED in an LXC container.  No worries!

Other forms of virtualisation are also good, probably.  But if you know of anything as FAST and as GOOD as LXC…then, well, we are surprised and delighted for you.


SysAdmin ([email protected])




Interactively Updating LXC containers

We love our LXC containers.  They make it so easy to provide and update services – snapshots take most of the fear out of the process, as we have discussed previously here.  But even so, we are instinctively lazy and are always looking for ways to make updates EASIER.  Now it’s possible to fully automate the updating of a running service in an LXC container BUT a responsible administrator wants to know what’s going on when key applications are being updated.

We created a compromise: a simple script that runs an interactive process to back up and update our containers.  It saves us repetitively typing the same commands, but it still keeps us fully in control as we answer yes/no upgrade-related questions.  We thought our script was worth sharing.  So, without further ado, here’s our script, which you can just copy and paste to a file in your home directory (called say ‘’).  Then just run the script when you want to update and upgrade your containers.  Don’t forget to change the name(s) of your linux containers in the ‘names=…’ line of the script:

#!/bin/bash
# Simple Container Update Script
# Interactively update lxc containers

# Change THESE ENTRIES with container names and remote path:
names='container-name-1 c-name2 c-name3 name4 nameN'

# Now we just run a loop until all containers are backed up & updated
for name in $names
do
    echo ""
    echo "Creating a Snapshot of container: $name"
    lxc snapshot "$name"
    echo "Updating container: $name"
    # apt runs inside the container and still asks its usual yes/no questions
    lxc exec "$name" -- apt update
    lxc exec "$name" -- apt upgrade
    lxc exec "$name" -- apt autoremove
    echo "Container updated. Re-starting..."
    lxc restart "$name"
done
echo ""
echo "All containers updated"

Also, after you save it, don’t forget to chmod the file if you run it as a regular script:

chmod +x

Now run the script:


Note – no need to run using ‘sudo’ i.e. as ROOT user- this is LXC, we like to be run with minimal privileges so as not to ever break anything important!

So this simple script, which runs in Ubuntu or an equivalent distro, does the following INTERACTIVELY for every container you name:

lxc snapshot container #Make a full backup, in case the update fails
apt update             #Update the repositories
apt upgrade            #Upgrade everything possible
apt autoremove         #Free up space by deleting old files 
restart container      #Make changes take effect

This process is repeated for every container that is named.  The ‘lxc snapshot’ is very useful: sometimes an ‘apt upgrade’ breaks the system.  In our case, we can then EASILY restore the container to its pre-update state using the ‘lxc restore’ command.  All you have to do is first find out the container’s snapshot name:

lxc info container-name

E.g. – here’s the output of ‘lxc info’ on one of our real live containers:

sysadmin@server1:~$ lxc info office

Name: office
Remote: unix://
Architecture: x86_64
Created: 2018/07/24 07:02 UTC
Status: Running
Type: persistent
Profiles: default
Pid: 21139
eth0: inet
eth0: inet6 fe80::216:3eff:feab:4453
lo: inet
lo: inet6 ::1
Processes: 198
Disk usage:
root: 1.71GB
Memory usage:
Memory (current): 301.40MB
Memory (peak): 376.90MB
Network usage:
Bytes received: 2.52MB
Bytes sent: 1.12MB
Packets received: 32258
Packets sent: 16224
Bytes received: 2.81MB
Bytes sent: 2.81MB
Packets received: 18614
Packets sent: 18614
http-works (taken at 2018/07/24 07:07 UTC) (stateless)
https-works (taken at 2018/07/24 09:59 UTC) (stateless)
snap1 (taken at 2018/08/07 07:37 UTC) (stateless)

The snapshots are listed at the end of the info screen.  This container has three: the most recent being called ‘snap1’.  We can restore our container to that state by issuing:

lxc restore office snap1

…and then we have our container back just where it was before we tried (and hypothetically failed) to update it.   So we could do more investigating to find out what’s breaking and then take corrective action.
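If you would rather not read the snapshot name off the screen, something like the following sketch can pull it out of the ‘lxc info’ text. The helper name latest_snapshot is ours, and it assumes (as in the listing above) that snapshots are printed oldest first, so the last line is the most recent; check that holds on your LXD version.

```shell
#!/bin/sh
# Sketch: pick the last-listed snapshot name out of `lxc info` output.
latest_snapshot() {
    # Snapshot lines look like: snap1 (taken at 2018/08/07 07:37 UTC) (stateless)
    awk '/\(taken at /{name=$1} END{if (name != "") print name}'
}

# Demo on the three snapshot lines shown above (no lxc needed):
printf '%s\n' \
  'http-works (taken at 2018/07/24 07:07 UTC) (stateless)' \
  'https-works (taken at 2018/07/24 09:59 UTC) (stateless)' \
  'snap1 (taken at 2018/08/07 07:37 UTC) (stateless)' | latest_snapshot

# Real use would be along the lines of (left commented):
#   lxc restore office "$(lxc info office | latest_snapshot)"
```

The demo prints snap1, matching the snapshot we restored by hand above.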

The rest of the script is boiler-plate linux updating on Ubuntu, but it’s interactive in that you still have to accept proposed upgrade(s) – we call that “responsible upgrading”.  Finally, each container is restarted so that the correct changes are propagated.  This gives a BRIEF downtime of each container (typically 1-several seconds).  Don’t do this if you cannot afford even a few seconds of downtime.

We run this script manually once a week or so, and it makes the whole container update process less-painful and thus EASIER.

Happy LXC container updating!

Installing OnlyOffice Document Server in an Ubuntu 16.04 LXC Container

In our quest to migrate away from the relentlessly privacy-mining Microsoft products, we have discovered ‘OnlyOffice’ – a very Microsoft-compatible document editing suite.  Onlyoffice have Desktop and server-based versions, including an Open Source self-hosted version, which scratches a LOT of itches for Exploinsights, Inc for NIST-800-171 compliance and data-residency requirements.

If you’ve ever tried to install the open-source self-hosted OnlyOffice document server (e.g. using the official installation instructions here) you may find it’s not as simple as you’d like.  Firstly, per the official instructions, the onlyoffice server needs to be installed on a separate machine.  You can of course use a dedicated server, but we found that for our application, this is a poor use of resources as our usage is relatively low (so why have a physical machine sitting idly for most of the time?).  If you try to install onlyoffice on a machine with other services to try to better utilise your hardware, you can quickly find all kinds of conflicts, as the onlyoffice server uses multiple services to function and things can get messed up very quickly, breaking a LOT of functionality on what could well be a critical asset you were using (before you broke it!).

Clearly, a good compromise is to use a Virtual Machine – and we like those a LOT here at Exploinsights, Inc.  Our preferred form of virtualisation is LXD/LXC because of performance – LXC is blindingly fast, so it minimizes user-experience lag issues.  There is however no official documentation for installing onlyoffice in an lxc container, and although it turns out to be not straightforward, it IS possible – and quite easy once you work through the issues.

This article is to help guide those who want to install onlyoffice document server in an LXC container, running under Ubuntu 16.04.  We have this running on a System76 Lemur Laptop.  The onlyoffice service is resource heavy, so you need a good supply of memory, cpu power and disk space.  We assume you have these covered.  For the record, the base OS we are running our lxc containers in is Ubuntu 16.04 server.


You need a dns name for this service – a subdomain of your main url is fine.  So if you own “”, a good server name could be “”.  Obviously you need dns records to point to the server we are about to create.  Also, your network router or reverse proxy needs to be configured to direct traffic for ports 80 and 443 to your soon-to-be-created onlyoffice server.


Create and launch a container, then enter the container to execute commands:

lxc launch ubuntu:16.04 onlyoffice
lxc exec onlyoffice bash

Now let’s start the installation.  Firstly, a mandatory update (follow any prompts that ask permission to install update(s)):

apt update && apt upgrade && apt autoremove

Then restart the container to make sure all changes take effect:

exit                     #Leave the container
lxc restart onlyoffice   #Restart it
lxc exec onlyoffice bash #Re-enter the container

Now, we must add an entry to the /etc/hosts file (lxc should really do this for us, but it doesn’t, and only office will not work unless we do this):

nano /etc/hosts  #edit the file

Adjust your file to change from something like this: localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

To (see bold entry): localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

save and quit.  Now we install postgresql:

apt install postgresql

Now we have to do some things a little differently than at a regular command line because we operate as a root user in lxc.  So we create the database by switching to the postgres user and opening the PostgreSQL prompt:

su - postgres
psql

Then type:

CREATE DATABASE onlyoffice;
CREATE USER onlyoffice WITH password 'onlyoffice';
GRANT ALL privileges ON DATABASE onlyoffice TO onlyoffice;
\q

Then type ‘exit’ to return to the root shell.

We should now have a database created and ready for use.  Now this:

curl -sL | bash -
apt install nodejs
apt install redis-server rabbitmq-server
echo "deb squeeze main" |  tee /etc/apt/sources.list.d/onlyoffice.list
apt-key adv --keyserver hkp:// --recv-keys CB2DE8E5
apt update

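We are now ready to install the document server.  This is an EXCELLENT TIME to take a snapshot of the lxc container.  The snapshot command runs on the host, not inside the container, so either type ‘exit’ first (and re-enter afterwards with ‘lxc exec onlyoffice bash’) or use a second terminal on the host: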

lxc snapshot onlyoffice pre-server-install

This creates a snapshot that we can EASILY restore another day.  And sadly, we probably have to as we have yet to find a way of UPDATING an existing document-server instance, so whenever onlyoffice release an update, we repeat the installation from this point forward after restoring the container configuration.
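The restore-and-reinstall routine described above can be sketched as follows. This is a dry-run outline under our own assumptions, not a tested upgrade path: it assumes the container is running again after the restore, and it prints the commands unless you set LXC=lxc. Remember that restoring throws away everything done in the container since the snapshot.

```shell
#!/bin/sh
# Sketch: redo the document-server install from the pre-install snapshot.
LXC="${LXC:-echo lxc}"        # dry-run by default; set LXC=lxc to run for real
CONTAINER="onlyoffice"        # container name used in this post
SNAPSHOT="pre-server-install" # snapshot name used in this post

reinstall_documentserver() {
    # Roll the container back to the state just before the document server was installed
    $LXC restore "$CONTAINER" "$SNAPSHOT"
    # Refresh package lists so apt sees the newest document-server package
    $LXC exec "$CONTAINER" -- apt update
    # Repeat the install step; it will ask for the database credentials again
    $LXC exec "$CONTAINER" -- apt install onlyoffice-documentserver
}

reinstall_documentserver
```

The nginx and certificate steps later in this article would then need repeating too, since they come after the snapshot point.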

Let’s continue with the installation:

apt install onlyoffice-documentserver

You will be asked to enter the credentials for the database during the install.  Type the following and press enter:


Once this is done, if you access your web site (i.e. your version of ‘’) you should see the following screen:

We now have a document server running, albeit in http mode only.  This is not good enough, we need to use SSL/TLS to make our server safe from eavesdroppers.  There’s a FREE way to do this using the EXCELLENT LetsEncrypt service, and this is how we do that:

Back to the command line in our lxc container.  Edit this file:

nano /etc/nginx/conf.d/onlyoffice-documentserver.conf

Delete everything there and change it to the following (changing your domain name accordingly):

include /etc/nginx/includes/onlyoffice-http.conf;
server {
  listen [::]:80 default_server;
  server_tokens off;

  include /etc/nginx/includes/onlyoffice-documentserver-*.conf;

  location ~ /.well-known/acme-challenge {
        root /var/www/onlyoffice/;
        allow all;
  }
}

Save and quit the editor.  Then execute:

systemctl reload nginx
apt install letsencrypt

And then this, changing the email address and domain name to yours:

letsencrypt certonly --webroot --agree-tos --email [email protected] -d -w /var/www/onlyoffice/

Now, we have to re-edit the nginx file:

nano /etc/nginx/conf.d/onlyoffice-documentserver.conf

…and replace the contents with the text below, changing all the bold items to your specific credentials:

include /etc/nginx/includes/onlyoffice-http.conf;
## Normal HTTP host
server {
  listen [::]:80 default_server;
  server_tokens off;
  ## Redirects all traffic to the HTTPS host
  root /nowhere; ## root doesn't have to be a valid path since we are redirecting
  rewrite ^ https://$host$request_uri? permanent;
}
#HTTP host for internal services
server {
  listen [::1]:80;
  server_name localhost;
  server_tokens off;
  include /etc/nginx/includes/onlyoffice-documentserver-common.conf;
  include /etc/nginx/includes/onlyoffice-documentserver-docservice.conf;
}
## HTTPS host
server {
  listen 443 ssl;
  listen [::]:443 ssl default_server;
  server_tokens off;
  root /usr/share/nginx/html;
  ssl_certificate /etc/letsencrypt/live/;
  ssl_certificate_key /etc/letsencrypt/live/;

  # modern configuration. tweak to your needs.
  ssl_protocols TLSv1.2;
  ssl_prefer_server_ciphers on;

  # HSTS (ngx_http_headers_module is required) (15768000 seconds = 6 months)
  add_header Strict-Transport-Security max-age=15768000;

  ssl_session_cache builtin:1000 shared:SSL:10m;
  # add_header X-Frame-Options SAMEORIGIN;
  add_header X-Content-Type-Options nosniff;
  # ssl_stapling on;
  # ssl_stapling_verify on;
  # ssl_trusted_certificate /etc/nginx/ssl/stapling.trusted.crt;
  # resolver valid=300s; # Can change to your DNS resolver if desired
  # resolver_timeout 10s;
  ## [Optional] Generate a stronger DHE parameter:
  ##   cd /etc/ssl/certs
  ##   sudo openssl dhparam -out dhparam.pem 4096
  #ssl_dhparam {{SSL_DHPARAM_PATH}};

  location ~ /.well-known/acme-challenge {
     root /var/www/onlyoffice/;
     allow all;
  }
  include /etc/nginx/includes/onlyoffice-documentserver-*.conf;
}

Save the file, then reload nginx:

systemctl reload nginx

Navigate back to your web page and you should get the following now:

And if you do indeed see that screen then you now have a fully operational self-hosted OnlyOffice document server.

If you use these instructions, please let us know how it goes.  In a future article, we will show you how to update the container from the snapshot we created earlier.






Server Backups using LXD

So I am working the process of server backups today.  Most people do backups wrong, and I have been guilty of that too.  You know it’s true when you accidentally delete a file, and you think ‘No worries, I’ll restore it from a backup…’; and about an hour later of opening archives and trying to extract the one file but finding some issue or other…makes you realize your backup strategy sucks.  I am thus trying to get this right from the get-go today:
LXD makes the process easy (albeit with a few quirks).  EXPLOINSIGHTS Inc. (EI) servers are structured such that each service is running in an LXD container.  Today, there are several active, ‘production’ servers (plus several developmental servers, which are ignored in this posting):

  • Nextcloud – cloud file storage;
  • WordPress – this web-site/blog;
  • Onlyoffice – an ‘OnlyOffice’ document server;
  • Haproxy – the front-end server that routes traffic across the LAN

All of these services are running on one physical device.  They are important to EI as customers access these servers (whether they know it or not), so they need to JUST WORK.
What can I do if the single device (‘server1’) running these services just dies?  Well, I have battery backup, so a power glitch won’t do it.  Check.  And the modem/router are also on the UPS, so connectivity is good.  Check.  I don’t have RAID on the device but I do have new HD’s – low risk (but not great).  Half-check there.  And if the device hardware just crashes and burns just because it can…well that’s what I want to fix today:
So my way of creating functionally useful backups is to do the following, as a simple loop in a script file:

  1. For each <container name> on server1:
    1. lxc stop <container-name>
    2. lxc copy <container-name> TO server2:<container-name##>
    3. lxc start <container-name>
  2. Next <container-name>

The ‘##’ at the end of the lxc copy command is the week-number, so I can create weekly container backups EASILY and store them on server2.  I had hoped to do this without stopping the containers, but the criu LXD add-on program (which is supposed to provide that very capability) is not performing properly on server2, so I have a brief server-outage when I run this script for each service for now.  I thus have to try to run this at “quiet times”, if such a thing exists; but I can live with that for now.
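The loop above can be sketched in shell roughly as follows. This is an outline rather than my exact script: the container names are assumed from the service list (yours will differ), and it prints the commands instead of running them unless you set LXC=lxc.

```shell
#!/bin/sh
# Sketch of the weekly stop/copy/start loop described above.
LXC="${LXC:-echo lxc}"       # dry-run by default; set LXC=lxc to run for real
REMOTE="server2"             # backup machine named in this post
CONTAINERS="nextcloud wordpress onlyoffice haproxy"   # assumed container names
WEEK="${WEEK:-$(date +%V)}"  # ISO week number, 01-53

backup_one() {
    # $1 = container name. Stop, copy to a week-stamped name, then start again.
    $LXC stop "$1"
    $LXC copy "$1" "$REMOTE:$1$WEEK"
    # Runs even if the copy failed, so a bad backup does not leave the service down
    $LXC start "$1"
}

for name in $CONTAINERS; do
    backup_one "$name"
done
```

Keeping the stop, copy and start for one container together means each service is only down for the length of its own copy, not the whole run.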
I did a dry-run today: I executed the script, then I stopped two of the production containers.  I then launched the backup containers with the command:

  • lxc start <container-name##>

I then edited the LAN addresses for these services and I was operational again IN MINUTES.  The only user-experience change I noticed was my login credentials expired, but other than that it was exactly the same experience “as a user”.  Just awesome!
Such a strategy is of no use if you need 100% up-time, but this works for EI for now until I develop something better.  And to be clear, this solution is still far from perfect so it’s always going to be a work in progress:-
Residual risks include:

  1. Both servers are on same premises, so e.g. fire or theft risks are not covered;
    1. Really hard to fix this because of data residency and control requirements.
  2. This strategy requires human intervention to launch the backup servers, so there could be considerable downtime.  Starting a backup lxd container for the haproxy server will also require changes at the router (this one container receives and routes all http and https traffic except ssh/vpn connections.  The LAN router presently sends everything to this server.  A backup container will have a different LAN IP address thus router reconfig is needed);
  3. The cloud file storage container is not small – about 13GB today.  52 weeks of those will have a notable impact on storage at server2 (but storage is cheap);
  4. I still have to routinely check that the backup containers actually WORK (so practice drills are needed);
  5. I have to manually add new production containers to my script – easy to forget;
  6. I don’t like scheduled downtime for the servers…

But overall, today, I am satisfied with this approach.  The backup script will be placed in a cron file for auto-execution weekly.  I may make my script a bit more friendly by sending log files and/or email notification etc., but for now a manual check-up on backup status will suffice.