Let me start by saying “a database without backups is not a database”. This article compares different options for running a production database on Kubernetes, with a focus on automatic backups (and restores).
In this guide I will focus on PostgreSQL, because it’s a popular and super solid database, but most of what is written here applies to other databases too.
In this article we are going to evaluate using KubeDB for running and operating a production-ready database on Kubernetes.
In part two, I’ll also explain how to install and use some of the Kubernetes operators introduced in this article.
For the backups you have a couple of options:
- Write your own backup script
- Make snapshots of the database volume
- Use one of the Kubernetes operators to automate backups
Write your own backup script
When you first think about it, it sounds quite easy… Just run a pod on a cron job, connect to the database from that pod and do
pg_dump > backup.sql and then
s3cmd put backup.sql s3://host/backups. This works on day one. But the problems start when you want to store multiple backups; let’s say for 10 days. Now you need to develop a script that gives each backup a unique timestamp, checks the age of the (remote) backups, and deletes the old ones. 😣 And then you want monitoring to check it didn’t fail 🧐. Before you know it you’re developing real software in Bash or Python.
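To make that “real software” point concrete, here is a minimal sketch of what the cron-pod script tends to grow into. Everything here is an assumption for illustration: the `DATABASE_URL` and `BUCKET` variables, the 10-day retention, and the availability of `pg_dump` and `s3cmd` in the pod image.

```shell
#!/usr/bin/env bash
# Sketch only — assumes pg_dump, s3cmd, $DATABASE_URL and $BUCKET exist in the pod.
set -euo pipefail

backup_filename() {            # e.g. backup-20240131-120000.sql.gz
  printf 'backup-%s.sql.gz' "$(date -u +%Y%m%d-%H%M%S)"
}

is_expired() {                 # $1 = filename, $2 = retention in days
  local stamp cutoff
  stamp="${1#backup-}"; stamp="${stamp%.sql.gz}"
  cutoff="$(date -u -d "-$2 days" +%Y%m%d-%H%M%S)"
  [[ "$stamp" < "$cutoff" ]]
}

main() {
  local file
  file="$(backup_filename)"
  pg_dump "$DATABASE_URL" | gzip > "$file"
  s3cmd put "$file" "s3://$BUCKET/backups/$file"

  # Delete remote backups older than 10 days; the timestamped names
  # sort lexicographically, so plain string comparison is enough.
  s3cmd ls "s3://$BUCKET/backups/" | awk '{print $4}' | while read -r key; do
    if is_expired "${key##*/}" 10; then
      s3cmd del "$key"
    fi
  done
}

# main   # called from the CronJob's command in the real pod
```

And this still has no monitoring, no encryption, and no tested restore path — which is exactly the trap.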
And do you know how to restore the backup when needed? Have you tested it? And finally: did you encrypt your backups, or are they a security liability?
Make snapshots of the database volume
This is quite a different approach, based on the idea of simply making a full copy (or snapshot) of the disk the database files are stored on. The PostgreSQL documentation describes this as inferior because inconsistencies may occur, especially if the database was not shut down before the snapshot was taken. But shutting down the DB is not always an option.
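On Kubernetes, this approach is usually driven through the CSI snapshot API. A minimal sketch, assuming your cluster has a CSI `VolumeSnapshotClass` named `csi-snapclass` and the database’s data volume claim is called `postgres-data` (both names are assumptions):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass      # assumption: your CSI driver's snapshot class
  source:
    persistentVolumeClaimName: postgres-data  # assumption: the PVC backing the database
```

The consistency caveat above still applies: a snapshot taken while PostgreSQL is writing is only as good as crash recovery.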
Portworx describes this approach in detail here: How to backup and restore PostgreSQL on Kubernetes – Portworx. Pricing starts at $0.20/hr per node, with a 1,000-hour minimum (which comes out to $200/mo).
Use one of the Kubernetes operators to automate backups
Many organizations have been in your position before, and some realized there was an opportunity to build something better. Introducing: Kubernetes (database) operators.
Kubernetes operators are a pattern: software that takes action based on specifications you create as Kubernetes Custom Resource Definitions [link]. This works similarly to everything else in Kubernetes;
In short, you define the desired state in a YAML document and save that document to the Kubernetes data store. Kubernetes runs a control loop that continuously compares everything in there against reality and tries to make reality match the desired state.
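For example, this is what desired state looks like for a plain Deployment: you declare three replicas, and Kubernetes keeps working until three matching pods exist (the name and image below are just placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 3                # the desired state: three running pods
  selector:
    matchLabels: { app: hello }
  template:
    metadata:
      labels: { app: hello }
    spec:
      containers:
        - name: hello
          image: nginx:1.25  # placeholder image
```

Kill a pod and the control loop notices the gap between desired and actual state, and starts a replacement.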
A Kubernetes operator, then, is a program that runs on Kubernetes and extends the kinds of things it can do: for example, watch for documents that define a database and create one to match each document. It is a really powerful pattern that can automate a lot for you.
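A database operator extends the same idea to new document kinds. A hypothetical database resource might look like the sketch below — this is illustrative only, not KubeDB’s actual schema (the `example.com/v1` API group and every field name are made up); the operator would watch for such documents and create a matching database, backup schedule included:

```yaml
# Hypothetical custom resource — the exact schema depends on the operator you install.
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-postgres
spec:
  engine: postgres
  version: "16"
  storage: 10Gi
  backup:
    schedule: "0 3 * * *"    # daily at 03:00
    retention: 10d
    destination: s3://my-bucket/backups
```

Compare that to the hand-rolled script earlier: the timestamping, pruning, and monitoring all become the operator’s problem instead of yours.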