Kubernetes ETCD Backup and Restore

Amit Cohen
4 min read · Dec 31, 2022


The cluster configuration data stored in etcd is fundamental to the operation of Kubernetes, so backing it up and being able to restore it are mandatory for the continued operation of the cluster in case of a failure. A cluster failure can happen; being unable to recover from it is not an option.

Kubernetes state in the etcd store

etcd is a key-value store, and just like any database we can and should back it up. What is inside the etcd store? All the Kubernetes components we created have configurations defined in Kubernetes manifest files, and they also have a state: the state of a Deployment defines, for example, how many replicas are running and available, and the state of a Service defines how many endpoints it has and which ones. Revisions (the history of deployments) are also stored in etcd, and we can use them for rollbacks. ConfigMap and Secret data live in etcd as well, so losing all this data means losing the entire cluster state.

It is also important to know what is not included in etcd: the application data itself. The storage we configured with persistent volumes for a database or any stateful application is not in the etcd store. It lives on the cluster node or on remote storage, and it needs to be backed up in its own way.
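To get a feel for what Kubernetes actually keeps in etcd, you can list the keys under the /registry prefix. This is a quick sketch, assuming etcdctl is installed and using the certificate paths discussed later in this post:

ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key | head -20

You will see keys such as /registry/deployments/... and /registry/secrets/..., one per object in the cluster.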

How to create an etcd backup?

Creating an etcd backup is not complicated, because etcd provides a tool called etcdctl, a command line client for interacting with the etcd server that has all the commands we need for backing up and restoring etcd data. etcdctl must be installed on the master node and is then used for the backup; installing it is simple: apt install etcd-client. etcd supports built-in snapshots: a snapshot can either be taken from a live member with the etcdctl snapshot save command, or by copying the member/snap/db file from an etcd data directory that is not currently in use by an etcd process. Taking a snapshot will not affect the performance of the member.
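For example, on an Ubuntu master node (the package name can vary by distribution), you can install the client and verify it speaks the v3 API:

sudo apt-get install -y etcd-client
ETCDCTL_API=3 etcdctl version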

The command below takes a snapshot of the etcd state at the exact moment of execution.

ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db
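Once the snapshot is taken (with the certificates shown below), you can verify the backup file with the snapshot status command, which prints its hash, revision, total keys, and size:

ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-backup.db --write-out=table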

Note that etcdctl is a client tool for connecting to the etcd store, but for security reasons etcd is protected. We can't just execute the snapshot save command without credentials; we will get an authorization error. We need to authenticate, and we do that using certificates. One way to find the right ones is to check how the API server connects to the etcd server, since the API server is also a client that talks to etcd. This is configured in the kube-apiserver.yaml file using 3 files:

  1. ca.crt
  2. apiserver-etcd-client.crt
  3. apiserver-etcd-client.key

On the etcd server side there are also 3 files:

  1. ca.crt
  2. server.crt
  3. server.key
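On a kubeadm cluster you can see both sets of paths by grepping the static pod manifests:

grep etcd /etc/kubernetes/manifests/kube-apiserver.yaml
grep file /etc/kubernetes/manifests/etcd.yaml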

So the full syntax to take a snapshot of etcd will look like this:

ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
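If you take backups regularly, a small wrapper script helps. This is a minimal sketch, assuming the kubeadm certificate paths above and a /var/backups/etcd directory of your choosing:

#!/bin/bash
# Take a timestamped etcd snapshot (sketch; adjust paths to your cluster).
BACKUP_DIR=/var/backups/etcd
mkdir -p "$BACKUP_DIR"
ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_DIR/etcd-$(date +%Y%m%d-%H%M%S).db" \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key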

Alternatives for managing etcd

You may wonder: if etcd is so important, shouldn't we have multiple replicas of it? Or can we manage it separately? There are a few options:

  1. Keep the etcd data on remote storage outside the Kubernetes cluster: instead of having a hostPath with local storage, we can have remote storage configured.
  2. Run the etcd application itself outside the cluster, meaning instead of running etcd on your master node you run it on an external machine outside the cluster.

Both of the above options need more configuration and are more complex than the defaults Kubernetes offers, but as a Kubernetes admin you should know these options exist.
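For the second option, the API server just needs to be pointed at the external etcd endpoints. As an illustration (the hostnames here are hypothetical), the relevant kube-apiserver flags would look like this:

--etcd-servers=https://etcd-1.example.com:2379,https://etcd-2.example.com:2379,https://etcd-3.example.com:2379
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key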

Losing cluster data

Let's look at another disaster we might face: this time we lost all cluster configuration, meaning all the data of your deployments, services, secrets, and ConfigMap resources. To restore everything we have lost, we want to use the etcd backup.

Restore from etcd backup

To use the backup and start the restore process, you must create a restore point from it. Let's see how: we use the restore command with a target data directory, so it takes the snapshot and turns it back into a data directory the etcd application can read.

ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db --data-dir /var/lib/[your new directory for restoration]

The next step is to tell etcd about the new restore point, since by default it uses the old data directory. To do that we need to edit /etc/kubernetes/manifests/etcd.yaml: the hostPath in the file that points to the etcd data directory needs to be changed to the new location the backup was restored to. It is important to know that the mountPath stays the same, as it is inside the container; the hostPath is at the node level, and that is what we are restoring. Remember, this is the directory where kubelet picks up its static pod manifests. The first thing that happens is the recreation of the static pods, and it takes some time; when you run kubectl get pod -n kube-system, it also takes time for the API server to restart and serve GET and POST calls again.
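For illustration, here is a trimmed excerpt of what the volumes section of etcd.yaml would look like after the change, assuming the snapshot was restored to /var/lib/etcd-backup:

volumes:
- hostPath:
    path: /var/lib/etcd-backup   # was /var/lib/etcd; now points to the restored data
    type: DirectoryOrCreate
  name: etcd-data

Once kubelet recreates the etcd static pod, verify with kubectl get pod -n kube-system and check that your deployments, services, and ConfigMaps are back.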

Join my LinkedIn



Written by Amit Cohen

A product leader with exceptional skills and strategic acumen, possessing vast expertise in cloud orchestration, cloud security, and networking.
