Persistent Data in Kubernetes: Configuration
The storage requirements
Consider a case where you have a MySQL database pod that your application uses. Data gets added and updated in the database, maybe you created a new database with a new user, and so on. But by default, when you restart the pod, all those changes are gone, because Kubernetes doesn't give you data persistence out of the box. That's something that you, as the Kubernetes administrator, have to explicitly configure for each application that needs to keep data between pod restarts.

You need storage that doesn't depend on the pod lifecycle, so it will still be there when a pod dies and a new one gets created, and the new pod can pick up where the previous one left off. The new pod will read the existing data from that storage to get the up-to-date state. The problem is that you don't know on which node the new pod will be restarted, which means your storage must be available to all nodes, not just a specific one. Then, whichever node the new pod lands on, the up-to-date data will be there for it to read. Ideally, you also want highly available storage that survives even if the whole cluster is gone. These are the requirements your database storage will need to meet to be reliable.

Another use case for persistent storage is a directory, for example an application that writes and reads files from a pre-configured directory. For both the files use case and the database use case, you use the Kubernetes component called a persistent volume.
A Persistent Volume (PV) is a cluster resource, just like RAM or CPU, that is used to store data. Like any other component in Kubernetes, a persistent volume gets created from a YAML file, where you specify the kind, which is PersistentVolume.
As you can see in the YAML below, the kind defines the PV, along with some other parameters like the capacity and the volume type.
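A minimal PV definition might look like the following sketch; the name, size, and path are illustrative, and hostPath is used here only because it is the simplest volume type to show:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo            # hypothetical name
spec:
  capacity:
    storage: 5Gi           # how much storage this PV offers
  accessModes:
    - ReadWriteOnce
  hostPath:                # the volume type; hostPath is node-local, demo only
    path: /mnt/data
```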
The PV is an abstract definition; the actual storage must be backed by physical storage, like a local hard drive, an external NFS server, or block storage like EBS. So who configures the storage backend, and how is it exposed and used? Your storage needs to be available and maintained regardless of whether it is managed through Kubernetes or another system; regular backups and handling of any corruption are a must to keep the data consistent.
Let's see how it's defined in the PV YAML file
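Here is a sketch of a PV backed by NFS; the server address, path, and name are assumptions for illustration:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs                 # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany            # NFS can be mounted by many nodes
  nfs:                         # the storage backend: an external NFS server
    server: nfs.example.com    # hypothetical server address
    path: /exports/data
```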
You can see the storage backend (NFS) that I set for use. In the official Kubernetes documentation you can find the complete list of available volume types: different cloud vendors, the CSI interface, OpenStack, Fibre Channel, and many more.
Note: PVs are available to the whole cluster and are not namespaced like pods and services, which are tied to a namespace.
Local vs Remote
It's important to distinguish between two types of volumes, remote and local; each has its use cases. The local volume types contradict the requirements I mentioned at the beginning: first, they are tied to a specific node, and you don't know where the new pod will start; second, they don't survive a cluster crash. So for persisting application data you should always use remote storage.
Who is responsible for the storage
There are usually two types of users that deal with storage: the Kubernetes admin and the developer. The Kubernetes admin makes sure the storage is up, running, and available for the developers, who deploy their applications and consume storage as a resource. The developers have to explicitly configure their YAML files to use the storage resources; in other words, applications have to claim the volume storage, which is done with another Kubernetes component called a PVC, a PersistentVolumeClaim. Yet another YAML file.
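A claim might look like this sketch; the name is a placeholder, while the 10Gi request and the many-nodes access mode match the scenario described here:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo           # hypothetical name, referenced later by the pod
spec:
  accessModes:
    - ReadWriteMany        # the volume must be mountable from many nodes
  resources:
    requests:
      storage: 10Gi        # the size the developer claims
```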
In this YAML you can see the kind, PersistentVolumeClaim, and the request set by the developer, who claims 10Gi, with an access mode for many nodes. Once the claim is applied, whichever PV meets the claim's requirements gets bound to it.
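The pod then consumes the claim by name; in this sketch, the pod and container names and the image are placeholders, and the claimName is assumed to match the PVC created above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod                      # hypothetical name
spec:
  containers:
    - name: app
      image: nginx                   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /var/www/data   # where the volume appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-demo          # the PVC created by the developer
```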
As you can see, the pod YAML references the PVC by the name the developer created.
To summarize the process so far: the pod requests the volume through the PV claim, and the claim tries to find a volume in the cluster that has the resources to satisfy it. Note that the claim must be in the same namespace as the pod that uses it. Once the process succeeds, the volume is mounted into the pod using the mountPath defined in the pod YAML file, and then mounted into the container, as shown in the container section of the YAML above. And that's where the magic happens: when the pod dies, a new pod is created, its container reads the same spec, and it uses the defined mountPath and the data already there.
Why so many abstractions?
You may ask yourself why so many abstractions need to be configured just to use persistent data. Well, that's the beauty of Kubernetes: as a developer, I don't care about the storage location, I just set my claim and deploy. The freedom the developer gets releases them from any infrastructure settings; yes, you're right, it sounds like PaaS, and it is.
The exceptional volume types
Two volume types created by Kubernetes itself need to be mentioned, because they are different from the rest: ConfigMap and Secret. They are both local volumes and, unlike the rest, are not created through a PV and PVC. For example, you might need a configuration file for your Prometheus or message broker, or a certificate file, mounted inside your application; in both cases you need a file available to your pod. To make it work, you create a ConfigMap or Secret component and mount it into your pod and into your container (same as in the examples above), the same way you would mount a PVC. Note: a pod can use different types of volumes at the same time.
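As a sketch, here is a pod mounting both a ConfigMap and a Secret as files; the ConfigMap name, Secret name, and mount paths are assumptions, and the example also shows a pod using different volume types at the same time:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prometheus-pod                  # hypothetical name
spec:
  containers:
    - name: prometheus
      image: prom/prometheus
      volumeMounts:
        - name: config
          mountPath: /etc/prometheus    # the config file appears here
        - name: certs
          mountPath: /etc/tls           # the certificate file appears here
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: prometheus-config         # hypothetical ConfigMap
    - name: certs
      secret:
        secretName: app-tls             # hypothetical Secret
```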
Last but not least, let's consider a cluster with hundreds of pods being deployed every day and the storage needed to support it. In such a scenario, developers would have to approach the admin and ask for storage from the cloud or from on-prem storage (depending on the customer's infrastructure). To handle this, there is another storage component called StorageClass, which creates and provisions PVs dynamically whenever a PVC claims one. As with all the other components, a StorageClass is also created via a YAML file.
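A StorageClass sketch, assuming an AWS EBS backend; the class name and parameter values are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ebs                       # hypothetical class name
provisioner: kubernetes.io/aws-ebs     # tells Kubernetes which provisioner to use
parameters:                            # provider-specific storage parameters
  type: gp2
reclaimPolicy: Delete
```

A PVC opts in to dynamic provisioning by setting spec.storageClassName to this class's name; Kubernetes then creates a matching PV on demand instead of binding to a pre-created one.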
As you can see in the YAML, the kind defines the StorageClass. The provisioner tells Kubernetes which storage provider to use, along with the specific storage parameters. Defining the storage parameters matters because storage should be a reactive system, reacting to the needs of various applications. You can define several storage providers, each with its own parameters, for developers to use. This makes the StorageClass a central storage manifest that apps can select from and use.