Networking in Kubernetes is not as complex as you may think
The core networking concept in Kubernetes is the networking of pods and containers. How are pods created, and what does a pod’s network look like? How do pods communicate on the same node, and how do they talk to each other across nodes?
Let’s start with how networking works within a pod. As you know, in Kubernetes the atomic unit is a pod, not a container. Considering that a pod typically contains one main container for an app, some will ask why the container is abstracted with a pod if only one primary application runs inside. To explain it, we must go deeper into what a pod is. At its core, every pod has a unique IP address, and that IP is reachable from all other pods in the cluster. Why is it important for a pod to have its own IP address? One of the main challenges with distributed applications is allocating ports to the services and applications running on servers without conflicts, since each port can be allocated only once on a single host. You face this challenge with containers because this is how container port mapping works.
PostgreSQL for example
Inside a PostgreSQL container, the Postgres port is 5432. When you start the container on your machine, you bind a host port to the app port, host:5432 (you can pick any free host port, say 5000): [~]$ docker run -d -e POSTGRES_PASSWORD=secret -p 5000:5432 postgres:9.6.17. When you run it and then run docker ps, you will see a mapping of 5000 to 5432, and the app is reachable via port 5000. Now let’s say I want another Postgres container: [~]$ docker run -d -e POSTGRES_PASSWORD=secret -p 5001:5432 postgres:9.6.17. You need to bind a new host port (5001); that’s how containers work. The problem is that when you have hundreds of containers running on your servers, how can you keep track of which ports are still free on the host to bind? It soon becomes an impossible task.
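To make the conflict concrete, here is a minimal sketch: a host port can be bound only once, so trying to reuse it fails outright (the exact Docker error message varies by version):

# first Postgres container: host port 5000 -> container port 5432
[~]$ docker run -d -e POSTGRES_PASSWORD=secret -p 5000:5432 postgres:9.6.17
# reusing host port 5000 for a second container fails; Docker refuses to bind it again,
# so you must pick and track a free port yourself (5001, 5002, ...)
[~]$ docker run -d -e POSTGRES_PASSWORD=secret -p 5000:5432 postgres:9.6.17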
The solution — pod abstraction
Kubernetes solves this problem by abstracting the containers using pods. Pods are like tiny machines, each with its own IP and one main container running in it. Going back to Postgres, you might have a pod running it; when a pod is created on a node, it gets its own network namespace and a virtual ethernet connection to the underlying infrastructure network. So a pod is a host just like a machine: both have an IP address and a range of ports they can allocate to their containers. It means you don’t need to worry about port mappings on the server where the pod is running, only inside the pod itself, where each container has its own ports. Since the best practice is one container per pod, this means that, for example, if you have 10 microservices each running in its own pod and using port 8080, they will not have any conflicts. Another reason the pod abstraction is helpful is that you can easily replace the container runtime in Kubernetes; for example, if you replace your container runtime with containerd (strongly recommended due to its light weight), all Kubernetes configurations stay the same because it’s all on the pod level. Nothing is tied to any specific runtime implementation.
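As a minimal sketch (the pod names and images are hypothetical placeholders), two pods can both use container port 8080 without any conflict, because each pod has its own IP address:

[~]$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: service-a
spec:
  containers:
  - name: app
    image: example.registry/service-a:latest   # placeholder image
    ports:
    - containerPort: 8080
---
apiVersion: v1
kind: Pod
metadata:
  name: service-b
spec:
  containers:
  - name: app
    image: example.registry/service-b:latest   # placeholder image
    ports:
    - containerPort: 8080
EOF
# each pod gets its own cluster-wide IP, so both can listen on 8080:
[~]$ kubectl get pods -o wide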
Still, what happens in a multi-container pod?
A pod is an isolated virtual host with its own network namespace, and all containers inside it run in that network namespace. This means containers can talk to each other via localhost and a port number, just like when you run multiple apps on your machine. There are still scenarios where you do want multiple containers in a single pod, like having a scheduler alongside its app, an authentication gateway, etc.
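Here is a hedged sketch of such a pod (the images and port are placeholders): the sidecar reaches the main app over localhost because both containers share the pod’s network namespace:

[~]$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
  - name: app
    image: example.registry/app:latest       # placeholder main application, listens on 8080
    ports:
    - containerPort: 8080
  - name: auth-gateway
    image: example.registry/auth-gw:latest   # placeholder sidecar, e.g. an authentication gateway
    # this container can call the app at http://localhost:8080,
    # just like two processes on the same machine
EOF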
The role of the “Pause” container
A pause container resides in every pod. If a container dies and a new container is created, the pod stays and keeps its IP address. But if the pod itself dies, it gets recreated, and the new pod is assigned a new IP address. The pause container is also called the “sandbox” container; its role is to reserve and hold the network namespace that is shared by all containers in the pod. The pause container is what makes it possible for the containers to communicate with each other.
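You can usually see these sandbox containers directly on a worker node; a hedged example (names, image tags, and output depend on the runtime and cluster):

# with Docker as the runtime, every pod shows up with an extra pause container
[~]$ docker ps | grep pause
# with containerd, crictl lists the same thing as pod sandboxes
[~]$ crictl pods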
Pod To Pod networking
Let’s go one level up and understand how pods talk to each other. You may be surprised, but Kubernetes does not have a built-in solution for this. Instead, Kubernetes expects the administrator to implement a networking solution. Even though Kubernetes does not include a network solution, it still defines a set of rules, and those rules are the CNI (Container Network Interface). It is a similar concept to the Container Runtime Interface (you can use any runtime, like containerd, Docker, or CRI-O), as long as it implements the interface Kubernetes expects. So let’s dig in and understand the requirements for a CNI plugin:
First, every pod gets its own IP address, and that address has to be unique across the whole cluster (not just the node). Second, pods must talk to each other using those IP addresses, and pods on different nodes must also be able to reach one another without NAT. The expectation from every network plugin is to allow all pods on all nodes in the cluster to talk to each other as if they were on the same network. Kubernetes does not define the IP CIDR block; that’s up to the network plugin to configure. This is also called the Kubernetes network model, and there are many implementations out there, like Flannel, Cilium, NSX, Weave Net, Calico, and more.
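As an illustration of how this is wired up, a plugin typically drops a config file on every node under /etc/cni/net.d; here is a hedged sketch of a very simple bridge-based config (the file name, network name, and subnet are illustrative, and real plugins generate their own):

[~]$ cat /etc/cni/net.d/10-mynet.conf
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.32.1.0/24",
    "routes": [ { "dst": "0.0.0.0/0" } ]
  }
}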
How does it work in practice?
Let’s start with the node. Each node has its own IP that belongs to the same private network; in AWS, for example, that would be your VPC. A VPC has a CIDR block from which every node gets its address: a CIDR of 10.0.0.0/16 gives 10.0.0.1 to the first node, 10.0.0.2 to the next one, and so on.

On each node, pods get scheduled and created, and as mentioned earlier, each pod runs in its own network namespace. On each node, the network plugin creates a private network for the pods with an IP address range that must not overlap with the node network, so a pod CIDR could be 10.32.1.0/24, which is a different network than the node CIDR above. Each of these private networks has a bridge on the host, letting all the pods on that node talk to each other, just like nodes in the same VPC subnet can talk to each other. But how do we ensure each node gets a distinct slice of the IP CIDR block, so pods across all nodes have unique IPs? That is up to the network plugin, which defines and allocates the ranges for the whole cluster.

So far, so good, but we still have a separate private pod network on each node. What if we want a DB pod on node 1 to talk to an app pod on node 3? Those are entirely different networks. It works using a gateway: each node’s pod network uses the node’s IP address as a gateway, so traffic can find the correct route to a pod on another node. This effectively creates one big pod network spanning all cluster nodes. Returning to the network plugins, they also solve the scalability side of this: they deploy an agent pod on every node, and those agents discover each other and share information about which pods are running and where, so the pod network stays efficient even with thousands of nodes.
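To make the gateway idea concrete, here is a hedged sketch of what a very simple, route-based setup might look like on node 1 (the CIDRs extend the examples above and are illustrative; real plugins automate all of this):

# node 1 (10.0.0.1) owns pod subnet 10.32.1.0/24 behind a local bridge
[~]$ ip addr show cni0                       # the bridge that local pods attach to
# pods on node 2 (10.0.0.2) live in 10.32.2.0/24, so node 1 routes that
# subnet via node 2's IP, which acts as the gateway to those pods
[~]$ ip route add 10.32.2.0/24 via 10.0.0.2
# the per-node ranges can be inspected from the API when Kubernetes allocates
# them (some plugins manage the IP ranges themselves):
[~]$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.podCIDR}{"\n"}{end}'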