
Understand Kubernetes 1: Container Orchestration

By now, we know the benefits of containers and how they are implemented using Linux primitives.
If we only need to run one or two containers, we can stop there; that's all we need. But if we want to run dozens or thousands of containers to build a stable and scalable web service that can serve millions of transactions per second, we have more problems to solve. To name a few:
  • scheduling: Which host should a container be placed on?
  • update: How to update a container image while ensuring zero downtime?
  • self-healing: How to detect and restart a container when it goes down?
  • scaling: How to add more containers when more processing capacity is needed?
None of these problems is new; only the subject has changed, from physical servers in the old days, to virtual machines more recently, to containers today. The functionality described above is usually referred to as container orchestration.

Kubernetes

Kubernetes, abbreviated as k8s, is one of many container orchestration solutions. But as of mid-2018, many would agree the competition is over: k8s is the de facto standard. I think that is good news, since it frees you from the hassle of picking among many options and worrying about investing in the wrong one. K8s is completely open source, with a variety of contributors, from big companies to individuals.
k8s has very good documentation, most of it in the official docs and tutorials at kubernetes.io.
In this article, we'll take a different perspective. Instead of starting with how to use the tools, we'll start with the very object the k8s platform is trying to manage: the container. We'll look at what extra things k8s can do compared with a single-machine container runtime such as runc or docker, and how k8s integrates with those container runtimes.
However, we can't do that without an understanding of the high-level architecture of k8s.

At the highest level, k8s has a master-slave architecture, with a master node controlling multiple slave (worker) nodes. The master and worker nodes together are called a k8s cluster. The user talks to the cluster through an API, which is served by the master. We intentionally left the master node diagram empty, to focus on how things are connected on the worker node.
The master talks to worker nodes through the kubelet, which primarily runs and stops Pods through the CRI, which in turn connects to a container runtime. The kubelet also monitors Pods for liveness and pulls debug information and logs.
We'll go over the components in a little more detail below.

Nodes

There are two types of nodes: master nodes and slave (worker) nodes. A node can be either a physical machine or a virtual machine.
You can jam a whole k8s cluster into a single machine, for example by using minikube.

Kubelet

Each worker node has a kubelet; it is the agent that enables the master node to talk to the slaves.
The responsibilities of the kubelet include:
  • Creating/running Pods
  • Probing Pods
  • Monitoring Nodes/Pods
  • etc.
We can go nowhere without first introducing the Pod.

Pod

In k8s, the smallest scheduling or deployment unit is the Pod, not the container. But there shouldn't be any cognitive overhead if you already know containers well. The benefit of the Pod is to add another wrapper on top of containers, so that closely coupled containers are guaranteed to end up scheduled on the same host, where they can share a volume or network that would otherwise be difficult or inefficient to implement if they landed on different hosts.
A pod is a group of one or more containers, with shared storage and network, and a specification for how to run the containers. A pod’s contents are always co-located and co-scheduled and run in a shared context, such as namespaces and cgroups.
For details, see the official Pod documentation.

Config, Schedule, and Run Pods

You configure a Pod using a yaml file, called the spec. As you can imagine, the Pod spec includes a configuration for each container, which covers the image and the runtime configuration.
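To make this concrete, here is a minimal sketch of a Pod spec with two containers sharing an emptyDir volume; the names, images, and paths are made up for illustration:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-with-log-agent        # hypothetical Pod name
    spec:
      volumes:
        - name: shared-logs           # emptyDir: scratch space that lives as long as the Pod
          emptyDir: {}
      containers:
        - name: web
          image: nginx:1.15           # the image to pull, just as with a plain docker run
          volumeMounts:
            - name: shared-logs
              mountPath: /var/log/nginx
        - name: log-agent             # sidecar reading the same files via the shared volume
          image: busybox:1.29
          command: ["sh", "-c", "tail -F /logs/access.log"]
          volumeMounts:
            - name: shared-logs
              mountPath: /logs

Because the two containers sit in one Pod, they are guaranteed to land on the same host, and the shared mount just works; across hosts this would require network storage or log shipping.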
With this spec, k8s will pull the image and run the container, just as you would with a simple docker command. Nothing particularly innovative so far.
What a single-machine runtime doesn't give you is this: in the spec we also describe the resource requirements of the containers/Pod, and k8s uses that information, together with the current cluster status, to find a suitable host for the Pod. This is called Pod scheduling. The functionality and effectiveness of the scheduler is easy to overlook; the Borg paper mentions that a better scheduler can actually save millions of dollars at Google's scale.
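As a sketch, the resource requirements are declared per container inside the same spec; the numbers here are made up:

    containers:
      - name: web
        image: nginx:1.15
        resources:
          requests:                   # the scheduler picks a node based on these
            cpu: 250m                 # 250 millicores, a quarter of a core
            memory: 64Mi
          limits:                     # hard caps, enforced through cgroups
            cpu: 500m
            memory: 128Mi

The scheduler only places the Pod on a node whose unreserved capacity covers the requests.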
In the spec, we can also specify liveness and readiness probes.

Probe Pods

The kubelet uses liveness probes to know when to restart a container, and readiness probes to know when a container is ready to start accepting traffic. The first is the foundation for self-healing and the second for load balancing.
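Both probes are declared per container in the Pod spec. A minimal sketch, with made-up endpoints and timings:

    containers:
      - name: web
        image: nginx:1.15
        livenessProbe:                # on repeated failure, the kubelet restarts the container
          httpGet:
            path: /healthz
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:               # on failure, the Pod is taken out of load balancing
          httpGet:
            path: /ready
            port: 80
          periodSeconds: 5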
Without k8s, you would have to build all of this on your own. Time and $$ saved.

Container Runtime: CRI

k8s isn't bound to a particular container runtime; instead, it defines an interface for image management and the container runtime, the Container Runtime Interface (CRI). Anything that implements the interface can be plugged into k8s, or to be more accurate, into the kubelet.
There are multiple implementations of CRI. Docker has cri-containerd, which plugs containerd/docker into the kubelet. cri-o is another implementation, which wraps runc for the container runtime service and a bunch of other libraries for the image service. Both use CNI for network setup.
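The pluggability shows up in the configuration: the kubelet (and debugging tools such as crictl) only needs to know which unix socket the CRI server listens on. A sketch of /etc/crictl.yaml pointing at a containerd-based runtime; the exact socket path depends on your installation:

    runtime-endpoint: unix:///run/containerd/containerd.sock   # CRI runtime service
    image-endpoint: unix:///run/containerd/containerd.sock     # CRI image service
    timeout: 10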
Assuming a Pod/container has been assigned to a particular node, the kubelet on that node operates as follows:
  • The kubelet, acting as a CRI client, sends a create/run request to the CRI server over gRPC.
  • The image service pulls the image from a registry, then unpacks it and creates the rootfs.
  • The CRI server creates the runtime config (config.json) using the Pod spec.
  • The runtime service (runc) runs the container.

Summary

We went through why we need a container orchestration system, and then the high-level architecture of k8s, with a focus on the components on the worker node and their integration with the container runtime.
