It's safe to say that without namespaces, there would be no containers.
This article is not a description or overview of namespaces in Linux, which you can find here and here.
Instead, we'll get our hands dirty and see exactly what happens to the namespaces when we run common container commands, so that you can appreciate the role namespaces play in container technology.
There are several types of namespaces, such as the pid namespace and the mount namespace. In this article we focus on the pid namespace; other namespaces follow similar rules. We'll use runc as the container runtime, since it is simple, has a spec, is easy to modify for experiments and, when necessary, lets me point you to the code. For what it's worth, as I have pointed out here, docker actually uses runc as its runtime, and the docker client UX is compatible with runc. So docker users should feel at home even though we use runc commands here.
If you like, follow the instructions here to install runc and prepare a busybox runtime bundle (or container). Or you can just read the article.
Let's get started.
Run a container
To see the newly created processes, we can use the execsnoop tool.
After running the container xyxy12, we should see something like the output below. The first column is the PID of the newly created process, which is 10123, and the process is sh.
To get the pid namespace, we can customize the output format of the ps command, as shown below; the PIDNS column is what we want.
The command above gives us only the pid namespace the process belongs to. To get the other namespaces, we can use the /proc filesystem. We know that the pid of the sh process (running in the container) is 10123, and that this pid is in the container runtime's pid namespace, so we can simply do the following (on the host).
We have a few files here, each representing one type of namespace: pid is the pid namespace, mnt the mount namespace, and so on. These files are symlinks pointing to the "real" namespaces the process belongs to. Think of each one as a pointer to a namespace object, which is identified by an inode number that is unique on the host system. If the namespace symlinks of two different processes point to the same inode, the processes belong to the same namespace. By default, if no new namespaces are created, all processes belong to the same "root" or "default" namespace.
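The inode comparison described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper names (parse_ns_link, read_namespaces, shared_namespaces) are made up for this article, the link targets are assumed to look like "pid:[4026532572]", and reading the ns links of another user's process usually requires root.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026532572]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026532572]' into ('pid', 4026532572)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def read_namespaces(pid="self"):
    """Return {ns file name: inode} for one process by reading /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in os.listdir(ns_dir):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def shared_namespaces(a, b):
    """Names of the namespaces on which two {name: inode} maps agree."""
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    # Print the namespaces of the current process.
    for name, inode in sorted(read_namespaces().items()):
        print(f"{name:20} {inode}")
```

On the host, calling shared_namespaces(read_namespaces(10123), read_namespaces("self")) would tell you which namespaces, if any, the container's sh process still shares with your shell.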
You can also find out the namespaces of sh from inside the container, but you need to use its pid in the container's namespace, which is 1 rather than 10123. Same process, but a different pid in a different namespace: that is what pid namespaces are all about. Note that /proc must be set up properly during container creation for this to work.
Next, we can look at what is in the newly created pid namespace. Unfortunately, there isn't a single place where we can find this information directly; we need to go over all the /proc/<pid>/ns files and aggregate the pids that belong to the same namespace. Luckily, the cinf tool does exactly that.
We have only one process at the moment, and that is the "init" program of the container we started, the sh program. Ignore the cgroups for now; we focus on namespaces this time.
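To make the aggregation step concrete, here is a rough Python sketch of the idea: walk /proc, read each process's pid namespace link, and group the pids by link target. This is not how cinf is implemented; the function names are hypothetical, and without root the scan will silently skip processes you are not allowed to inspect.

```python
import os
from collections import defaultdict


def group_by_namespace(links):
    """Invert {pid: ns link target} into {ns link target: sorted pids}."""
    groups = defaultdict(list)
    for pid, target in links.items():
        groups[target].append(pid)
    return {target: sorted(pids) for target, pids in groups.items()}


def scan_proc(ns_name="pid", proc_root="/proc"):
    """Collect the ns link of every process we are allowed to inspect."""
    links = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            link = os.readlink(os.path.join(proc_root, entry, "ns", ns_name))
        except OSError:
            # The process exited, or we lack permission to read its links.
            continue
        links[int(entry)] = link
    return links


if __name__ == "__main__":
    for target, pids in sorted(group_by_namespace(scan_proc()).items()):
        print(target, pids)
```

Run on the host as root, the output should contain one line per pid namespace, with the container's namespace listing the pids that live in it.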
As we have seen, when a new container is created, a set of new namespaces is created and the "init" process of the container is put into them. Effectively, the process is running in a container, and that means different things for different namespaces. For the pid namespace, it means the processes running in the container can see only the processes in the exact same pid namespace, "pid:", or equivalently "pid:xyxy12". The sh process is PID 1 inside the container but 10123 on the host; that's the pid namespace in play. As you can see, we can use the terms container and namespace interchangeably in this context.
We are clear, hopefully, about what docker/runc run does regarding namespaces. How about docker/runc exec?
Run a new process inside a container
From execsnoop, we can see the pids as they appear in the runtime namespace.
Actually, we can use runc ps, which lists the processes running in a container. The pids it lists are in the runtime namespace, which is what we want. (One interesting difference: execsnoop says the parent of 10710 is 10709, but runc ps says it is 10702, which is the runc exec command; that seems to make more sense.)
runc ps does not fully support the -o pid,pidns option, so we'll again use cinf to find out the namespaces of the newly running process.
We can see that no new namespaces were created; they are the same as the namespaces of the sh process, the "init" process of the container xyxy12. In other words, the top process joined the namespaces of the "init" process, the first process we ran in the container.
Let's list the processes inside the pid namespace 4026532572 again. There are two now: 10123 and 10710.
And when running ps inside the container, we'll also see those processes (plus ps itself), but not as 10123 and 10710. Instead, they are 1 and 9; that's the pid namespace in play again.
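If you want to see both pids of a process at once, the kernel exposes them in the NSpid line of /proc/<pid>/status (available since Linux 4.1). The Python sketch below parses that line; the function names are made up for illustration, and the sample values in the comment simply reuse the pids from this article.

```python
def parse_nspid(status_text):
    """Return the pids from the NSpid line, outermost pid namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # kernels older than 4.1 have no NSpid line


def read_nspid(pid="self"):
    """Read /proc/<pid>/status and return the process's pid in each namespace."""
    with open(f"/proc/{pid}/status") as status:
        return parse_nspid(status.read())


if __name__ == "__main__":
    # For the container's sh process read from the host, a line such as
    # "NSpid:  10123  1" would be parsed into [10123, 1].
    print(read_nspid())
```

The leftmost number is the pid as seen from the namespace that mounted that /proc, and each following number is the pid in the next nested namespace.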
Now we see that docker/runc exec actually starts the new process inside the namespaces the container already created.
When running a container, new namespaces are created and the "init" process is put into those namespaces; when running a new process in a container, it joins the namespaces that were created when the container was created.
That's the "normal" case. Instead of letting the runtime create new namespaces for the container, you can also specify the path of an existing namespace that you want the container or its processes to run in.
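In a runtime bundle, this is expressed in config.json: each entry of linux.namespaces has a type and may carry an optional path to an existing namespace file. The Python sketch below shows one way to set that path on an already loaded config. The helper name join_namespace is hypothetical, and /proc/10123/ns/pid is just the example process from this article.

```python
import json


def join_namespace(config, ns_type, path):
    """Return a copy of an OCI config whose ns_type entry points at path."""
    updated = json.loads(json.dumps(config))  # cheap deep copy of JSON data
    namespaces = updated.setdefault("linux", {}).setdefault("namespaces", [])
    for entry in namespaces:
        if entry.get("type") == ns_type:
            entry["path"] = path  # join the existing namespace instead of creating one
            break
    else:
        # No entry of this type yet, so add one that joins the given namespace.
        namespaces.append({"type": ns_type, "path": path})
    return updated


if __name__ == "__main__":
    config = {"linux": {"namespaces": [{"type": "pid"}, {"type": "mount"}]}}
    patched = join_namespace(config, "pid", "/proc/10123/ns/pid")
    print(json.dumps(patched, indent=2))
```

A container started from the patched config would run in the pid namespace of process 10123 rather than in a fresh one, while still getting its own mount namespace.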
Now you understand exactly how PID namespaces are used in containers. If you take the extra step of figuring out what mount namespaces are and how they are used in containers, you will understand the core of application containerization.