Permissions are the oldest and most basic security mechanism in Linux. Briefly, here is how they work: 1) the system has a number of users and groups; 2) every file belongs to an owner and a group; 3) every process belongs to a user and one or more groups; 4) lastly, to link 1, 2, and 3 together, every file has a mode setting that defines the permissions for three types of processes: owner, group, and other. Note that the kernel knows and cares only about the uid and gid, not the user name and group name.
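To make the four points concrete, here is a minimal Python sketch of how the mode bits get matched against a process. The function name `may_access` and its parameters are made up for illustration, and the sketch deliberately ignores root/capability overrides, ACLs, and security modules, all of which the real kernel check also takes into account.

```python
def may_access(uid, gids, file_uid, file_gid, mode, want):
    """Return True if a process may perform `want` on a file.

    uid, gids          : the process's user id and its set of group ids
    file_uid, file_gid : the file's owner and group
    mode               : permission bits, e.g. 0o750
    want               : any combination of "r", "w", "x"
    """
    # Exactly one class applies, tried in this order: owner, group, other.
    if uid == file_uid:
        granted = (mode >> 6) & 0o7
    elif file_gid in gids:
        granted = (mode >> 3) & 0o7
    else:
        granted = mode & 0o7

    # Translate the requested access into the same 3-bit layout.
    needed = 0
    for flag in want:
        needed |= {"r": 4, "w": 2, "x": 1}[flag]
    return (granted & needed) == needed
```

Notice that only numbers show up in the check; user names and group names never enter the picture.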
Specify the uid of a container process
The user property specifies which user the process runs as. It is optional and defaults to 0, or
root, which is required to run the
That means you can delete the following section from the default config.json and still be able to start the container.
Start the container and list the user.
As seen, it is running as root.
Running a container process as root is worrisome. Fortunately, by default, a container process, even when run as root, has extra constraints (such as a reduced set of capabilities) in place, so it is usually less powerful than root on the host, which by default has more capabilities assigned.
Still, it is more secure to run the process as a non-privileged normal user, and you can do so by specifying a non-zero uid/gid.
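If you would rather script the change than edit config.json by hand, something along these lines should work. It is only a sketch: the helper name `set_process_user` is made up, and the tiny `default_config` dictionary stands in for a real config.json, of which it shows only the `process.user` part (with the `uid` and `gid` fields defined by the OCI runtime spec).

```python
import json


def set_process_user(config, uid, gid):
    """Return a copy of an OCI config dict with process.user set to uid/gid."""
    updated = json.loads(json.dumps(config))  # cheap deep copy of plain JSON data
    updated.setdefault("process", {})["user"] = {"uid": uid, "gid": gid}
    return updated


# Stand-in for the relevant part of a default config.json (illustrative only).
default_config = {"process": {"user": {"uid": 0, "gid": 0}, "args": ["sh"]}}

non_root_config = set_process_user(default_config, 1000, 1000)
print(json.dumps(non_root_config["process"]["user"]))
```

With a real bundle you would load config.json with `json.load`, apply the helper, and write the result back with `json.dump`.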
Let's change the uid/gid of the user config to 1000 and start the container.
It doesn't mention the username since there isn't one in the container, but from the host side (see the UID):
By default, creating a container doesn't create a new user namespace, so a uid you see in the container and the same uid on the host are the same user; that is, they share the same user namespace, to say it in a fancy way.
User namespace and UID/GID mapping
Let's see what happens when using a user namespace.
Here are the user namespaces before starting a container with user namespace support:
Make the following changes to enable the user namespace:
It is an error to enable the user namespace without a uid/gid mapping; similarly, a UID/GID mapping is useless and will be ignored if the user namespace isn't enabled, which is effectively a wrong configuration as well.
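As a rough sketch of both the change and the consistency rule, here is some Python that works on the config as a dictionary. The helper names `enable_user_namespace` and `check_user_namespace` are made up; the field names (`linux.namespaces`, `uidMappings`, `gidMappings`, `containerID`, `hostID`, `size`) are the ones from the OCI runtime spec. The default `size=1` is just an illustrative choice that maps a single id.

```python
def enable_user_namespace(config, host_id, size=1):
    """Add a user namespace plus uid/gid mappings that start at container id 0."""
    linux = config.setdefault("linux", {})
    namespaces = linux.setdefault("namespaces", [])
    if not any(ns.get("type") == "user" for ns in namespaces):
        namespaces.append({"type": "user"})
    mapping = {"containerID": 0, "hostID": host_id, "size": size}
    linux["uidMappings"] = [dict(mapping)]
    linux["gidMappings"] = [dict(mapping)]
    return config


def check_user_namespace(config):
    """Return a list of problems; an empty list means the two settings agree."""
    linux = config.get("linux", {})
    has_userns = any(ns.get("type") == "user" for ns in linux.get("namespaces", []))
    has_uid_map = bool(linux.get("uidMappings"))
    has_gid_map = bool(linux.get("gidMappings"))
    problems = []
    if has_userns and not (has_uid_map and has_gid_map):
        problems.append("user namespace enabled but uid/gid mapping is missing")
    if not has_userns and (has_uid_map or has_gid_map):
        problems.append("uid/gid mapping given but user namespace is not enabled")
    return problems
```

Running `check_user_namespace` before starting the container is a cheap way to catch either of the two wrong configurations early.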
Start a container with the new config and list the user namespaces in the system:
We can see we have a new user namespace (4026532450) and our new container process (sh) is running inside it.
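Another way to find out which user namespace a process lives in is to read the `/proc/<pid>/ns/user` symlink, whose target looks like `user:[4026532450]`. Below is a small Python sketch of that; the helper names `parse_ns_link` and `user_namespace_of` are made up, and the second one only works on Linux and only for processes you are allowed to inspect.

```python
import os
import re


def parse_ns_link(target):
    """Extract the namespace id from a link target such as 'user:[4026532450]'."""
    match = re.fullmatch(r"user:\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"not a user namespace link: {target!r}")
    return int(match.group(1))


def user_namespace_of(pid="self"):
    """Return the user namespace id of a process (Linux only)."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/user"))
```

Two processes share a user namespace exactly when this id is the same for both of them.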
Inside the container, it is running as uid/gid 0 and is considered to be root.
However, from the outside, the process is considered to be running as binchen, whose uid is 1000.
That's the user namespace and uid/gid mapping in play: uid 0 inside the container is uid 1000 on the host, a constant offset as specified in the mapping. The mapping can also be seen on the host by checking /proc:
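The kernel exposes the mapping in `/proc/<pid>/uid_map` (and `gid_map`), one range per line, as three numbers: the first id inside the namespace, the first id outside, and the length of the range. Here is a hedged Python sketch of the "constant offset" arithmetic; the helper names `parse_id_map` and `to_host_id` are made up, and any sample lines you feed it are up to you.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, length = (int(field) for field in line.split())
        ranges.append((inside, outside, length))
    return ranges


def to_host_id(container_id, ranges):
    """Translate an id seen inside the container to the id the host sees.

    Returns None when the id is not covered by any range.
    """
    for inside, outside, length in ranges:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None
```

For the mapping used here, `to_host_id(0, parse_id_map("0 1000 1"))` gives 1000, which matches what the host reports for the container process.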
Let's do an exercise to verify that uid 0 inside the container is actually 1000 on the host, and that ultimately it is 1000 that the kernel checks.
Inside the rootfs, but on the host, create two directories, bindir and rootdir, owned by the current user (uid 1000) and root respectively, each accessible only by its owner.
Type the following commands:
Here is what it should look like:
On the host, test the ownership and permissions. The expectation is that the current user (binchen) is able to enter bindir but not rootdir. After you switch to root, root can access not only rootdir (since root owns that directory) but also bindir (because it is root!).
To make the exercise more convincing, let's change the uid/gid offset to 2000, so that root in the container maps to no existing user on the host. We then expect that inside the container, root can access none of the directories, since root in the container is really uid 2000 and the kernel won't allow it to access any of those directories.
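Before trying it for real, the expectation can be worked out on paper, or in a few lines of Python. This is only a simulation under simplifying assumptions: both directories are owner-only, the container process is uid 0 inside the namespace, and root's capabilities inside a user namespace don't help with files whose owner isn't mapped into that namespace, so the plain owner check decides. The names `DIR_OWNERS` and `container_root_access` are made up for this sketch.

```python
# Host-side owners of the two owner-only directories from the exercise.
DIR_OWNERS = {"bindir": 1000, "rootdir": 0}


def container_root_access(offset):
    """Predict which directories uid 0 in the container can enter.

    `offset` is the host uid that container uid 0 is mapped to. Since the
    directories are accessible only by their owner, access is granted only
    when the mapped host uid equals the directory's owner.
    """
    host_uid = offset  # container uid 0 -> host uid `offset`
    return {name: host_uid == owner for name, owner in DIR_OWNERS.items()}


print(container_root_access(1000))
print(container_root_access(2000))
```

With an offset of 1000 only bindir is reachable; with 2000 neither directory is.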
Start the container:
This is a good time to mention that you always have to make sure the rootfs (or runc runtime bundle) has ownership and permission settings that match the uid/gid mapping you want to use. The runtime won't modify file system ownership to realize the mapping.
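Since the runtime won't fix ownership for you, it can be handy to scan the rootfs yourself before starting the container. Here is a minimal Python sketch, assuming the mapping is given as (inside, outside, length) tuples; the helper names `walk_owners` and `find_unmapped` are made up, and symlinks that point to directories are not checked by this simple walk.

```python
import os


def walk_owners(rootfs):
    """Yield (path, uid, gid) for each directory and file under rootfs."""
    for dirpath, _dirnames, filenames in os.walk(rootfs):
        for path in [dirpath] + [os.path.join(dirpath, name) for name in filenames]:
            info = os.lstat(path)  # lstat: report a symlink itself, not its target
            yield path, info.st_uid, info.st_gid


def find_unmapped(entries, uid_ranges, gid_ranges):
    """Return the paths whose host uid or gid falls outside the mapping."""

    def is_mapped(host_id, ranges):
        return any(outside <= host_id < outside + length
                   for _inside, outside, length in ranges)

    return [path for path, uid, gid in entries
            if not (is_mapped(uid, uid_ranges) and is_mapped(gid, gid_ranges))]
```

For example, `find_unmapped(walk_owners("rootfs"), [(0, 1000, 1)], [(0, 1000, 1)])` lists everything in the bundle's rootfs that container root would not own under that mapping.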
What's the benefit of using user namespace?
- A user namespace is useful when the process requires root to run but you don't want to give it real root power. (Otherwise, just using a non-zero user id is fine.)
- When there are multiple users (for different processes) inside a single container, putting them in different user namespaces allows you to monitor and control multiple instances of the same container.
Don't run your container process as the root user; if you have to, put it into a separate user namespace.