
Understand Container 3: Linux Capabilities

Linux capabilities break up the super privileges enjoyed by the root user into fine-grained rights (well, just to avoid saying capabilities), so that even as root you cannot do whatever you want unless you have been granted the corresponding capabilities.

prepare a rootfs

We'll need to install an additional tool (libcap) to explore capabilities, so here are some instructions on how to prepare such a rootfs.
First, create a Docker container with libcap installed:
sudo docker run -it alpine sh -c 'apk add -U libcap; capsh --print'
Use docker ps -a to find the ID of the container we just ran; it should be the latest one.
Then export the rootfs to create a runc runtime bundle:
mkdir rootfs
docker export $container_id | tar -C rootfs -xvf -
runc spec


Using the default config.json generated by runc spec, you are not allowed to set the hostname, even as root.
$ sudo runc run xyxy67
/ # id
uid=0(root) gid=0(root)
/ # hostname cool
hostname: sethostname: Operation not permitted
That's because setting the hostname requires the CAP_SYS_ADMIN capability, even for root. We can grant it by adding CAP_SYS_ADMIN to the bounding, permitted, and effective lists of the capabilities attribute of the init process.
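The edit described above could be scripted roughly as follows. This is a minimal sketch, assuming jq is installed on the host and that config.json follows the layout generated by runc spec, with the three lists under process.capabilities. The add_cap helper name is made up for illustration and is not part of runc.

```shell
#!/bin/sh
# add_cap CAP FILE: hypothetical helper, not part of runc.
# Adds CAP to the bounding, permitted and effective lists under
# process.capabilities in FILE. Assumes jq is available.
# Note: "unique" removes duplicates and also sorts each list.
add_cap() {
    cap=$1
    config=$2
    jq --arg cap "$cap" '
        .process.capabilities.bounding  |= (. + [$cap] | unique)
      | .process.capabilities.permitted |= (. + [$cap] | unique)
      | .process.capabilities.effective |= (. + [$cap] | unique)
    ' "$config" > "$config.tmp" && mv "$config.tmp" "$config"
}

# Usage, run in the bundle directory:
#   add_cap CAP_SYS_ADMIN config.json
```

The inheritable list is deliberately left alone, which matches the capsh output later in this post, where cap_sys_admin shows up as +ep rather than +eip.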

Run another container with the new configuration, and now you are allowed to set the hostname.
$ sudo runc run xyxy67
/ # hostname
runc
/ # hostname hello
/ # hostname
hello
/ #
Run another command in the same container, and it will be able to set the hostname as well, since by default it is given the same capabilities as the init process.
$ sudo runc exec -t xyxy67 /bin/sh
[sudo] password for binchen:
/ # hostname
hello
/ # hostname good
/ # hostname
good

get the capability

Get the PIDs of the two processes. runc ps reports them as seen from the host, that is, the PID namespace the runtime itself runs in.
$ sudo runc ps xyxy67
UID        PID  PPID  C STIME TTY          TIME CMD
root     26002 25993  0 11:42 pts/0    00:00:00 /bin/sh
root     26059 26051  0 11:43 pts/1    00:00:00 /bin/sh
Install pscap on the host:
sudo apt-get install libcap-ng-utils
Check the capabilities of the running processes, using their PIDs in the host namespace:
$ pscap | grep "26059\|26002"
25993 26002 root sh kill, net_bind_service, sys_admin, audit_write
26051 26059 root sh kill, net_bind_service, sys_admin, audit_write
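If pscap is not available, the same information can be read from the kernel directly: /proc/<pid>/status exposes each capability set as a hexadecimal bitmask (CapInh, CapPrm, CapEff, CapBnd), where bit N stands for capability number N in linux/capability.h. Below is a minimal sketch of that check. The helper names has_cap and cap_eff are hypothetical, and the bit number 21 for CAP_SYS_ADMIN is taken from that header.

```shell
#!/bin/sh
# has_cap MASK BIT: succeed if bit BIT is set in the hex bitmask MASK.
# MASK is written without a 0x prefix, as it appears in /proc.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# cap_eff PID: print the effective capability mask of a process (Linux only).
cap_eff() {
    awk '/^CapEff:/ { print $2 }' "/proc/$1/status"
}

CAP_SYS_ADMIN=21   # bit number from linux/capability.h

# Usage on the host, with a PID reported by runc ps:
#   has_cap "$(cap_eff 26002)" "$CAP_SYS_ADMIN" && echo "has sys_admin"
```

With the capabilities listed by pscap above (kill, net_bind_service, sys_admin, audit_write are bits 5, 10, 21 and 29), the mask would be 0000000020200420. capsh --decode=0000000020200420 turns such a mask back into capability names.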

request additional capability

An exec can request additional capabilities that aren't in config.json.
Run another container, xyxy78, without CAP_SYS_ADMIN in config.json.

Double check that it really doesn't have the capability.
$ sudo runc ps xyxy78
UID        PID  PPID  C STIME TTY          TIME CMD
root     27385 27376  0 11:57 pts/0    00:00:00 /bin/sh
$ pscap | grep 27385
27376 27385 root sh kill, net_bind_service, audit_write
Start another process in xyxy78, this time with the additional CAP_SYS_ADMIN capability, using the --cap option.
sudo runc exec --cap CAP_SYS_ADMIN xyxy78 /bin/hostname cool
Under the hood, the --cap option sets up the capability lists for the process that will be exec-ed, just as config.json does for the init process.


You can use capsh to explore a little more. Run capsh --print inside the container.

This is the output with the default config.json:
# capsh --print
Current: = cap_kill,cap_net_bind_service,cap_audit_write+eip
Bounding set =cap_kill,cap_net_bind_service,cap_audit_write
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=
This is the output with the CAP_SYS_ADMIN capability added. Compared with the former, we can see an additional cap_sys_admin+ep in "Current" and cap_sys_admin in "Bounding set". The "+ep" means the preceding capabilities are in both the "effective" and "permitted" sets. For more information regarding the capability sets, see capabilities.
# capsh --print
Current: = cap_kill,cap_net_bind_service,cap_audit_write+eip cap_sys_admin+ep
Bounding set =cap_kill,cap_net_bind_service,cap_sys_admin,cap_audit_write
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=
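Each clause printed after "Current:" is a comma-separated capability list followed by + and a set of flag letters: e for effective, i for inheritable, p for permitted. The small sketch below spells the letters out. describe_flags is a hypothetical helper, not part of libcap.

```shell
#!/bin/sh
# describe_flags FLAGS: expand capsh flag letters (e, i, p) into set names.
describe_flags() {
    out=""
    case $1 in *e*) out="$out effective" ;; esac
    case $1 in *i*) out="$out inheritable" ;; esac
    case $1 in *p*) out="$out permitted" ;; esac
    # Strip the leading space left by the first append.
    echo "${out# }"
}

describe_flags eip   # flags on the default capabilities
describe_flags ep    # flags on cap_sys_admin after editing config.json
```

The missing i on cap_sys_admin reflects that it was added to the bounding, permitted and effective lists only, not to the inheritable one.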


We have seen how Linux capabilities are used to limit what a process can do and thus increase the security of the container.
