Linux Security: seccomp, and its usage in Android and Docker


seccomp is short for SECure COMPuting. The name sounds broad, but its scope is actually quite narrow, and effective. Simply put, it is a default-deny whitelist firewall that the kernel uses to restrict which syscalls a process can make.
seccomp is widely used in popular systems, notably Chromium, Android and Docker, to sandbox processes and/or reduce the kernel attack surface.

How it works

As mentioned above, seccomp is fundamentally a whitelist: on every system call, the kernel checks whether the calling process is allowed to make that particular call.
Technically, the whitelist is written as Berkeley Packet Filter (BPF) rules, which are then passed to the seccomp system call.
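To make the mechanism concrete, here is a minimal sketch in C++ of what such a whitelist can look like as classic BPF instructions, assuming Linux kernel headers are available. The helper names (Stmt, Jump, BuildAllowList, InstallFilter) are illustrative and not part of any library. The filter first checks the architecture, then compares the syscall number against each allowed entry, and kills the calling thread on anything else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#include <linux/audit.h>    // AUDIT_ARCH_* constants
#include <linux/filter.h>   // sock_filter, sock_fprog, BPF_* opcodes
#include <linux/seccomp.h>  // seccomp_data, SECCOMP_RET_*, SECCOMP_SET_MODE_FILTER
#include <sys/prctl.h>      // prctl, PR_SET_NO_NEW_PRIVS
#include <sys/syscall.h>    // SYS_seccomp
#include <unistd.h>         // syscall

// Hypothetical helpers that build one BPF instruction each.
static sock_filter Stmt(std::uint16_t code, std::uint32_t k) {
    return sock_filter{code, 0, 0, k};
}
static sock_filter Jump(std::uint16_t code, std::uint32_t k,
                        std::uint8_t jt, std::uint8_t jf) {
    return sock_filter{code, jt, jf, k};
}

// Build a default-deny program: only syscall numbers in `allowed` pass.
std::vector<sock_filter> BuildAllowList(std::uint32_t audit_arch,
                                        const std::vector<std::uint32_t>& allowed) {
    std::vector<sock_filter> prog;
    // Reject calls made through an unexpected architecture/ABI.
    prog.push_back(Stmt(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, arch)));
    prog.push_back(Jump(BPF_JMP | BPF_JEQ | BPF_K, audit_arch, 1, 0));
    prog.push_back(Stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL));
    // Load the syscall number and compare it with each allowed entry.
    prog.push_back(Stmt(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)));
    for (std::uint32_t nr : allowed) {
        // Match: fall through to ALLOW. No match: skip over the ALLOW.
        prog.push_back(Jump(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1));
        prog.push_back(Stmt(BPF_RET | BPF_K, SECCOMP_RET_ALLOW));
    }
    // Default action for everything not listed.
    prog.push_back(Stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL));
    return prog;
}

// Hand the program to the kernel. The filter cannot be removed afterwards.
bool InstallFilter(std::vector<sock_filter>& prog) {
    assert(!prog.empty() && prog.size() <= static_cast<std::size_t>(BPF_MAXINSNS));
    sock_fprog fprog{};
    fprog.len = static_cast<unsigned short>(prog.size());
    fprog.filter = prog.data();
    // Unprivileged processes must set no_new_privs before installing a filter.
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return false;
    return syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &fprog) == 0;
}
```

A caller would build the program for its own architecture, for example BuildAllowList(AUDIT_ARCH_X86_64, {SYS_read, SYS_write, SYS_exit_group}), and then call InstallFilter once. Even this tiny filter takes some care to get right, which is the motivation for the wrappers.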
Writing rules in BPF isn't intuitive for most programmers, so various wrappers make it more user friendly. Android uses minijail, which actually comes from Chromium. Docker has a Go wrapper that lets you write the profile in JSON.
We'll see how they are used in practice.

seccomp in Android

Android defines a seccomp policy for each process or service. minijail is the helper library that parses the policy file and passes it to the kernel.
Below we look in detail at how seccomp is used for the mediaextractor service. Let's jump directly to the code:
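// mediaextractor/minijail/minijail.cpp
#include <libminijail.h>

static const char kSeccompFilePath[] =
    "/system/etc/seccomp_policy/mediaextractor-seccomp.policy";

int MiniJail()
{
    struct minijail *jail = minijail_new();      // create an empty jail
    minijail_no_new_privs(jail);                 // forbid gaining new privileges
    minijail_log_seccomp_filter_failures(jail);  // log syscalls the filter blocks
    minijail_use_seccomp_filter(jail);           // turn on seccomp filtering
    minijail_parse_seccomp_filters(jail, kSeccompFilePath);  // load the policy
    minijail_enter(jail);                        // apply the jail to this process
    minijail_destroy(jail);                      // free the jail object
    return 0;
}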
It is quite straightforward, thanks to the self-explanatory function names and the apt analogy (minijail).
We first create a minijail, parse the policy (converting it into a BPF filter), and finally enter the jail (calling the seccomp system call).
A peek at the format and content of mediaextractor-seccomp.policy makes things clearer: it lists all the syscalls that the target process is allowed to make.
ioctl: 1
futex: 1
prctl: 1
write: 1
getpriority: 1
mmap2: 1
close: 1
munmap: 1
dup: 1
mprotect: 1
getuid32: 1
setpriority: 1
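
As a rough illustration of how such a file maps to a whitelist, the sketch below parses the simple "name: 1" form shown above into a set of allowed syscall names. It is not minijail's parser: ParseAllowList, IsAllowed and Trim are hypothetical names, treating lines that start with '#' as comments is an assumption, and any richer rule syntax is ignored.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace.
static std::string Trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    assert(e != std::string::npos);  // a non-blank string has a last non-blank char
    return s.substr(b, e - b + 1);
}

// Collect every syscall whose rule is exactly "1" (unconditionally allowed).
std::set<std::string> ParseAllowList(const std::string& policy_text) {
    std::set<std::string> allowed;
    std::istringstream in(policy_text);
    std::string line;
    while (std::getline(in, line)) {
        line = Trim(line);
        if (line.empty() || line[0] == '#') continue;  // blank line or comment
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;      // not a "name: rule" line
        const std::string name = Trim(line.substr(0, colon));
        const std::string rule = Trim(line.substr(colon + 1));
        if (!name.empty() && rule == "1") allowed.insert(name);
    }
    return allowed;
}

// Default deny: anything not in the set is rejected.
bool IsAllowed(const std::set<std::string>& allowed, const std::string& name) {
    return allowed.count(name) != 0;
}
```

With the list above as input, IsAllowed would return true for futex and false for a syscall such as execve that the policy does not mention.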

seccomp in Docker

seccomp was introduced to Docker after v1.0. A seccomp profile can be specified with the --security-opt seccomp=.json parameter of docker run or docker create.
docker run -it --rm --security-opt seccomp=.json alpine sh ...
If no seccomp profile is specified, a default profile is used. The default profile disables 40+ of the 300+ system calls to provide moderate protection. The seccomp profile is in JSON format; the Docker daemon converts it to a BPF filter and applies it to the created process/container.
Applications packaged in a Docker container can only make the system calls allowed by the seccomp profile you specify, giving you more control over the security of the container.
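To give a feel for the profile format, the following sketch renders a small default-deny profile as JSON text. The field names (defaultAction, syscalls, names, action) and the SCMP_ACT_ERRNO and SCMP_ACT_ALLOW values follow Docker's documented profile format; the functions MakeAllowListProfile and IsPlainName are hypothetical helpers written for this example, and a real container needs far more syscalls than a handful just to start.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Syscall names are assumed to contain only [a-z0-9_], so no JSON escaping is needed.
static bool IsPlainName(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!(std::islower(c) || std::isdigit(c) || c == '_')) return false;
    }
    return true;
}

// Render a profile in which every syscall fails with an error
// unless its name appears in the single SCMP_ACT_ALLOW rule.
std::string MakeAllowListProfile(const std::vector<std::string>& names) {
    std::string json =
        "{\n"
        "  \"defaultAction\": \"SCMP_ACT_ERRNO\",\n"
        "  \"syscalls\": [\n"
        "    {\n"
        "      \"names\": [";
    for (std::size_t i = 0; i < names.size(); ++i) {
        assert(IsPlainName(names[i]));
        if (i != 0) json += ", ";
        json += "\"" + names[i] + "\"";
    }
    json +=
        "],\n"
        "      \"action\": \"SCMP_ACT_ALLOW\"\n"
        "    }\n"
        "  ]\n"
        "}\n";
    return json;
}
```

The returned text could be written to a file and passed to Docker through the --security-opt option described above.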

Summary

In this article, we discussed what seccomp is and how Android and Docker use it to build a more secure system.
