Understand Container: OCI Specification

OCI (the Open Container Initiative) is an industry-wide collaborative effort to define open specifications for the container image format and runtime. That is the official line, and it is true. How it got from the initial disagreements to where it stands today is an interesting case study in open source business models and competition.

But the past is the past. Nowadays OCI is, IMO, indisputably THE container standard: as we'll see later in the article, it is adopted by most mainstream container implementations, including docker, and by container orchestration systems such as kubernetes. It is also particularly helpful to anyone trying to understand how containers work internally. Open source code is awesome, but it is doubly awesome with high-quality documentation!


OCI has two specs, an Image spec and a Runtime spec. Below is an overview of what they cover and how they interact.
         Image         |                  Runtime
                       |
        config         |  runtime config
        layers         |  rootfs
                       |                          |  delete
                       |                          |
              unpack   |        create            |  start/stop/exec
Image (spec) ----------|-> Bundle ----------> container -------> process
                       |                          |
                       |                          |  hooks

Image (Spec)

The Image spec defines the archive format of container images, which are unpacked into the runtime bundle from which we can run a container.
At the top level an image is just a tarball. Once untarred, it has the layout below.
├── blobs
│   └── sha256
│       ├── 4297f01aae8e36da1ec85e36a3cc5a4b11aa34bcaa1d88cc9ca09469826cb2bf (image.manifest)
│       └── 7ea0496f252ea46535ea6932dc460cb7d82bfc86875d9d2586b6afa1e8807ad0 (image.config)
├── index.json
└── oci-layout
The layout isn't that useful without a specification of what these files are and how they reference each other.

We can ignore the oci-layout file for simplicity. index.json is the entry point. It primarily contains a reference to a manifest, which lists all the "resources" used by a single container image, similar to the AndroidManifest.xml file of an Android apk.

The manifest primarily references the config and the layers.

The config notably contains 1) the configuration of the image, which can and will be converted to the runtime config file of the runtime bundle, 2) the layers, which make up the root file system of the runtime bundle, and 3) some metadata regarding the image history.

Layers are what make up the final rootfs. The first layer is the base; every other layer contains only the changes relative to the layer below it.
Put into a diagram, it looks roughly like this.
index.json -----> manifest -> Config
                     |          |
                     |          | ref
                     |          |
                     |-------- Layers --> [ Base, upperlayer1, upperlayer2,...]
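
The reference chain above can be followed with a few lines of Python. This is only a minimal sketch: it assumes an already untarred layout directory, the field names `manifests`, `config`, `layers`, `digest` and `rootfs.diff_ids` as I understand them from the image spec, and an index whose first entry points directly at an image manifest. The helper names `read_blob` and `walk_image` are made up for illustration.

```python
import json
from pathlib import Path


def read_blob(layout_dir, digest):
    """Return the raw bytes of a blob addressed as '<algorithm>:<hex>'."""
    algorithm, hex_digest = digest.split(":", 1)
    return (Path(layout_dir) / "blobs" / algorithm / hex_digest).read_bytes()


def walk_image(layout_dir):
    """Follow index.json -> manifest -> config and layers."""
    index = json.loads((Path(layout_dir) / "index.json").read_text())
    # Assumption: the first index entry is an image manifest, not a nested index.
    manifest_digest = index["manifests"][0]["digest"]
    manifest = json.loads(read_blob(layout_dir, manifest_digest))
    config = json.loads(read_blob(layout_dir, manifest["config"]["digest"]))
    return {
        "manifest": manifest_digest,
        "config": manifest["config"]["digest"],
        # Layer blobs, base first, as referenced by the manifest.
        "layers": [layer["digest"] for layer in manifest["layers"]],
        # Digests of the uncompressed layers, as recorded in the config.
        "diff_ids": config["rootfs"]["diff_ids"],
    }
```

Calling `walk_image("path/to/untarred/image")` returns the digests of the manifest, the config and every layer, which is essentially the diagram above as a dictionary.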

More on Layers

A config file is just JSON, so that part is easy. The interesting part is how to represent a file system as a layer and how to union all the layers, given that the layers are diffs.
  • How to represent a layer?
    • For the base layer, tar all the content;
    • For non-base layers, tar the changeset compared with its base.
      Hence, first detect the changes and form a changeset, then tar the changeset as the representation of this layer.
  • How to union all the layers?
    Apply all the changesets on top of the base layer. This gives you the rootfs.
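
To make the two questions above concrete, here is a toy model in Python. It is a sketch under heavy assumptions: a file system is just a dict mapping a path to its content, there are no directories, permissions or tar archives, and deletions are recorded with the `.wh.` whiteout file prefix that the image spec uses for removed paths. The function names are hypothetical.

```python
WHITEOUT_PREFIX = ".wh."


def make_changeset(lower, upper):
    """Diff two snapshots (dicts of path -> content) into a changeset."""
    changeset = {}
    for path, content in upper.items():
        if lower.get(path) != content:  # added or modified file
            changeset[path] = content
    for path in lower:
        if path not in upper:  # deleted file: record a whiteout marker
            directory, _, name = path.rpartition("/")
            marker = WHITEOUT_PREFIX + name
            changeset[f"{directory}/{marker}" if directory else marker] = ""
    return changeset


def apply_changeset(rootfs, changeset):
    """Return a new rootfs with one changeset applied on top of it."""
    result = dict(rootfs)
    for path, content in changeset.items():
        directory, _, name = path.rpartition("/")
        if name.startswith(WHITEOUT_PREFIX):
            target = name[len(WHITEOUT_PREFIX):]
            result.pop(f"{directory}/{target}" if directory else target, None)
        else:
            result[path] = content
    return result


def union(layers):
    """Apply every layer in order, base first, to build the final rootfs."""
    rootfs = {}
    for changeset in layers:
        rootfs = apply_changeset(rootfs, changeset)
    return rootfs
```

For example, with `base = {"bin/sh": "v1", "etc/hosts": "a"}` and `new = {"bin/sh": "v2"}`, `union([base, make_changeset(base, new)])` gives back `new`: the upper layer only carries the modified `bin/sh` and a whiteout for `etc/hosts`.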

Runtime Spec

Once the image is unpacked to a runtime bundle on the disk file system, the runtime spec takes over from there. Roughly, its job is to create a container and run the processes in it.
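
As mentioned earlier, the runtime config of the bundle is derived from the image config. The sketch below shows roughly what that conversion could look like. It assumes the image config keys `Entrypoint`, `Cmd`, `Env` and `WorkingDir` and the runtime config keys `process.args`, `process.env`, `process.cwd` and `root.path`; the function name, the default rootfs path and the `ociVersion` value are illustrative assumptions, and a real runtime config carries many more fields.

```python
def image_config_to_runtime_config(image_config, rootfs_path="rootfs"):
    """Map a few image config fields onto a minimal runtime config dict."""
    cfg = image_config.get("config", {})
    # The command line is the entrypoint followed by the default arguments.
    args = (cfg.get("Entrypoint") or []) + (cfg.get("Cmd") or [])
    return {
        "ociVersion": "1.0.2",  # assumed spec version, for illustration only
        "root": {"path": rootfs_path},  # where the unioned layers were unpacked
        "process": {
            "args": args,
            "env": cfg.get("Env") or [],
            "cwd": cfg.get("WorkingDir") or "/",
        },
    }
```

Dumping the returned dict as `config.json` next to the `rootfs` directory gives the general shape of a runtime bundle, although a real runtime will expect more than these few fields.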

Container lifecycle

A container has a lifecycle. In essence, as you can imagine, it can be modeled as the following state diagram.

You can throw in a few other actions and states, such as pause and paused, but those are the fundamental ones.
      create   +---------+   start    +---------+
   +---------> | created |            | started |
               |         +----------> |         |
               +---------+            +----+----+
                                           |
                                           v stop
               +---------+            +---------+
               | deleted |            | stopped |
               |         | -----------+         |
               +---------+            +---------+
                            delete
Note: the left arrow (<) somehow sabotages the whole diagram in my current blogspot template, so I have omitted it until I find time to fix it.
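
The same lifecycle can be written down as a tiny state machine. This is a sketch of the simplified model in this post (created, started, stopped, deleted), not of any real runtime's implementation; the `Container` class and its `do` method are hypothetical names.

```python
# (current state, action) -> next state; None means "does not exist yet".
TRANSITIONS = {
    (None, "create"): "created",
    ("created", "start"): "started",
    ("started", "stop"): "stopped",
    ("stopped", "delete"): "deleted",
}


class Container:
    def __init__(self):
        self.state = None

    def do(self, action):
        """Perform an action, or fail if it is not allowed in this state."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a container in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state
```

Walking a container through `create`, `start`, `stop` and `delete` visits every box in the diagram, while something like `stop` on a freshly created container raises an error. Adding pause and paused would just be two more entries in the table.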

Image, Container, and Processes

Containers are created from a (container) image. You can create more than one container from a single image, and you can repack a container, possibly with changes to the base image, into a new image.

Once you have a container, you can run processes inside it, with all the nice things about a container, most notably being self-contained: they don't depend on the host libraries.
    images      |        container          |     processes
                |                           |
                | create                    |
+------------+  |       +---------+  start  |   +---------+
| runtime    +--|-----> | created |         |   | started |
| Bundle     |  |       |         +---------|-> |         |
+------------+  |       +---------+         |   +----+----+
                |                           |        |
                |                           |        v stop
                |       +---------+         |   +---------+
                |       | deleted |         |   | stopped |
                |       |         | --------|---+         |
                |       +---------+         |   +---------+
                |                   delete  |

Implementations and Ecosystems

runC is the reference implementation of the OCI runtime specification. The diagram below shows its relationship with other projects, mostly of docker origin. Each entity below follows the format org/project.

                         +---------------------+
                         |                     |
                         |  dockerInc/docker   |
                         |                     |
                         +--------+------------+
                                  | use
                        +---------v-------------+
                        |                       |
                        |   moby/moby           |
                        |                       |
                        +---------+-------------+
                                  | use
+-------------------+  +----------v-------------+
|                   |  |                        |
| oci/runtime-spec  |  | containerd/containerd  |
|                   |  |                        |
+---------+---------+  +----------+-------------+
          ^                       |
          |                       | use
          | impl                  v
          |            +----------------------+       +-----------------------+
          |            |                      |       |                       |
          +------------+ oci/runc             +-----> | oci/runc/libcontainer |
                       |                      |       |                       |
                       +----------------------+       +-----------------------+
To make things look even more crowded (or flourishing), let's throw in some kubernetes things.

CRI is the Container Runtime Interface defined by kubernetes to allow pluggable container runtimes for k8s. There are currently several implementations, among them cri-containerd and cri-o, both of which actually end up using oci/runc.
                           +-------------------------------+
                           | k8s/CRI                       |
                           | (container runtime interface) |
                           +------+---------------------+--+
                             impl |                     | impl
                          +-------+------+       +------+--------+
                          |cri-containerd|       |cri-o          |
                          +-------+------+       +------+--------+
k8s                               |                     |
----------------------------------|---------------------|--------------------
container                         |                     |
                                  | use                 |
+-------------------+  +----------v-------------+       |
|                   |  |                        |       |
| oci/runtime-spec  |  | containerd/containerd  |       |
|                   |  |                        |       |
+---------+---------+  +----------+-------------+       |
          ^                       |                     |
          |                       | use                 | use
          | impl                  v                     v
          |            +------------------------------------+       +-----------------------+
          |            |                                    |       |                       |
          +------------+ oci/runc                           +-----> | oci/runc/libcontainer |
                       |                                    |       |                       |
                       +------------------------------------+       +-----------------------+

That's it for today.
