Typically, Lennart Poettering gives his conference talks about various aspects
of the systemd init replacement, and his presentation at LinuxCon Europe
was in the same vein. But instead of covering the core functionality of systemd, he
spoke about a mostly unknown utility that ships with it: systemd-nspawn.
The tool started as a debugging aid for systemd development, but has many
more uses than just that, he said. In fact, systemd-nspawn is like the
chroot command—but it is a "chroot on steroids"
according to the title of his talk.
Poettering began by noting that
most people think of systemd as an init system, which it is, but that's
just where it started and it is more than that now. Systemd is a set of
"components needed to build up an operating system on top of the Linux
kernel", he said. As part of the development of systemd, the team
looked at various kernel features to see if they were relevant to the project.
One of
the features considered was containers. Containers on Linux usually
means either using LXC or libvirt LXC, he said.
Those two are, he
stressed, totally separate projects despite the name similarity. Both are
quite different from the well-known (and understood) chroot
command (and underlying system call). There is no configuration required
for chroot, unlike the other two. The systemd project needed a
way to run inside of containers or virtual machines, but wanted a simple
tool that was more like chroot than either LXC or libvirt LXC.
Enter systemd-nspawn.
The idea was to write a tool that does much of what LXC and libvirt LXC do,
but is
easier to use. It is targeted at "building, testing, debugging, and
profiling", not at deployment. systemd-nspawn uses the same
kernel APIs that the other
two tools use, but is not a competitor to them because it is not targeted
at running in a production environment.
Like chroot, systemd-nspawn "just works" with "no
configuration". The latter is not quite true, Poettering said, but the
configuration has been deliberately kept simple. As an example, he showed
the yum command needed to create a minimal Fedora 19 installation
in a directory (similar commands for multiple distributions are available
in the man
page). That became the basis for his subsequent demos.
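The command he showed is not reproduced here, but an invocation along these
lines (the /srv/mycontainer path and the package set are illustrative choices,
not taken from the talk) populates a directory with a minimal Fedora 19
installation:
    yum -y --releasever=19 --nogpgcheck --installroot=/srv/mycontainer \
        --disablerepo='*' --enablerepo=fedora \
        install systemd passwd yum fedora-release vim-minimal
Once that completes, /srv/mycontainer contains just enough of a Fedora system
for nspawn to boot.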
After setting up a distribution directory, one can boot a container with a
simple command (as root):
systemd-nspawn -bD dir
The
-D specifies the root directory for the container and
-b says to boot it using
systemd inside the container.
Omitting
-b is similar to booting a kernel with the
init=/bin/bash command-line parameter, which results in a root
shell.
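For example, using the same illustrative directory as above:
    systemd-nspawn -D /srv/mycontainer
simply launches a root shell inside the container rather than booting it.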
While he called it "booting" a container, there is no actual kernel boot
that occurs as all of the containers are running under the host kernel.
Poettering then showed that starting the container goes through the
normal startup sequence for the distribution by starting various services
inside the container and so on. When complete, you get a login prompt.
Logging in as "root" with no password enters the container, which,
unsurprisingly, looks like a Fedora 19 installation. It is a
"full container", Poettering said; additional software can be installed
inside it using yum, for example. He showed a ps command
both inside and outside the container to show that the processes were
running on the system (of course) but that they had different PIDs inside
and outside.
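One way to see that for yourself (these particular commands are not from the
demo) is to run:
    ps -e
inside the container, where its systemd instance shows up as PID 1, and then:
    ps -ef | grep systemd
on the host, where the same process appears under an ordinary, much larger PID.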
The container will automatically get its network configuration and time
from the host, but set its hostname based on the directory name (or the
name given with -M). It also bind mounts /etc/resolv.conf from
the host so that name resolution works inside the container. As one might
expect, when finished with the container you can poweroff to shut
it down or use reboot to restart it.
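A hypothetical invocation that overrides the machine name would look like:
    systemd-nspawn -bD /srv/mycontainer -M mymachine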
Poettering then moved on to some tools that make it easier to work with the
"nspawn containers" as well as some work that the team has done to make
standard tools report things like container names. For example,
systemd-cgls
shows control groups and their processes in a tree-like structure similar
to that of pstree. Also, systemd-cgtop shows control
groups in a top-like display, sorting them based on which are
using the most CPU time.
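Both are run from the host and need no arguments for the simple case:
    systemd-cgls
    systemd-cgtop
In the systemd-cgls tree, the container's processes show up grouped under
their own control group rather than mixed in with the host's.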
Another addition is the machinectl
command that manages "machines" (either containers or virtual machines) for
systemd. When nspawn creates a new container, it registers that machine
with systemd over D-Bus. Those machines can then be monitored and managed
using machinectl. For example:
machinectl status mname
That will show the status of the machine called
mname. That name is
also integrated with tools like
ps so that one can specify
machine as an output column to see which container a process is
running in. The machine name registration is also done by libvirt LXC, so
those containers are treated similarly; so far, though, LXC is not using
the facility. One of the goals is to eventually allow
systemctl
(the systemd management program) to take a machine-name argument and
have it operate on the instance inside the machine.
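In concrete terms, listing machines and adding the machine column to ps looks
something like this (the column requires a procps-ng built with systemd
support):
    machinectl list
    ps -eo pid,machine,args
The first lists the registered machines; the second adds a column showing
which machine, if any, each process belongs to.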
The integration with machine names means that systemd-nspawn does
require a system that has been booted by systemd in order to function.
Earlier versions of systemd shipped with an independent nspawn, but that
has fallen by the wayside.
Centralizing the system log information for the nspawn containers, while
still allowing access to that information on a per-container basis, is
handled by integration with the Journal.
Using the -j option to nspawn will link the container's journal
with that of the host. The Journal is "a little like syslog except that it
is indexed", Poettering said. With linked journals, the system logs for
multiple containers can be monitored or queried from the host.
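A sketch of how the pieces fit together, using the same illustrative directory
as before (the journalctl -m invocation, which interleaves all available
journals, was not part of the demo):
    systemd-nspawn -j -bD /srv/mycontainer
    journalctl -m
The first boots the container with its journal linked to the host's; the
second shows the host and container logs merged into one view.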
Another feature of nspawn is that it can isolate the container from the host
network. As mentioned earlier, by default nspawn inherits the network of
the host, but the --private-network argument will create a
container without any network devices other than loopback. That is "ideal
for build systems", Poettering said, since they shouldn't need the network after
the initial package retrieval.
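Booting the same illustrative container without network access is just a
matter of adding the flag:
    systemd-nspawn -bD /srv/mycontainer --private-network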
Nspawn is quite useful in a number of scenarios and the systemd team has
used it extensively to debug systemd itself, he said. Normally, an init
system is difficult to debug, but being able to use gdb,
strace, and similar tools from the host on the programs running
in a container makes it much easier. It is a tool that more people,
especially in the "DevOps" community, should be aware of, he said—his talk,
and articles like this, will hopefully start getting that word out.
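As a rough illustration (1234 is a placeholder for whatever PID the host's ps
reports for the process of interest inside the container):
    strace -f -p 1234
attaches strace, running on the host, to a process running inside the container.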
[I would like to thank the Linux Foundation for travel assistance to
Edinburgh to attend LinuxCon Europe.]