Typically, Lennart Poettering gives his conference talks about various aspects
of the systemd init replacement, and his presentation at LinuxCon Europe
was in the same vein. But instead of covering the core functionality of systemd, he
spoke about a mostly unknown utility that ships with it: systemd-nspawn.
The tool started as a debugging aid for systemd development, but has many
more uses than just that, he said. In fact, systemd-nspawn is like the
chroot command—but it is a "chroot on steroids"
according to the title of his talk.
Poettering began by noting that
most people think of systemd as an init system, which it is, but that's
just where it started and it is more than that now. Systemd is a set of
"components needed to build up an operating system on top of the Linux
kernel", he said. As part of the development of systemd, the team
looked at various kernel features to see if they were relevant to the project.
One of
the features considered was containers. Containers on Linux usually
means either using LXC or libvirt LXC, he said.
Those two are, he
stressed, totally separate projects despite the name similarity. Both are
quite different from the well-known (and understood) chroot
command (and underlying system call). There is no configuration required
for chroot, unlike the other two. The systemd project needed a
way to run inside of containers or virtual machines, but wanted a simple
tool that was more like chroot than either LXC or libvirt LXC.
Enter systemd-nspawn.
The idea was to write a tool that does much of what LXC and libvirt LXC do,
but is
easier to use. It is targeted at "building, testing, debugging, and
profiling", not at deployment. systemd-nspawn uses the same
kernel APIs that the other
two tools use, but is not a competitor to them because it is not targeted
at running in a production environment.
Like chroot, systemd-nspawn "just works" with "no
configuration". The latter is not quite true, Poettering said, but the
configuration has been deliberately kept simple. As an example, he showed
the yum command needed to create a minimal Fedora 19 installation
in a directory (similar commands for multiple distributions are available
in the man
page). That became the basis for his subsequent demos.
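The command he showed is not reproduced here, but an invocation along these
lines (the /srv/mycontainer path and the package set are illustrative choices,
not taken from the talk) populates a directory with a minimal Fedora 19
installation:
    yum -y --releasever=19 --nogpgcheck --installroot=/srv/mycontainer \
        --disablerepo='*' --enablerepo=fedora \
        install systemd passwd yum fedora-release vim-minimal
Once that completes, /srv/mycontainer contains just enough of a Fedora system
for nspawn to boot.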
After setting up a distribution directory, one can boot a container with a
simple command (as root):
systemd-nspawn -bD dir
The
-D specifies the root directory for the container and
-b says to boot it using
systemd inside the container.
Omitting
-b is similar to booting a kernel with the
init=/bin/bash command-line parameter, which results in a root
shell.
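For example, using the same illustrative directory as above:
    systemd-nspawn -D /srv/mycontainer
simply launches a root shell inside the container rather than booting it.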
While he called it "booting" a container, there is no actual kernel boot
that occurs as all of the containers are running under the host kernel.
Poettering then showed that starting the container goes through the
normal startup sequence for the distribution by starting various services
inside the container and so on. When complete, you get a login prompt.
Logging in as "root" with no password enters the container, which,
unsurprisingly, looks like a Fedora 19 installation. It is a
"full container", Poettering said; additional software can be installed
inside it using yum, for example. He showed a ps command
both inside and outside the container to show that the processes were
running on the system (of course) but that they had different PIDs inside
and outside.
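One way to see that for yourself (these particular commands are not from the
demo) is to run:
    ps -e
inside the container, where its systemd instance shows up as PID 1, and then:
    ps -ef | grep systemd
on the host, where the same process appears under an ordinary, much larger PID.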
The container will automatically get its network configuration and time
from the host, but set its hostname based on the directory name (or the
name given with -M). It also bind mounts /etc/resolv.conf from
the host so that name resolution works inside the container. As one might
expect, when finished with the container you can poweroff to shut
it down or use reboot to restart it.
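A hypothetical invocation that overrides the machine name would look like:
    systemd-nspawn -bD /srv/mycontainer -M mymachine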
Poettering then moved on to some tools that make it easier to work with the
"nspawn containers" as well as some work that the team has done to make
standard tools report things like container names. For example,
systemd-cgls
shows control groups and their processes in a tree-like structure similar
to that of pstree. Also, systemd-cgtop shows control
groups in a top-like display, sorting them based on which are
using the most CPU time.
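Both are run from the host and need no arguments for the simple case:
    systemd-cgls
    systemd-cgtop
In the systemd-cgls tree, the container's processes show up grouped under
their own control group rather than mixed in with the host's.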
Another addition is the machinectl
command that manages "machines" (either containers or virtual machines) for
systemd. When nspawn creates a new container, it registers that machine
with systemd over D-Bus. Those machines can then be monitored and managed
using machinectl. For example:
machinectl status mname
That will show the status of the machine called
mname. That name is
also integrated with tools like
ps so that one can specify
machine as an output column to see which container a process is
running in. The machine name registration is also done by libvirt LXC, so
those containers are treated similarly; so far, though, LXC is not using
the facility. One of the goals is to eventually allow
systemctl
(the systemd management program) to take a machine-name argument and
have it operate on the instance inside the machine.
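In concrete terms, listing machines and adding the machine column to ps looks
something like this (the column requires a procps-ng built with systemd
support):
    machinectl list
    ps -eo pid,machine,args
The first lists the registered machines; the second adds a column showing
which machine, if any, each process belongs to.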
The integration with machine names means that systemd-nspawn does
require a system that has been booted by systemd in order to function.
Earlier versions of systemd shipped with an independent nspawn, but that
has fallen by the wayside.
Centralizing the system log information for the nspawn containers, while
still allowing access to that information on a per-container basis, is
handled by integration with the Journal.
Using the -j option to nspawn will link the container's journal
with that of the host. The Journal is "a little like syslog except that it
is indexed", Poettering said. With linked journals, the system logs for
multiple containers can be monitored or queried from the host.
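A sketch of how the pieces fit together, using the same illustrative directory
as before (the journalctl -m invocation, which interleaves all available
journals, was not part of the demo):
    systemd-nspawn -j -bD /srv/mycontainer
    journalctl -m
The first boots the container with its journal linked to the host's; the
second shows the host and container logs merged into one view.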
Another feature of nspawn is that it can isolate the container from the host
network. As mentioned earlier, by default nspawn inherits the network of
the host, but the --private-network argument will create a
container without any network devices other than loopback. That is "ideal
for build systems", Poettering said, since they shouldn't need the network after
the initial package retrieval.
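Booting the same illustrative container without network access is just a
matter of adding the flag:
    systemd-nspawn -bD /srv/mycontainer --private-network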
Nspawn is quite useful in a number of scenarios and the systemd team has
used it extensively to debug systemd itself, he said. Normally, an init
system is difficult to debug, but being able to use gdb,
strace, and similar tools from the host on the programs running
in a container makes it much easier. It is a tool that more people,
especially in the "DevOps" community, should be aware of, he said—his talk,
and articles like this, will hopefully start getting that word out.
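As a rough illustration (1234 is a placeholder for whatever PID the host's ps
reports for the process of interest inside the container):
    strace -f -p 1234
attaches strace, running on the host, to a process running inside the container.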
[I would like to thank the Linux Foundation for travel assistance to
Edinburgh to attend LinuxCon Europe.]