Yesterday I announced a first public release of nitro, a tiny init system and process supervisor. This got a fair bit of attention, and to my delight was even boosted on the fediverse by both Laurent Bercot (of s6 fame) and djb himself.
One of the most requested things was a comparison to other init systems. Since I’m most familiar with runit, I shall compare nitro and runit here.
runit and nitro share the basic design of having a directory of services and using small scripts to spawn the processes.
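For illustration, a service directory essentially contains a small executable run script that execs the daemon in the foreground. A minimal sketch (service name, daemon, and flags are just examples; the scan directory differs between setups, e.g. /etc/nitro for nitro):

#!/bin/sh
# e.g. /etc/nitro/sshd/run -- exec the daemon in the foreground,
# so the supervisor can track and restart it
exec /usr/sbin/sshd -D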
Comparing nitro to runit, there are a few new features and some architectural differences.
From a design point of view, runit follows the daemontools approach of multiple small tools: the runit-init process spawns runsvdir, which spawns a runsv process for each service.
nitro favors a monolithic approach and keeps everything in a single process. This also makes it easier to install in containers.
The new features are:
- nitro keeps all runtime state in RAM and provides an IPC interface to query it, whereas runit emits state to disk. This enables nitro to run on read-only file systems without special configuration. (However, you need a tmpfs to store the socket file. In theory, on Linux, you could even use /proc/1/fd/10 or an abstract Unix domain socket, but that requires adding permission checks.)
- support for one-shot "services", i.e. running scripts on up/down without a process to supervise (e.g. persisting audio volume, keeping RNG state). In runit, you can fake this with a pause process, which has a little more overhead (see the first sketch after this list).
- parametrized services. One service directory can be run multiple times, e.g. agetty@ can be spawned multiple times to provide agetty processes for different terminals. This can be faked in runit with symlinks (see the second sketch after this list), but nitro also allows fully dynamic creation of service instances.
- log chains. runit supports only one logger per service, and log services can’t have loggers of their own.
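To illustrate the one-shot workaround mentioned above: in runit, the usual trick is a run script that performs the "up" action and then execs a do-nothing process for runsv to supervise, while the "down" action goes into finish. A sketch, using ALSA state as the example and sleep as a stand-in pause process:

#!/bin/sh
# run: perform the "up" action once, then idle so runsv has a process to supervise
alsactl restore
exec sleep infinity   # or a dedicated tiny pause(1)-style program

#!/bin/sh
# finish: perform the "down" action when the service is taken down
exec alsactl store

With a real one-shot service, nitro just runs the up/down scripts and skips the placeholder process.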
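And for the parametrized services item: the runit symlink trick usually means one shared run script symlinked into several per-instance service directories, deriving its parameter from the directory name. A sketch (directory layout and getty arguments are assumptions, not copied from any distribution):

#!/bin/sh
# shared script; /etc/sv/agetty-tty1/run, /etc/sv/agetty-tty2/run, ...
# are symlinks to it, so each instance gets its terminal from its directory name
tty=${PWD##*-}                  # "tty1", "tty2", ...
exec agetty "$tty" 38400 linux

With nitro’s agetty@ parametrization, the instance name itself carries the parameter instead, and instances can be created at runtime.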
Currently, nitro also lacks some features:
- service checks are not implemented (see below); currently, a service that didn’t crash within 2 seconds is considered to be running.
- runsvchdir is not supported to change all services at once. However, under certain conditions, you can change the contents of /etc/nitro completely and rescan to pick them up. nitro opens the directory /etc/nitro once and just re-reads its contents on demand. (Proper reopening will be added at some point when posix_getdents is more widespread; opendir/readdir/closedir implies dynamic memory allocation.)
- You can’t override nitroctl operations with scripts, as you can with sv.
- nitro tracks service identity by name, not by the inode number of the service directory. This has benefits (parametrized services are possible) and drawbacks (you may need to restart more things explicitly if you fiddle with existing services, and service lookup is a bit more work).
On the code side, nitro is written with modern(ish) POSIX.1-2008 systems in mind, whereas runit, being written in 2001, contains some quirks for obsolete Unix systems. It also uses a less familiar style of writing C code.
Do you need a process supervisor in a container at all? It depends: if the container just hosts a simple server, probably not. However, sometimes containers also need to run other processes to provide scheduled commands, caches, etc., which benefit from supervision.
Finally, PID 1 needs to reap zombies, and not all processes used as PID 1 in containers do that. nitro is only half the size of dumb-init, and less than twice as big as tini.
Both runit and nitro don’t support declaring dependencies between services. However, services can wait for other services to be up (and nitro has a special state for that, so only successfully started services are considered UP).
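In runit, the usual idiom is to poll the prerequisite at the top of the run script and bail out until it is up; runsv will simply try again. A sketch (service and daemon names are examples):

#!/bin/sh
# refuse to start until dbus is reported as up; runsv restarts us until it is
sv check dbus >/dev/null || exit 1
exec my-daemon --no-daemonize

The nitro equivalent benefits from the extra state mentioned above: a service that is still STARTING does not yet count as UP.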
Personally, I don’t believe service dependencies are of much use. My experiences with sysvinit, OpenRC, and systemd show that they are hard to get right and can have funny side effects, such as unnecessary restarts of other services when something crashes, or long delays until the system can be brought down.
For system bringup, it can be useful to sequence operations (e.g. start udevd very early, then bring up the network, then mount things, etc.). nitro supports this by allowing the SYS/setup script to start and wait for services. Likewise, services can be shut down in a defined order.
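As a sketch of what such a SYS/setup could look like; the nitroctl calls are my assumption of the interface (check the nitro documentation for the exact commands), the point is merely that bringup happens step by step:

#!/bin/sh
# SYS/setup: sequenced early bringup (sketch; nitroctl usage assumed)
nitroctl start udevd            # start udevd and wait for it to be up
udevadm trigger --action=add    # populate /dev
udevadm settle
nitroctl start dhcpcd           # then bring up the network
mount -a                        # then mount the remaining file systems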
nitro is a generic tool, but many features provided by other supervisors can be implemented as site policies using separate tools. For example, nothing stops you from writing a thing to infer service dependencies and do a "better" bringup. However, this code doesn’t need to be part of nitro itself, nor run inside PID 1.
Likewise, things like liveness checks can be implemented as separate tools. External programs can quite easily keep track of too many restarts and trigger alerts. A simple Prometheus exporter is included in contrib.
At some point I want to add readiness checks, i.e. having an explicit transition from STARTING to UP (as mentioned above, currently this happens after 2 seconds).
Unfortunately, the existing mechanisms for service readiness (e.g. systemd’s sd_notify or the s6 notification fd) are incompatible with each other, and I don’t really like either. But I also don’t really want to add yet another standard.
[This is mostly written down for future reference.]
I think my first exposure to daemontools-style supervision was back in 2005, when I had shared hosting at Aria’s old company, theinternetco.net.
There was a migration from Apache to Lighttpd, which meant .htaccess files weren’t supported anymore. So I got my own Lighttpd instance that was supervised by, if I remember correctly, freedt.
Later, I started the first musl-based Linux distribution, sabotage, and built init scripts from scratch based on busybox runit.
When Arch (which I used mostly back then) moved towards systemd, I wrote ignite, a set of runit scripts to boot Arch. (Fun fact: the last machine running ignite was decommissioned earlier this year.)
Finally, xtraeme discovered the project and invited me to help move Void to runit. En passant I became a Void maintainer.
Work on nitro started around 2020 with some experiments on what a monolithic supervisor could look like. The current code base was started in 2023.
NP: EA80—Die Goldene Stadt
You may perhaps not recognize the name of Kevin S. Braunsdorf, or "ksb"
(kay ess bee) as he was called, but you certainly used one tool he wrote,
together with Matthew Bradburn, namely the implementation of test(1)
in GNU coreutils.
Kevin S. Braunsdorf died last year, on July 24, 2024, after a long illness.
In this post, I try to remember his work and legacy.
He studied at Purdue University and worked there as a sysadmin from 1986 to 1994. Later, he joined FedEx and greatly influenced how IT is run there, from software deployments to the physical design of datacenters.
Kevin was a pioneer of what we today call "configuration engineering",
and he wrote a Unix toolkit called msrc_base
to help with these tasks.
(Quote: "This lets a team of less than 10 people run more than 3,200
instances without breaking themselves or production.")
Together with other tools that are useful in general, he built the
"pundits tool-chain".
These tools deserve further investigation.
Now, back in those days, Unix systems were vastly heterogeneous and riddled with vendor-specific quirks and bugs. His tooling centers around a least common denominator; for example, m4 and make are used heavily as they were widely available (and later, Perl). C programs have to be compiled on their specific target hosts. Remote execution initially used rsh; file distribution was done with rdist. Everything had to be bootstrappable from simple shell scripts and standard Unix tools; porting to new platforms was common.
msrc
The basic concept of how msrc works was already implemented in the first releases from 2000 that we can find online: at its core, there is a two-stage Makefile, where one part runs on the distribution machine, then the results get transferred to the target machine (say, with rdist), and then a second Makefile (Makefile.host) is run there.
This is a practical and very flexible approach. Configuration can be kept centralized, but if you need to run tasks on the target machine (say, compile software across your heterogeneous architectures), that is possible as well.
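As a rough, hypothetical sketch of the two-stage idea in shell form (not msrc’s actual file layout; host and file names are made up): stage one runs centrally, stage two is a Makefile.host executed on the target.

#!/bin/sh
# stage 1: runs on the central distribution host
HOST=web1
m4 -DHOST=$HOST sshd_config.m4 > sshd_config     # render the config centrally
rdist -c . $HOST:/var/tmp/stage                  # ship the working directory over
# stage 2: run the target-side Makefile there
rsh $HOST 'cd /var/tmp/stage && make -f Makefile.host install'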
Over time, tools were added to parallelize this (xapply), make the deployment logs readable (xclate), or work around resource contention (ptbw). Likewise, tools for inventory management and host definitions were added (hxmd, efmd). Stateful operations on sets (oue) can be used for retrying on errors by keeping track of failed tasks...
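For a flavor of how this composes, a hypothetical parallel push (the hosts file and the rdist destination are made up; %1 is the current input line, and -P4 is assumed to run four tasks at once):

% xapply -P4 -f 'rdist -c . %1:/var/tmp/stage' hosts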
All tools are fairly well documented, but documentation is spread among many files, so it takes some time to understand the core ideas.
Start here if you are curious.
Unix systems contain a series of ad-hoc text formats, such as the format of /etc/passwd. ksb invented a tiny language to work with such file formats, implemented by the dicer. A sequence of field separators and field selectors can be used to drill down on formatted data:
% grep Leah /etc/passwd
leah:x:1000:1000:Leah Neukirchen:/home/leah:/bin/zsh
% grep Leah /etc/passwd | xapply -f 'echo %[1:5 $] %[1:$/$]' -
Neukirchen zsh
The first field (the whole line) is split on :, then we select the 5th field, split by space, then select the last field ($). For the basename of the shell, we split by /.
Using another feature, the mixer, we can build bigger strings from diced results. For example to format a phone number:
% echo 5555551234 | xapply -f 'echo %Q(1,"(",1-3,") ",4-6,"-",7-$)' -
(555) 555-1234
The %Q does shell-quoting here!
Since the dicer and the mixer are implemented as library routines, they appear in multiple tools.
One of the more controversial choices in the pundits tool-chain is that "business logic" (e.g. things like "this server runs this OS and has this purpose, therefore it should have this package installed") is generally implemented using the notorious macro processor m4. But there were few other choices back then: awk would have been a possibility, but is a bit tricky to use due to its line-based semantics. perl wasn’t around when the tool-chain was started, though it was used later for some things. But m4 shines if you want to convert a text file into a text file with some pieces of logic.
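A toy example of that kind of logic (file name and macros invented for illustration):

% cat pick.m4
ifelse(OS, `linux', `package: openssh-server', `package: ssh')
% m4 -DOS=linux pick.m4
package: openssh-server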
One central tool is hxmd, which takes tabular data files that contain configuration data (such as which hosts exist and what roles they have), and can use m4 snippets to filter them and compute custom command lines to deploy them, e.g.:
% hxmd -C site.cf -E "COMPUTONS(CPU,NPROC)>1000" ...
Later, another tool named efmd was added that does not spawn a new m4 instance for each configuration line.
m4 is also used as a templating language. There I learned the nice trick of quoting the entire document except for the parts where you want to expand macros:
`# $Id...
# Output a minimal /etc/hosts to install to get the network going.
'HXMD_CACHE_TARGET`:
echo "# hxmd generated proto hosts file for 'HOST`"
echo "127.0.0.1 localhost 'HOST ifdef(`SHORTHOST',` SHORTHOST')`"
dig +short A 'HOST` |sed -n -e "s/^[0-9.:]*$$/& 'HOST ifdef(`SHORTHOST',` SHORTHOST')`/p"
'dnl
This example also shows that nested escaping was nothing ksb frowned upon.
Since many tools of the pundits tool-chain are meant to be used together, they were written as so-called "wrappers", i.e. programs calling each other. For example, the above-mentioned hxmd can spawn several commands in parallel using xapply, which themselves call xclate again to yield different output streams, or use ptbw for resource management.
The great thing about the design of all these tools is how nicely they fit together. You can easily see what need drove the creation of each tool, and how they can still be used in a very general way, even for unanticipated use cases.
Discovering these tools was important for my own Unix toolkit, and some of my tools are directly inspired by them, e.g. xe and arr.
I still ponder host configuration systems.
NP: Adrianne Lenker—Not a Lot, Just Forever