2025-11-07
Over on the Fediverse, I mentioned a grump I have about containers:
As a sysadmin, containers irritate me because they amount to abandoning the idea of well done, well organized, well understood, etc installation of software. Can't make your software install in a sensible way that people can control and limit? Throw it into a container, who cares what it sprays where across the filesystem and how much it wants to be the exclusive owner and controller of everything in sight.
(This is a somewhat irrational grump.)
To be specific, it's by and large abandoning the idea of well done installs of software on shared servers. If you're only installing software inside a container, your software can spray itself all over the (container) filesystem, put itself in hard-coded paths wherever it feels like, and so on, even if you have completely automated instructions for doing all of that inside a container image that's being built. Some software doesn't do this and is well mannered when installed outside a container, but some software does, and you'll find notes to the effect that the only supported way of installing it is 'here is this container image' or 'here are the automated instructions for building a container image'.
To be fair to containers, some of this is due to missing Unix APIs (or APIs that theoretically exist but aren't standardized). Do you want multiple Unix logins for your software so that it can isolate different pieces of itself? There's no automated way to do that. Do you run on specific ports? There's generally no machine-readable way to advertise that, and people may want you to build in mechanisms to vary those ports and then specify the new ports to other pieces of your software (that would all be bundled into a container image). And so on. A container allows you to put yourself in an isolated space of Unix UIDs, network ports, and so on, one where you won't conflict with anyone else and won't have to try to get the people who want to use your software to create and manage the various details (because you've supplied either a pre-built image or reliable image building instructions).
But I don't have to be happy that software doesn't necessarily even try, that we seem to be increasingly abandoning much of the idea of running services in shared environments. Shared environments are convenient. A shared Unix environment gives you a lot of power and avoids a lot of complexity that containers create. Fortunately there's still plenty of software that is willing to be installed on shared systems.
(Then there is the related grump that the modern Linux software distribution model seems to be moving toward container-like things, which has a whole collection of issues associated with it.)
2025-11-05
For various reasons, I'm working to switch from wget to curl, and generally this has been going okay. However, I've now run into one situation where I don't know how to make curl do what I want. The culprit is, of course, a project that doesn't bother to provide easily fetched downloads, but in a very specific way. In fact it's Django (again).
The Django URLs for downloads look like this:
https://www.djangoproject.com/download/5.2.8/tarball/
The way the websites of many projects turn these into actual files is to provide a filename in the HTTP Content-Disposition header in the reply. In curl, these websites can be handled with the -J (--remote-header-name) option, which uses the filename from the Content-Disposition if there is one.
Unfortunately, Django's current website does not operate this way. Instead, the URL above is an HTTP redirect to the actual .tar.gz file (on media.djangoproject.com). The .tar.gz file is then served without a Content-Disposition header, as an application/octet-stream. Wget will handle this with --trust-server-names, but as far as I can tell from searching through the curl manpage, there is no option that will do this in curl.
(In optimistic hope I even tried --location-trusted, but no luck.)
If curl is directed straight to the final URL, 'curl -O' alone is enough to get the right file name. However, if curl goes through a redirection, there seems to be no option that will cause it to re-evaluate the 'remote name' based on the new URL; the name derived from the initial URL sticks, and you get a file unhelpfully called 'tarball' (in this case). If you try to be clever by running the initial curl without -O but capturing any potential redirection with "-w '%{redirect_url}\n'" so you can manually follow it in a second curl command, this works (for one level of redirection) but leaves you with a zero-length file called 'tarball' from the first curl.
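A sketch of that two-step approach (my own variation, not a worked out script) looks like this; it discards the first response body so there's no stray 'tarball' file left behind, and it only copes with a single level of redirection:

  #!/bin/sh
  # Sketch: handle one level of redirection by hand so that curl -O
  # sees the final URL and derives a sensible file name from it.
  url="$1"
  target="$(curl -s -o /dev/null -w '%{redirect_url}' "$url")"
  if [ -n "$target" ]; then
      curl -O "$target"
  else
      curl -O "$url"
  fi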
It's possible that this means curl is the wrong tool for the kind of file downloads I want to do from websites like this, and I should get something else entirely. However, that something else should at least be a completely self-contained binary so that I can easily drag it around to all of the assorted systems where I need to do this.
(I could always try to write my own in Go, or even take this as an opportunity to learn Rust, but that way lies madness and a lot of exciting discoveries about HTTP downloads in the wild. The more likely answer is that I hold my nose and keep using wget for this specific case.)
PS: I think it's possible to write a complex script using curl that more or less works here, but one of the costs is that you have to make first a HEAD and then a GET request to the final target, and that irritates me.
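To spell out what such a script would roughly look like (a sketch, not something I've put into service), the first curl follows redirects with HEAD requests purely to learn the final URL, and the second curl does the actual download:

  #!/bin/sh
  # Sketch of the HEAD-then-GET approach: -I -L chases redirects with
  # HEAD requests and %{url_effective} reports the last URL fetched,
  # which curl -O then uses for the real GET (and the file name).
  url="$1"
  final="$(curl -s -I -L -o /dev/null -w '%{url_effective}' "$url")"
  curl -O "$final"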
2025-11-02
I have an unusual X desktop environment that has evolved over a long period, and as part of that I have an equally unusual and slowly evolved set of ways to handle URLs. By 'handle URLs', what I mean is going from a URL somewhere (email, text in a terminal, etc) to having the URL open in one of my several browser environments. Tied into this is handling non-URL things that I also want to open in a browser, for example searching for various sorts of things in various web places.
The simplest place to start is at the end. I have several browser environments, and to go along with them I have a script for each that opens URLs provided as command line arguments in a new window of that browser. If there are no command line arguments, the scripts open a default page (usually a blank page, but for my main browser it's a special start page of links). For most browsers this works by running 'firefox <whatever>' and so will start the browser if it's not already running, but for my main browser I use a lightweight program that speaks Firefox's X-based remote control protocol, which means I have to start the browser outside of it.
Layered on top of these browser specific scripts is a general script to open URLs that I call 'openurl'. The purpose of openurl is to pick a browser environment based on the particular site I'm going to. For example, if I'm opening the URL of a site where I know I need JavaScript, the script opens the URL in my special 'just make it work' JavaScript-enabled Firefox. Most URLs open in my normal, locked down Firefox. I configure programs like Thunderbird to open URLs through this openurl script, sometimes directly and sometimes indirectly.
(I haven't tried to hook openurl into the complex mechanisms that xdg-open uses to decide how to open URLs. Probably I should, but the whole xdg-open thing irritates me.)
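To give a rough illustration of what openurl does (this isn't my actual script, and the site and browser script names are stand-ins), the heart of it can be a simple case statement:

  #!/bin/sh
  # Sketch of an openurl-style dispatcher: route URLs for sites that
  # need JavaScript to one browser environment and everything else to
  # the normal locked down one. 'js-firefox' and 'main-firefox' are
  # placeholders for the real per-browser scripts.
  url="$1"
  if [ -z "$url" ]; then
      # With no argument, just open the default start page.
      exec main-firefox
  fi
  case "$url" in
      *//*.example.com/*|*//app.example.org/*)
          exec js-firefox "$url" ;;
      *)
          exec main-firefox "$url" ;;
  esac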
Layered on top of openurl and the specific browser scripts is a collection of scripts that read the X selection and do a collection of URL-related things with it. One script reads the X selection, checks whether it's a URL, and either feeds the URL to openurl or just runs openurl to open my start page. Other scripts feed the URL to alternate browser environments or do an Internet search for the selection. Then I have an fvwm menu with all of these scripts in it, and one of my fvwm mouse button bindings brings up this menu. This lets me select a URL in a terminal window, bring up the menu, and open it in either the default browser choice or a specific browser choice.
(I also have a menu entry for 'open the selection in my main browser' in one of my main fvwm menus, the one attached to the middle mouse button, which makes it basically reflexive to open a new browser window or open some URL in my normal browser.)
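A stripped down sketch of the 'open the X selection' script might look like this; I'm using xsel for illustration (xclip -o would do too), and the URL check is cruder than whatever the real script does:

  #!/bin/sh
  # Sketch: read the primary X selection and either open it as a URL
  # through openurl or just open the default start page.
  sel="$(xsel -o 2>/dev/null)"
  case "$sel" in
      http://*|https://*)
          exec openurl "$sel" ;;
      *)
          exec openurl ;;
  esac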
The other way I handle URLs is through dmenu. One of the things my dmenu environment does is recognize URLs and open them in my default browser environment. I also have short dmenu commands to open URLs in my other browser environments, or open URLs based on the parameters I pass the command (such as a 'pd' script that opens Python documentation for a standard library module). Dmenu itself can paste in the current X selection with a keystroke, which makes it convenient to move URLs around. Dmenu is also how I typically open a URL if I'm typing it in instead of copying it from the X selection, rather than opening a new browser window, focusing the URL bar, and entering the URL there.
(I have dmenu set up to also recognize 'about:*' as URLs and have various Firefox about: things pre-configured as hidden completions in dmenu, along with some commonly used website URLs.)
As mentioned, dmenu specifically opens plain URLs in my default browser environment rather than going through openurl. I may change this someday, but in practice there aren't enough special sites for it to be an issue. Also, I've made dedicated little dmenu-specific scripts that open up the various sites I care about in the appropriate browser, so I can type 'mastodon' in dmenu to open up my Fediverse account in the JavaScript-enabled Firefox instance.
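These dedicated helper scripts are tiny. As an illustration (a guess at the shape, not the real thing), the 'pd' script for Python documentation could be as small as:

  #!/bin/sh
  # pd <module>: open the Python standard library documentation for a
  # module in whatever browser environment openurl picks.
  exec openurl "https://docs.python.org/3/library/$1.html"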
2025-10-26
Suppose, not hypothetically, that you have a very small DNS server for a captive network situation, where the DNS server exists only to give clients answers for a small set of hosts. One of the ways you can implement this is with an authoritative DNS server, such as NSD, that simply has an extremely minimal set of DNS data. If you're using NSD for this, you might be curious how minimal you can be and how much you need to mimic ordinary DNS structure.
Here, by 'mimic ordinary DNS structure', I mean inserting various levels of NS records so there is a more or less conventional path of NS delegations from the DNS root ('.') down to your name. If you're providing DNS clients with 'dog.example.org', you might conventionally have an NS record for '.', an NS record for 'org.', and an NS record for 'example.org.', mimicking what you'd see in global DNS. Of course all of your NS records are going to point to your little DNS server, but they're present if anything looks.
Perhaps unsurprisingly, NSD doesn't require this and DNS clients normally don't either. If you say:
zone:
    name: example.org
    zonefile: example-stub
and don't have any other DNS data, NSD won't object and it will answer queries for 'dog.example.org' with your minimal stub data. This works for any zone, including completely made up ones:
zone:
    name: beyond.internal
    zonefile: beyond-stub
The actual NSD stub zone files can be quite minimal. An older OpenBSD NSD appears to be happy with zone files that have only a $ORIGIN, a $TTL, a '@ IN SOA' record, and what records you care about in the zone.
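As a concrete sketch (the names, addresses, and SOA values here are made up), such a stub zone file can be as small as:

  $ORIGIN example.org.
  $TTL 3600
  @    IN SOA ns.example.org. hostmaster.example.org. (
             1        ; serial
             3600     ; refresh
             900      ; retry
             604800   ; expire
             3600 )   ; negative caching TTL
  dog  IN A   192.0.2.10
  cat  IN A   192.0.2.11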
Once I thought about it, I realized I should have expected this. An authoritative DNS server normally only holds data for a small subset of zones and it has to be willing to answer queries about the data it holds. Some authoritative DNS servers (such as Bind) can also be used as resolving name servers so they'd sort of like to have information about at least the root nameservers, but NSD is a pure authoritative server so there's no reason for it to care.
As for clients, they don't normally do DNS resolution starting from the root downward. Instead, they expect to operate by sending the entire query to whatever their configured DNS resolver is, which is going to be your little NSD setup. In a number of configurations, clients either can't talk directly to outside DNS or shouldn't try to do DNS resolution that way because it won't work; they need to send everything to their configured DNS resolver so it can do, for example, "split horizon" DNS.
(Yes, the modern vogue for DNS over HTTPS puts a monkey wrench into split horizon DNS setups. That's DoH's problem, not ours.)
Since this works for any zone, including a .net zone, you can use it to try to disable DNS over HTTPS resolvers in your stub DNS environment by providing a .net zone with a 'use-application-dns CNAME .' record or the like, to trigger at least Firefox's canary domain detection.
(I'm not going to address whether you should have such a minimal stub DNS environment or instead count on your firewall to block traffic and have a normal DNS environment, possibly with split horizon or response policy zones to introduce your special names.)
2025-10-22
In a comment on my entry on how we reboot our machines right after updating their kernels, Jukka asked a good question:
While I do not know how many machines there are in your fleet, I wonder whether you do incremental rolling, using a small snapshot for verification before rolling out to the whole fleet?
We do this to some extent but we can't really do it very much. The core problem is that the state of almost all of our machines is directly visible and exposed to people. This is because we mostly operate an old fashioned Unix login server environment, where people specifically use particular servers (either directly by logging in to them or implicitly because their home directory is on a particular NFS fileserver). About the only genuinely generic machines we have are the nodes in our SLURM cluster, where we can take specific unused nodes out of service temporarily without anyone noticing.
(Some of these login servers are in use all of the time; others we might find idle if we're extremely lucky. But it's hard to predict when someone will show up to try to use a currently empty server.)
This means that progressively rolling out a kernel update (and rebooting things) to our important, visible core servers requires multiple people-visible reboots of machines, instead of one big downtime when everything is rebooted. Generally we feel that repeated disruptions are much more annoying and disruptive overall to people; it's better to get the pain of reboot disruptions over all at once. It's also much easier to explain to people, and we don't have to annoy them with repeated notifications that yet another subset of our servers and services will be down for a bit.
(To make an incremental deployment more painful for us, these will normally have to be after-hours downtimes, which means that we'll be repeatedly staying late, perhaps once a week for three or four weeks as we progressively work through a rollout.)
In addition to the nodes of our SLURM cluster, there are a number of servers that can be rebooted in the background to some degree without people noticing much. We will often try the kernel update out on a few of them in advance, and then update others of them earlier in the day (or the day before) both as a final check and to reduce the number of systems we have to cover at the actual out of hours downtime. But a lot of our servers cannot really be tested much in advance, such as our fileservers or our web server (which is under constant load for reasons outside the scope of this entry). We can (and do) update a test fileserver or a test web server, but neither will see a production load and it's under production loads that problems are most likely to surface.
This is a specific example of how the 'cattle' model doesn't fit all situations. To have a transparent rolling update that involves reboots (or anything else that's disruptive on a single machine), you need to be able to transparently move people off of machines and then back on to them. This is hard to get in any environment where people have long term usage of specific machines, where they have login sessions and running compute jobs and so on, and where you have non-redundant resources on a single machine (such as NFS fileservers without transparent failover from server to server).
2025-10-20
About four years ago I wrote an entry about how your SMART drive database of attribute meanings needs regular updates. That entry was written on the occasion of updating the database we use locally on our Ubuntu servers, and at the time we were using a mix of Ubuntu 18.04 and Ubuntu 20.04 servers, both of which had older drive databases that probably dated from early 2018 and early 2020 respectively. It is now late 2025 and we use a mix of Ubuntu 24.04 and 22.04 servers, both of which have drive databases that are from after October of 2021.
Experienced system administrators know where this one is going: today I updated our SMART drive database again, to a version of the SMART database that was more recent than the one shipped with 24.04 instead of older than it.
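For the record, the mechanical part is simple on systems where smartmontools packages its updater script; it's roughly the following, although where drivedb.h ends up and whether your distribution ships the script at all are worth checking locally:

  # Fetch the current drive database from the smartmontools project
  # (assuming the packaged update-smart-drivedb script is available).
  sudo update-smart-drivedb
  # Spot check what smartctl now knows about a particular drive.
  sudo smartctl -P show /dev/sda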
It's a fact of life that people forget things. People especially forget things that are a long way away, even if they make little notes in their worklog message when recording something that they did (as I did four years ago). It's definitely useful to plan ahead in your documentation and write these notes, but without an external thing to push you or something to explicitly remind you, there's no guarantee that you'll remember.
All of which leads me to the view that it would be useful for us to have a long range calendar reminder system, something that could be used to set reminders for more than a year into the future and ideally allow us to write significant email messages to our future selves to cover all of the details (although there are hacks around that, such as putting the details on a web page and having the calendar mail us a link). Right now the best calendar reminder system we have is the venerable calendar program, which we can arrange to have email one-line notes to our general address that reaches all sysadmins, but calendar doesn't let you include the year in the reminder date.
(For SMART drive database updates, we could get away with mailing ourselves once a year in, say, mid-June. It doesn't hurt to update the drive database more than every Ubuntu LTS release. But there are situations where a reminder several years in the future is what we want.)
PS: Of course it's not particularly difficult to build an ad-hoc script system to do this, with various levels of features. But every local ad-hoc script that we write is another little bit of overhead, and I'd like to avoid that kind of thing if at all possible in favour of a standard solution (that isn't a shared cloud provider calendar).
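To illustrate what I mean by an ad-hoc script system, a flat file of 'YYYY-MM-DD reminder text' lines plus a daily cron job is about all it would take; this is a sketch with made up paths and addresses, not something we run:

  #!/bin/sh
  # Sketch: mail out any reminders in the file whose date is today.
  # Intended to be run once a day from cron.
  REMINDERS=/var/local/reminders
  today="$(date +%Y-%m-%d)"
  grep "^$today " "$REMINDERS" | while read -r _date msg; do
      echo "$msg" | mail -s "Reminder: $msg" sysadmins@example.org
  done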
2025-10-13
A commentator on my entry on systemd-resolved's new DNS server delegation feature asked:
My memory might fail me here, but: wasn't something like this a feature introduced in ISC's BIND 8, and then considered to be a bad mistake and dropped again in BIND 9 ?
I don't know about Bind, but what I do know is that this feature is present in other DNS resolvers (such as Unbound) and that it has a variety of uses. Some of those uses can be substituted with other features and some can't be, at least not as-is.
The quick version of 'DNS server delegation' is that you can send all queries under some DNS zone name off to some DNS server (or servers) of your choice, rather than have DNS resolution follow any standard NS delegation chain that may or may not exist in global DNS. In Unbound, this is done through, for example, Forward Zones.
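For concreteness, an Unbound forward zone declaration is only a few lines; the zone name and server address here are placeholders:

  # unbound.conf fragment: send all queries at or under internal. to a
  # specific internal DNS server instead of following global DNS.
  forward-zone:
      name: "internal."
      forward-addr: 192.0.2.53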
DNS server delegation has at least three uses that I know of. First, you can use it to insert entire internal TLD zones into the view that clients have. People use various top level names for these zones, such as .internal, .kvm, .sandbox (our choice), and so on. In all cases you have some authoritative servers for these zones and you need to direct queries to these servers instead of having your queries go to the root nameservers and be rejected.
(Obviously you will be sad if IANA ever assigns your internal TLD to something, but honestly if IANA allows, say, '.internal', we'll have good reason to question their sanity. The usual 'standard DNS environment' replacement for this is to move your internal TLD to be under your organizational domain and then implement split horizon DNS.)
Second, you can use it to splice in internal zones that don't exist in external DNS without going to the full overkill of split horizon authoritative data. If all of your machines live in 'corp.example.org' and you don't expose this to the outside world, you can have your public example.org servers with your public data and your corp.example.org authoritative servers, and you splice in what is effectively a fake set of NS records through DNS server delegation. Related to this, if you want you can override public DNS simply by having an internal and an external DNS server, without split horizon DNS; you use DNS server delegation to point to the internal DNS server for certain zones.
(This can be replaced with split horizon DNS, although maintaining split horizon DNS is its own set of headaches.)
Finally, you can use this to short-cut global DNS resolution for reliability in cases where you might lose external connectivity. For example, there are within-university ('on-campus' in our jargon) authoritative DNS servers for .utoronto.ca and .toronto.edu. We can use DNS server delegation to point these zones at these servers to be sure we can resolve university names even if the university's external Internet connection goes down. We can similarly point our own sub-zone at our authoritative servers, so even if our link to the university backbone goes down we can resolve our own names.
(This isn't how we actually implement this; we have a more complex split horizon DNS setup that causes our resolving DNS servers to have a complete copy of the inside view of our zones, acting as caching secondaries.)
2025-10-10
Yesterday I wrote about restarting or redoing something after a systemd service restarts. The non-hypothetical situation that caused me to look into this was that after we applied a package update to one system, systemd-networkd on it restarted and wiped out some critical policy based routing rules. Since I vaguely remembered this happening before, I sighed and arranged to have our rules automatically reapplied on both systems with policy based routing rules, following the pattern I worked out.
Wait, two systems? And one of them didn't seem to have problems after the systemd-networkd restart? Yesterday I ignored that and forged ahead, but really it should have set off alarm bells. The reason the other system wasn't affected was that I'd already solved the problem the right way back in March of 2024, when we first hit this networkd behavior and I wrote an entry about it.
However, I hadn't left myself (or my co-workers) any notes about that March 2024 fix; I'd put it into place on the first machine (then the only machine we had that did policy based routing) and forgotten about it. My only theory is that I wanted to wait and be sure it actually fixed the problem before documenting it as 'the fix', but if so, I made a mistake by not leaving myself any notes that I had a fix in testing. When I recently built the second machine with policy based routing I copied things from the first machine, but I didn't copy the true networkd fix because I'd forgotten about it.
(It turns out to have been really useful that I wrote that March 2024 entry because it's the only documentation I have, and I'd probably have missed the real fix if not for it. I rediscovered it in the process of writing yesterday's entry.)
I know (and knew) that keeping notes is good, and that my memory is fallible. And I still let this slip through the cracks for whatever reason. Hopefully the valuable lesson I've learned from this will stick a bit so I don't stub my toe again.
(One obvious lesson is that I should make a note to myself any time I'm testing something that I'm not sure will actually work. Since it may not work, I may not want to formally document it in our normal system for this, but a personal note will keep me from completely losing track of it. You can see the persistence of things 'in testing' as another example of the aphorism that there's nothing as permanent as a temporary fix.)
2025-10-03
Every so often on the Fediverse, people ask for advice on a monitoring system to run on their machine (desktop or server), and some of the time Prometheus comes up, and when it does I wind up making awkward noises. On the one hand, we run Prometheus (and Grafana) and are happy with it, and I run separate Prometheus setups on my work and home desktops. On the other hand, I don't feel I can recommend picking Prometheus for a basic single-machine setup, despite running it that way myself.
Why do I run Prometheus on my own machines if I don't recommend that you do so? I run it because I already know Prometheus (and Grafana), and in fact my desktops (re)use much of our production Prometheus setup (but they scrape different things). This is a specific instance (and example) of a general thing in system administration, which is that not infrequently it's simpler for you to use something you already know even if it's not necessarily an exact fit (or even a great fit) for the problem. For example, if you're quite familiar with operating PostgreSQL databases, it might be simpler to use PostgreSQL for a new system where SQLite could do perfectly well and other people would find SQLite much simpler. Especially if you have canned setups, canned automation, and so on all ready to go for PostgreSQL, and not for SQLite.
(Similarly, our generic web server hammer is Apache, even if we're doing things that don't necessarily need Apache and could be done perfectly well or perhaps better with nginx, Caddy, or whatever.)
This has a flipside, where you use a tool because you know it even if there might be a significantly better option, one that would actually be easier overall even accounting for needing to learn the new option and build up the environment around it. What we could call "familiarity-driven design" is a thing, and it can even be a confining thing, one where you shape your problems to conform to the tools you already know.
(And you may not have chosen your tools with deep care and instead drifted into them.)
I don't think there's any magic way to know which side of the line you're on. Perhaps the best we can do is be a little bit skeptical about our reflexive choices, especially if we seem to be sort of forcing them in a situation that feels like it should have a simpler or better option (such as basic monitoring of a single machine).
(In a way it helps that I know so much about Prometheus because it makes me aware of various warts, even if I'm used to them and I've climbed the learning curves.)
2025-09-30
Once upon a time, my email handling was relatively simple. I wasn't on any big mailing lists, so I had almost everything delivered straight to my inbox (both in the traditional /var/mail mbox sense and then through to MH's own inbox folder directory). I did some mail filtering with procmail, but it was all for things that I basically never looked at, so I had procmail write them to mbox files under $HOME/.mail. I moved email from my Unix /var/mail inbox to MH's inbox with MH's inc command (either running it directly or having exmh run it for me). Rarely, I had an mbox file procmail had written that I wanted to read, and at that point I inc'd it either to my MH +inbox or to some other folder.
Later, prompted by wanting to improve my breaks and vacations, I diverted a bunch of mailing lists away from my inbox. Originally I had procmail write these diverted messages to mbox files, then later I'd inc the files to read the messages. Then I found that outside of vacations, I needed to make this email more readily accessible, so I had procmail put them in MH folder directories under Mail/inbox (one of MH's nice features is that your inbox is a regular folder and can have sub-folders, just like everything else). As I noted at the time, procmail only partially emulates MH when doing this, and one of the things it doesn't do is keep track of new, unread ('unseen') messages.
(MH has a general purpose system for keeping track of 'sequences' of messages in a MH folder, so it tracks unread messages based on what is in the special 'unseen' sequence. Inc and other MH commands update this sequence; procmail doesn't.)
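The procmail side of this is just recipes whose destination ends in '/.', which is procmail's syntax for MH-style delivery into a folder directory. A representative recipe (with a made up mailing list and folder name) looks like:

  # Sketch: file matching messages into the MH folder directory
  # Mail/inbox/somelist as individual numbered messages. Note that
  # this doesn't update MH's 'unseen' sequence.
  :0
  * ^List-Id:.*somelist\.example\.org
  Mail/inbox/somelist/.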
Along with this procmail setup I wrote a basic script, called mlists, to report how many messages each of these 'mailing list' inboxes had in them. After a while I started diverting lower priority status emails and so on through this system (and stopped reading the mailing lists); if I got a type of email in any volume that I didn't want to read right away during work, it probably got shunted to these side inboxes. At some point I made mlists optionally run the MH scan command to show me what was in each inbox folder (well, for the inbox folders where this was potentially useful information). The mlists script was still mostly simple and the whole system still made sense, but it was a bit more complex than before, especially when it also got a feature where it auto-reset the current message number in each folder to the first message.
A couple of years ago, I switched the MH frontend I used from exmh to MH-E in GNU Emacs, which changed how I read my email in practice. One of the changes was that I started using the GNU Emacs Speedbar, which always displays a count of messages in MH folders and especially wants to let you know about folders with unread messages. Since I had the hammer of my mlists script handy, I proceeded to mutate it to be what a comment in the script describes as "a discount maintainer of 'unseen'", so that MH-E's speedbar could draw my attention to inbox folders that had new messages.
This is not the right way to do this. The right way to do this is to have procmail deliver messages through MH's rcvstore, which as an MH command can update the 'unseen' sequence properly. But using rcvstore is annoying, partly because you have to use another program to add the locking it needs, so at every point the path of least resistance was to add a few more hacks to what I already had. I had procmail, and procmail could deliver to MH folder directories, so I used it (and at the time the limitations were something I considered a feature). I had a script to give me basic information, so it could give me more information, and then it could do one useful thing while it was giving me information, and then the one useful thing grew into updating 'unseen'.
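For contrast, a rcvstore based recipe would look roughly like the following; the path to rcvstore varies between MH variants and packagings, the local lockfile is the extra locking annoyance, and updating 'unseen' depends on your MH profile having an Unseen-Sequence entry (the details here are illustrative, not what I run):

  # Sketch: deliver through rcvstore so that MH itself files the
  # message and maintains sequences like 'unseen'.
  :0 w: somelist.lock
  * ^List-Id:.*somelist\.example\.org
  | /usr/lib/mh/rcvstore +inbox/somelist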
And since I have all of this, it's not even worth the effort of switching to the proper rcvstore approach and throwing a bunch of it away. I'm always going to want the 'tell me stuff' functionality of my mlists script, so part of it has to stay anyway.
Can I see similarities between this and how various of our system tools have evolved, mutated, and become increasingly complex? Of course. I think much the same obvious forces are involved, because each step seems reasonable in isolation, right up until I've built a discount environment that duplicates much of rcvstore.
It turns out that part of the time, I want to get some degree of live notification of messages being filed into these inbox folders. I may not look at all or even many of them, but there are some periodic things that I do want to pay attention to. So my discount special hack is basically:
tail -f .mail/procmail-log | egrep -B2 --no-group-separator 'Folder: /u/cks/Mail/inbox/'
(This is a script, of course, and I run it in a terminal window.)
This could be improved in various ways but then I'd be sliding down the convoluted complexity slope and I'm not willing to do that. Yet. Give it a few years and I may be back to write an update.