That sure was a bunch of debugging because I forgot that my systemd .service file that runs scripts needed
Type=oneshot
RemainAfterExit=True
(... or it'd apparently run the ExecStop script right after the ExecStart script, which doesn't work too well.)
Let's be specific here. This was the systemd .service unit to
bring up my WireGuard tunnel on my work
machine, which I set up to run a 'startup' script (via ExecStart=).
Because I had a 'stop' script sitting around, I also set the unit's
ExecStop= to point to that; the 'stop' script takes the device
down and so on.
The startup script worked when I ran it by hand, but when I set up
the .service unit to start WireGuard on boot, it didn't. Specifically,
although journalctl reported no errors, the WireGuard tunnel
network device and its associated routes just weren't there when
the system finished booting. At first I thought the script was
failing in a way that the systemd journal wasn't capturing, so I
stuck a bunch of debugging in (capturing all output from the script
in a file, and then running with 'set -x', and finally dumping
out various pieces of network state after the script had finished).
All of this debugging convinced me that the WireGuard tunnel was
being created during boot but then getting destroyed by the time
booting finished. I flailed around for a while theorizing that this
service or that service was destroying the WireGuard device when
it was starting (and altering my .service to start after a steadily
increasing number of other things), but nothing fixed the issue.
Then, while I was staring at my .service file, the penny dropped
and I actually read what was in front of my eyes:
[Service]
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C
This .service file had started out life as one that I'd copied
from another .service file of mine. However, that .service file
was for a daemon, where the ExecStart= was a process that was
sticking around. I was running a script, and the script was exiting,
which meant that as far as systemd was concerned the service was
going down and it should immediately run the ExecStop script. My
'stop' script deleted the WireGuard tunnel network device, which
explained why I found the device missing after booting had finished.
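For concreteness, here is a minimal sketch of what the [Service] section looks like with the two missing settings added. The paths are the ones from my unit; the [Unit] and [Install] sections are left out, and the comments are my reading of what each setting does rather than anything authoritative.

```ini
[Service]
# oneshot: the ExecStart= script is expected to run to completion and exit,
# rather than stick around like a daemon.
Type=oneshot
# Keep the unit considered 'active' after the script has exited, so that
# ExecStop= is only run when the unit is actually stopped.
RemainAfterExit=True
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C
```

With this, the 'stop' script should only run on an explicit 'systemctl stop' of the unit (or at shutdown), not right after the 'startup' script exits.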
The journalctl output won't tell you this; it reports only that
the service started and doesn't mention that it's stopped again and
that the ExecStop script was run. If I'd looked at 'systemctl
status ...' and paid attention, I'd at least have had a clue because
systemd would have told me that it thought that the service was
'inactive (dead)' instead of running. If I'd had both scripts
explicitly log that they were running, I would have seen in the
logs that my 'stop' script was being executed for some reason; I
probably should add this.
This has been a pretty useful learning experience. I know, that probably sounds weird, but my view is that I'd rather make these mistakes and learn these lessons in a non-urgent, non-production situation instead of stubbing my toes on them in production and possibly under stressful conditions.
This is part of CSpace, and is written by ChrisSiebenmann.