…the AF_VSOCK "CID" (which is like an IP address, i.e. an identifier for the local VM) you can specify a friendly machine name, if the VM in question is registered with systemd-machined. systemd-vmspawn sets things up that way out of the box, of course. That means, with current off-the-shelf systemd inside a VM and on the host you can now just do "ssh machine/foobar" to connect to a local VM called "foobar", via AF_VSOCK, i.e. independently of any fragile network.
3️⃣7️⃣ Here's the 37th post highlighting key new features of the current v257 release of systemd. #systemd257
In systemd v256 we added a small tool "systemd-ssh-proxy" whose job is to allow connecting to local VMs with ssh via the AF_VSOCK protocol (as opposed to AF_INET/AF_INET6). It acts as the host-side counterpart to the guest-side systemd-ssh-generator that automatically binds sshd to AF_VSOCK.
In systemd v257 the functionality has been updated so that instead of specifying…
This is extremely handy, since it "just works". In fact, I have now switched over to this entirely for my private VM needs.
(In related news, systemd-ssh-proxy now supports the AF_VSOCK "MUX" protocol too. This means it's now compatible not only with AF_VSOCK as implemented by qemu, but also with the implementations in Firecracker/CloudHypervisor.)
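Mechanically, this all hooks into plain ssh via client-side configuration. Here's a sketch of the kind of drop-in involved; the exact file path and host patterns are illustrative and may differ between distributions and versions:

```
# e.g. /etc/ssh/ssh_config.d/20-systemd-ssh-proxy.conf (path is an example)
Host unix/* vsock/* machine/*
    ProxyCommand /usr/lib/systemd/systemd-ssh-proxy %h %p
    ProxyUseFdpass yes
```

The ProxyCommand is what actually opens the AF_VSOCK (or AF_UNIX) connection and hands the fd back to ssh, which is why no IP networking is needed at any point.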
All in all I would say systems became a lot more debuggable out of the box this way. That's not just good for quickly tracking down issues in production environments; I also see it as relevant in the context of the open source philosophy: since the whole OS is typically open source, coredumps become comprehensively useful, because you can always cross-link the stack frames to the sources. The pathway from execution back to the sources behind it is now nicely paved.
…are in a pretty good position: we use libdw to generate the stack trace, running inside a local sandbox (since generating stacktraces means analyzing frequently corrupted, possibly differently privileged, complex data, which is hence security sensitive par excellence), and since the relevant distributions now ship minidebuginfo packages and are built with frame pointers enabled, the default stacktraces you get this way are typically quite useful – without having to bother with gdb or so.
3️⃣2️⃣ Here's the 32nd post highlighting key new features of the current v257 release of systemd. #systemd257
One of the features we added early on to systemd was coredump processing. We wanted crashing processes, on one hand, to be treated very much like any other loggable event, and, on the other hand, for the result to be truly and immediately useful, i.e. the log messages generated should already carry a fully symbolized backtrace.
The path towards that goal was rocky, but today I think we…
…the container is just like handling on the host, and the processing of the dump data is done within the immediate sandbox and context of the code that owns it. Great!
Except of course that this is only a full solution if the container is actually able to do all that, i.e. is complete enough to do this kind of processing on its own. Effectively this means that the CoredumpReceive= logic only really works for "full-OS" containers, i.e. containers as they are typically run…
It's a boolean option: if enabled, coredump processing on the host forwards coredumps back to the unit's own processes. The idea is that a container manager enables this on the container's unit, which magically ensures that coredumps that happen inside the container are delivered to the container itself, and are then processed inside of it, by the container's own coredumping logic.
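In practice this is a one-line addition to the container's unit. A sketch, assuming an nspawn container unit named "foobar" (the unit name here is just an example):

```
# drop-in for systemd-nspawn@foobar.service, e.g. via "systemctl edit"
[Service]
CoredumpReceive=yes
```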
Security-wise this is really nice behaviour I think: to a large degree coredump handling inside…
…logged events, it all stopped at the container boundary: only with luck would you get a proper backtrace, and you usually didn't, because the coredump processor on the host couldn't deal with the different compiler/debug situation inside the container. Given that containers are mildly successful these days, this of course is a big problem.
Back in v255 we added a new setting, CoredumpReceive=, to unit files (services and scopes in particular) to address this issue.
Except, of course, that until recently it all fell apart once containers came into the mix: containers typically mark a "binary boundary" when it comes to coredump processing. The code running inside the container and the code running on the host typically do not originate from the same source; they are built differently, with different compilers, compiler settings, debug symbols, optimization levels and so on.
And that showed: while coredumps of the system itself were now nicely…
With v257 there's now a knob to address this situation too. systemd-coredump can now be configured (opt-in!) so that it will try to process coredumps of containers *on the host*. If you set EnterNamespace=yes in coredump.conf it will acquire access to the container's mount tree, mount it within its own private mount namespace to some special location, and then run coredump processing on that – while being part of the host runtime in almost all ways.
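Since this is opt-in, enabling it is a matter of a small configuration drop-in (the drop-in file name below is just an example):

```
# /etc/systemd/coredump.conf.d/50-enter-namespace.conf
[Coredump]
EnterNamespace=yes
```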
…in systemd-nspawn: they have a proper init system as PID 1, as well as service management, so that they can actually reasonably process coredumps inside in parallel to whatever else they are supposed to be doing.
Of course, containers in the Docker sense are not like that: they typically run in some weird mixture of "I am an independent system" and "I am part of another system", and the payload is run as PID 1 without any further service management available.
the usual systemd socket activation protocol) that it shall respond to Varlink queries via a passed in socket fd.
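The fd-passing convention referenced here is simple enough to sketch in a few lines of Python. This is an illustrative reimplementation of the sd_listen_fds() convention, not systemd's own code: passed fds start at fd 3, and $LISTEN_FDNAMES carries one colon-separated name per fd (e.g. "varlink" for a passed-in Varlink socket):

```python
import os

SD_LISTEN_FDS_START = 3  # first fd passed via socket activation

def passed_fds(env=os.environ):
    """Map fd names from $LISTEN_FDNAMES to fd numbers, sd_listen_fds()-style."""
    n = int(env.get("LISTEN_FDS", "0"))
    names = env.get("LISTEN_FDNAMES", "").split(":") if env.get("LISTEN_FDNAMES") else []
    names += [None] * (n - len(names))  # unnamed fds get no name
    return dict(zip(names, range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + n)))

print(passed_fds({"LISTEN_FDS": "1", "LISTEN_FDNAMES": "varlink"}))  # {'varlink': 3}
```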
The path may also be prefixed with "ssh-unix:" in which case an SSH connection is made to some remote Varlink service.
New with v257 is that "ssh-exec:" is now also available which also sets up an SSH connection, but invokes a specified binary on the remote side, talking to it via standard input/output.
All four ways to communicate (connect to AF_UNIX socket, execute binary, …
The 2nd argument always specifies the entrypoint socket to talk to. In most cases where you call this locally, that's the file system path of an AF_UNIX socket. However, this can also be the path to an executable, in which case the executable is invoked and told via $LISTEN_FDNAMES (i.e.
2️⃣3️⃣ Here's the 23rd post highlighting key new features of the upcoming v257 release of systemd. #systemd257
In systemd, as mentioned in one of the previous installments, we are adopting Varlink IPC in more and more places.
To interface with Varlink IPC from the command line we provide the "varlinkctl" tool. "varlinkctl introspect <socket>" for example introspects which method calls, types, and errors a service provides. Similarly, "varlinkctl call <socket> <method> <json>" calls a method.
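What varlinkctl speaks on the wire is deliberately simple: NUL-terminated JSON messages over a stream socket. Here's a minimal, self-contained sketch of that framing in Python; the "org.example.Echo" interface/method name is made up for illustration:

```python
import json, socket, threading

def serve(conn):
    # Read one NUL-terminated JSON call, answer it, and hang up.
    buf = b""
    while b"\0" not in buf:
        data = conn.recv(4096)
        if not data:
            return
        buf += data
    call = json.loads(buf.split(b"\0", 1)[0])
    if call["method"] == "org.example.Echo":  # made-up method name
        reply = {"parameters": {"text": call["parameters"]["text"]}}
    else:
        reply = {"error": "org.varlink.service.MethodNotFound"}
    conn.sendall(json.dumps(reply).encode() + b"\0")
    conn.close()

def varlink_call(conn, method, parameters):
    # Send a call and block for the single NUL-terminated reply.
    conn.sendall(json.dumps({"method": method, "parameters": parameters}).encode() + b"\0")
    buf = b""
    while b"\0" not in buf:
        buf += conn.recv(4096)
    return json.loads(buf.split(b"\0", 1)[0])

client, server = socket.socketpair()  # stands in for an AF_UNIX service socket
t = threading.Thread(target=serve, args=(server,))
t.start()
reply = varlink_call(client, "org.example.Echo", {"text": "hello"})
t.join()
print(reply["parameters"]["text"])  # hello
```

A real service would of course listen on a well-known AF_UNIX path (or be socket-activated), but the framing is exactly this: one JSON object, one terminating zero byte.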
1️⃣9️⃣ Here's the 19th post highlighting key new features of the upcoming v257 release of systemd. #systemd257
A relatively basic feature of systemd's service management is the ability to automatically restart a service in case it terminates unexpectedly, configurable via the Restart= setting.
In v254 we added the RestartMode= setting, which allows fine-tuning the mechanism used for restarting the service, i.e. it adds logic to optionally avoid marking the service as failed between…
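In unit file terms that combination looks something like this (the service binary path is a made-up example):

```
[Service]
ExecStart=/usr/bin/mydaemon
Restart=on-failure
# "direct" skips the intermediate failed/inactive state between restart attempts
RestartMode=direct
```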
…named the same way for the lifetime of the system. Different definitions of "same" exist: some people prefer it if the very same physical device always carries the same name (in which case the MAC address is a good identifier, if the device has one), while others prefer it if the slot the device is plugged into gets the fixed name, so that devices can be replaced when broken.
The policy for which naming scheme to use can be configured; see the systemd.net-naming-scheme(7) man page for details.
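For the first definition of "same", pinning a name to a specific physical device can be done with a .link file. A sketch, with a made-up MAC address and interface name:

```
# /etc/systemd/network/10-lan0.link (file name, MAC and name are examples)
[Match]
MACAddress=00:11:22:33:44:55

[Link]
Name=lan0
```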