3️⃣2️⃣ Here's the 32nd post highlighting key new features of the current v257 release of systemd. #systemd257

One of the features we added early on to systemd was coredump processing. We wanted crashing processes to be, on the one hand, treated very much like any other loggable event, and on the other hand truly and immediately useful, i.e. the log messages generated should already carry a fully symbolized backtrace.

The path towards that goal was rocky, but today I think we…

13 Dec 2024 at 8:49 | Open on mastodon.social

Lennart Poettering

…are in a pretty good position: we use libdw to generate the stack trace, running inside a local sandbox (since generating stack traces means analyzing frequently corrupted, possibly differently privileged, complex data, which is hence security sensitive par excellence). And since the relevant distributions now ship minidebuginfo packages and are built with frame pointers enabled, the default stack traces you get this way are typically quite useful – without having to bother with gdb or so.

13 Dec 2024 at 8:52 | Open on mastodon.social

Lennart Poettering

All in all I would say systems became a lot more debuggable out of the box this way, which is not just good for quickly tracking down issues in production environments, but which I also see as relevant in the context of the open source philosophy: since the whole OS is typically open source, coredumps are comprehensively useful, because you can always cross-link the stack frames to the sources. The pathway from execution to the sources behind it is now nicely paved.

13 Dec 2024 at 8:56 | Open on mastodon.social

Lennart Poettering

Except, of course, that until recently it all fell apart once containers came into the mix. Containers typically mark a "binary boundary" when it comes to coredump processing: the code running inside the container and the code running on the host typically do not originate from the same source; they are built differently, with different compilers, compiler settings, debug symbols, optimization levels and so on.

And that showed: while coredumps of the system itself were now nicely…

13 Dec 2024 at 8:58 | Open on mastodon.social

Lennart Poettering

…logged events, it all stopped at the container boundary: only with luck would you get a proper backtrace, but usually you didn't, because the coredump processor on the host couldn't deal with the different compiler/debug situation inside the container. Given that containers are mildly successful these days, this of course is a big problem.

Back in v255 we added a new unit file setting, CoredumpReceive= (for services and scopes in particular), to address this issue.

13 Dec 2024 at 9:01 | Open on mastodon.social

Lennart Poettering

It's a boolean option: if enabled, the coredump processing on the host forwards coredumps to the unit's own code. The idea is that a container manager enables this on the container's unit, and this magically ensures that coredumps that happen inside the container are delivered to the container itself, and are then processed inside of it, with the container's own coredumping logic.
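For illustration, a minimal sketch of a drop-in that a container manager (or an admin) might place on the container's unit. The unit name and file path are hypothetical. The Delegate=yes line is not mentioned in the thread, but systemd.resource-control(5) documents that units with CoredumpReceive=yes must also set it, so it is included here.

```ini
# Hypothetical path: /etc/systemd/system/mycontainer.service.d/10-coredump.conf
# "mycontainer.service" is a made-up unit name for this sketch.
[Service]
# Required alongside CoredumpReceive= according to systemd.resource-control(5)
Delegate=yes
# Ask the host's coredump processing to forward coredumps of this unit's
# processes into the container, to be handled by the container's own logic
CoredumpReceive=yes
```

This only has an effect if the container itself runs coredump handling that can receive the forwarded dump.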

Security-wise this is really nice behaviour I think: to a large degree coredump handling inside…

13 Dec 2024 at 9:03 | Open on mastodon.social

Lennart Poettering

…the container is just like handling on the host, and the processing of the dump data is done within the immediate sandbox and context of the code that owns it. Great!

Except of course that this is only a full solution if the container actually is able to do all that, i.e. is complete enough to actually do this kind of processing on its own. Effectively this means that the CoredumpReceive= logic only really works for "full-OS" containers, i.e. containers as they are typically run…

13 Dec 2024 at 9:06 | Open on mastodon.social

Lennart Poettering

…in systemd-nspawn: they have a proper init system as PID 1, as well as service management, so that they can actually reasonably process coredumps inside in parallel to whatever else they are supposed to be doing.

Of course, containers in the Docker sense are not like that: they typically run in some weird mixture of "i am an independent system" and "i am part of another system", and the payload is run as PID 1 without any further service management available.

Bummer!

13 Dec 2024 at 9:09 | Open on mastodon.social

Lennart Poettering

With v257 there's now a knob to address this situation too. systemd-coredump can now be configured (opt-in!) so that it will try to process coredumps of containers *on the host*. If you set EnterNamespace=yes in coredump.conf it will acquire access to the container's mount tree, mount it within its own private mount namespace to some special location, and then run coredump processing on that – while being part of the host runtime in almost all ways.
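As a sketch, opting in could look like the following drop-in. The file name is an assumption chosen for this example (a drop-in directory is used so the main coredump.conf stays untouched). The [Coredump] section name and the EnterNamespace= key come from coredump.conf(5).

```ini
# Hypothetical path: /etc/systemd/coredump.conf.d/50-enter-namespace.conf
[Coredump]
# Opt-in (defaults to "no"): let systemd-coredump on the host use the mount
# tree of the crashed process' namespace when generating the stack trace.
# This processes container-originated data on the host.
EnterNamespace=yes
```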

13 Dec 2024 at 9:12 | Open on mastodon.social

Lennart Poettering

In many situations this new feature should be "good enough" to extract stack traces from containers, but of course it's not going to be able to fully address the fact that containers and host typically are built with different compiler/debugging settings, i.e. that the coredump processing and the coredump payload originate from different vendors with all the differences in generation this brings.

And of course, the security angle deserves attention: we are processing untrusted data…

13 Dec 2024 at 9:15 | Open on mastodon.social

Lennart Poettering

…here that comes from a differently trusted environment (the container payload) on the host. While we try to lock things down via sandboxing knobs, this nevertheless is not without risk. Because of that it's an opt-in thing, and it would be wise to enable this only if you sufficiently trust your container payloads.

13 Dec 2024 at 9:17 | Open on mastodon.social
