
So, if you ask me what my takeaway from the CrowdStrike issue is, I'd say: boot counting/boot assessment/automatic fallback should really be a MUST for today's systems. *Before* you invoke your first kernel you need to have tracking of boot attempts and logic for falling back to older versions automatically. It's a major shortcoming that this is not the default behaviour of today's distros, in particular commercial ones.

Of course systemd has supported this for a long time:

https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT/

20 Jul 2024 at 12:40 | Open on mastodon.social
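The "automatic fallback" half of the idea above boils down to a selection rule. The sketch below is a simplified model, not systemd-boot's code: `BootEntry`, `pick_entry` and the version tuples are hypothetical names. It only mirrors the idea from the linked document, as we understand it, that entries which have used up their tries are pushed to the end rather than deleted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootEntry:
    """Hypothetical model of one boot menu entry."""
    version: tuple[int, ...]   # e.g. (6, 9, 1); larger tuples sort as newer
    tries_left: Optional[int]  # None = already blessed as good, 0 = given up on


def pick_entry(entries: list[BootEntry]) -> Optional[BootEntry]:
    """Return the entry to boot: the newest one that has not run out of tries.

    Entries whose counter reached zero are kept, but only tried after
    everything else, so a broken new kernel falls back to an older one.
    """
    if not entries:
        return None
    newest_first = sorted(entries, key=lambda e: e.version, reverse=True)
    # sorted() is stable: usable entries (False) come before exhausted ones
    # (True) while keeping the newest-first order inside each group.
    return sorted(newest_first, key=lambda e: e.tries_left == 0)[0]
```

For example, if a freshly installed 6.9.1 entry has exhausted its tries while 6.9.0 still has some left, `pick_entry` returns 6.9.0 without any user interaction.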


Lennart Poettering

And it's a shame that commercial distros do not hook into that. Their boot stack hasn't changed in more than a decade, is laughably bad at security (unsigned initrds, ffs!) and robustness, and, if you have boot assessment enabled at all, they turn it into a fantastic DoS (by showing you a boot menu instead of reverting to a working boot choice).

20 Jul 2024 at 12:44 | Open on mastodon.social

David Haller

@pid_eins Is there any distro that has implemented automatic boot assessment, as you suggested?

20 Jul 2024 at 12:50 | Open on franken.social

Raito Bezarius

@david @pid_eins nixos

20 Jul 2024 at 12:54 | Open on nixos.paris

makefu

@raito @david There were a number of Pull Requests for this feature (one implementation was even merged to master for 20 minutes) but none is currently available, no? I'd really love to use this feature, just today one of my boxes would have been saved by that 👍

20 Jul 2024 at 20:46 | Open on jit.social

Raito Bezarius

@makefu @david For UKI it was already available; for normal NixOS use cases, yes, it was merged recently, but I have been using it for a while.

20 Jul 2024 at 20:51 | Open on nixos.paris

karlggest

@david
Any immutable one. All of them work this way by design.
@pid_eins

20 Jul 2024 at 13:00 | Open on mastodon.social

Lennart Poettering

@karlggestd @david Not really true. The ones that use systemd-boot might, but boot counting in grub is pretty useless and manual if you ask me.

20 Jul 2024 at 13:08 | Open on mastodon.social

karlggest

@pid_eins @david Wow, my mistake, I assumed that Aeon was the most delayed project (they are with the latest RC).

20 Jul 2024 at 18:07 | Open on mastodon.social

christian mock

@pid_eins @karlggestd @david You can do it, but it is involved. Basically, using EFI's BootNext variable and only setting the fixed boot order after the system has booted is what I implemented for an immutable digital signage project.

21 Jul 2024 at 15:12 | Open on chaos.social
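The BootNext approach works because UEFI consumes BootNext after a single use, while BootOrder persists. The simulation below is a hypothetical model of that interplay (`Firmware`, `stage_update` and `confirm_boot` are illustrative names, not an existing API); it does not touch real firmware variables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Firmware:
    """Hypothetical model of the two relevant UEFI boot variables."""
    boot_order: list[str]            # persistent BootOrder
    boot_next: Optional[str] = None  # one-shot BootNext

    def boot(self) -> str:
        """Pick the entry to boot; BootNext is cleared once it has been used."""
        if self.boot_next is not None:
            entry, self.boot_next = self.boot_next, None
            return entry
        return self.boot_order[0]


def stage_update(fw: Firmware, new_entry: str) -> None:
    # Try the new entry exactly once; BootOrder still points at the old one.
    fw.boot_next = new_entry


def confirm_boot(fw: Firmware, booted: str) -> None:
    # Run from the booted system only after it has been judged healthy:
    # only now does the new entry become the persistent default.
    fw.boot_order = [booted] + [e for e in fw.boot_order if e != booted]
```

If the new entry crashes before `confirm_boot` runs, the next boot simply follows the unchanged BootOrder. On a real machine the two writes would typically be done with `efibootmgr --bootnext` and `efibootmgr --bootorder`.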

Bou

@karlggestd @david @pid_eins Fedora Silverblue doesn't, according to my experience.

21 Jul 2024 at 9:45 | Open on liberdon.com

karlggest

@bou
https://mastodon.social/@karlggestd/112820147160685315
@david @pid_eins

21 Jul 2024 at 11:35 | Open on mastodon.social

Leaflet

@david @pid_eins Aeon probably. @sysrich would know.

20 Jul 2024 at 13:20 | Open on fosstodon.org

bluca

@david @pid_eins debian, for the next release

21 Jul 2024 at 7:30 | Open on fosstodon.org

Eric Curtin

@david @pid_eins Yes, there are distros, including commercial ones, that hook into that: RHEL for Edge and Red Hat In-Vehicle Operating System automatically roll back after a number of failed boots. Any rpm-ostree/ostree/bootc based OS is capable of it.

22 Jul 2024 at 11:36 | Open on social.treehouse.systems

poleguy

@pid_eins I'm not disagreeing. It makes me wonder how you would categorize/assess/mitigate the security and operations risk of having a system that's supposed to be on one kernel fall back to a previous one?

20 Jul 2024 at 13:24 | Open on mastodon.social

Lennart Poettering

@poleguy The way automatic boot assessment with systemd works is that on each boot we make one of three assessments: "good", "bad", "dontknow". If we make the "bad" assessment we'll count down the entry's counter (and if it is zero we give up on it in the future). If we make the "good" assessment we'll drop the counter entirely from the entry, marking it as good for basically all eternity. If we do "dontknow" we don't do a thing.

20 Jul 2024 at 13:37 | Open on mastodon.social
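The three-way policy described above can be modelled on the `+LEFT` / `+LEFT-DONE` counter suffix that the linked document uses for boot entry file names (e.g. `foo+3.conf`). This is a sketch of the policy as stated in the post, with a hypothetical `assess` helper; in systemd itself the work is split between the boot loader and `systemd-bless-boot`, and the exact renaming may differ from this simplification.

```python
import re

# "<base>+<left>[-<done>]<ext>", e.g. "linux+3.conf" or "linux+2-1.conf"
_COUNTER = re.compile(
    r"^(?P<base>.+?)\+(?P<left>\d+)(?:-(?P<done>\d+))?(?P<ext>\.[^.]+)$"
)


def assess(filename: str, verdict: str) -> str:
    """Return the entry's new file name after a good/bad/dontknow verdict."""
    m = _COUNTER.match(filename)
    if m is None:
        # No counter: the entry was blessed earlier and is never touched again.
        return filename
    base, ext = m["base"], m["ext"]
    left, done = int(m["left"]), int(m["done"] or 0)
    if verdict == "good":
        return base + ext                    # drop the counter for good
    if verdict == "bad":
        left = max(left - 1, 0)              # count down; 0 means "give up"
        return f"{base}+{left}-{done + 1}{ext}"
    if verdict == "dontknow":
        return filename                      # leave everything as it is
    raise ValueError(f"unknown verdict: {verdict!r}")
```

Note how the first `"good"` verdict removes the counter, after which further `"bad"` verdicts have no effect: this is the "never regress" property discussed in the next reply.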

Lennart Poettering

@poleguy This means that a bad actor can play games with us until the point where we manage to do one boot that works correctly, but from that point on, we'll never regress anymore.

I like to believe that that's quite a sensible and simple policy that should work for most cases. It balances robustness against the chance for attackers to hold off updates indefinitely.

20 Jul 2024 at 13:39 | Open on mastodon.social

poleguy

@pid_eins Thanks. That does seem reasonable for remotely managed systems, and better than the alternative, which is manual intervention. I worry a smidge about added complexity. I can't shake the feeling that we keep adding layers of complexity to our systems. It feels okay to add complexity that is proportional to the complexity of the problem being solved, and in this case it seems sane. However, don't these remotely managed systems all tend to have out-of-band methods to recover already?

21 Jul 2024 at 17:58 | Open on mastodon.social

Sheogorath 🦊

@pid_eins But would this really prevent it when the configuration of a kernel driver goes bad? If I understand things correctly here (big if), it would only be possible to fix the issue if you store that config in a volume that can be reverted.

Otherwise you boot into the emergency shell and you are none the wiser than Windows systems are right now.

And given it's endpoint protection that is supposed to react pretty much instantly to changes, I don't see how you would get these into the A/B update.

20 Jul 2024 at 17:06 | Open on microblog.shivering-isles.com

Lennart Poettering

@sheogorath On Linux, drivers don't really have a "configuration" per se. At least not much that you pass into the early, risky parts of the boot process. Subsystems might have some config. In a systemd world you wrap them in authenticated/signed PE addons or confext images, and those you drop next to a specific kernel image, so you can revert them together as one, update them as one, and so on. Or in other words: the way we parameterize kernels in modern ways also makes it easy to do assessment/fallback.

20 Jul 2024 at 21:43 | Open on mastodon.social
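"Dropping addons next to a specific kernel image" means that a fallback to an older image automatically brings that image's own addons along. The sketch below assumes the per-image drop-in directory `<image>.extra.d/`, which is, to our understanding, the convention systemd-stub uses; treat the exact path as an assumption, and note that `addons_for` is a hypothetical helper that ignores any non-per-kernel addon locations.

```python
from pathlib import PurePosixPath


def addons_for(uki: str, esp_files: list[str]) -> list[str]:
    """Return the files in the image's own '<name>.extra.d/' drop-in directory."""
    uki_path = PurePosixPath(uki)
    extra_dir = uki_path.with_name(uki_path.name + ".extra.d")
    return sorted(f for f in esp_files if PurePosixPath(f).parent == extra_dir)
```

With `v1.efi` and `v2.efi` each carrying their own drop-in directory, booting `v1.efi` picks up only the addons stored next to it, so reverting the kernel reverts its parameterization as well.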

Justin Azoff

@pid_eins how exactly is a successful boot defined though?

Boots to init?
Boot and all services are started successfully? Some services?

What happens if the system boots successfully, runs for ~60 seconds, and then the kernel panics when the first cron job/timer runs?

20 Jul 2024 at 13:45 | Open on infosec.exchange

furicle

@JustinAzoff @pid_eins see the link at the start of the thread, flexible strategies available

20 Jul 2024 at 14:38 | Open on mastodon.social

Lennart Poettering

@JustinAzoff Depends on the use case. Different systems/OSes want different stuff there. Some might just check if the system manages to reach some point in the boot process, others might want to also require network pings to work, others might instead just want to check that some services stay up for some minimum amount of time, and so on. systemd gives you the basic infra for this and some super basic tests in this sense, but individual OS images might want to fill in more tests/conditions.

20 Jul 2024 at 14:46 | Open on mastodon.social
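One way to picture "individual OS images fill in more tests" is a set of pluggable checks folded into a single verdict. In systemd itself such checks are, as far as we know, expressed as units ordered before `boot-complete.target`; the Python below only illustrates a possible combining policy, and `assess_boot` and `stayed_up_for` are hypothetical names.

```python
from typing import Callable, Iterable, Optional

# A check returns True (good), False (bad) or None (cannot tell yet).
Check = Callable[[], Optional[bool]]


def assess_boot(checks: Iterable[Check]) -> str:
    """Combine per-OS checks into one of 'good', 'bad' or 'dontknow'."""
    results = [check() for check in checks]
    if any(r is False for r in results):
        return "bad"        # a single failing check condemns the attempt
    if results and all(r is True for r in results):
        return "good"
    return "dontknow"       # some checks undecided, or none configured


def stayed_up_for(min_seconds: float, uptime: Callable[[], float]) -> Check:
    """Hypothetical check: reports good once the system has run min_seconds."""
    def check() -> Optional[bool]:
        return True if uptime() >= min_seconds else None
    return check
```

A panic 60 seconds in, as in the question above, means a `stayed_up_for(300, ...)` check never reports good, so that boot is never blessed.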

John Gordon

@JustinAzoff I assume anything to make boot more complex also opens up new threats.

21 Jul 2024 at 14:30 | Open on appdot.net

Matěj Cepl 🇪🇺 🇨🇿 🇺🇦

@pid_eins

Well, the lesson for me (aside from the other obvious ones) is that for industrial systems it should be absolutely mandatory to use something like #SUSE #SLEMicro (or its Red Hat equivalent): snapshot based, with a R/O system, where the system automatically boots from an older snapshot if the current one fails.

The fact that airline computers are not something like this is just mind-blowing.

Yes, preaching the same gospel @sysrich preached for years.

https://youtu.be/idZEJ0OYfWU

20 Jul 2024 at 14:03 | Open on floss.social

James Henstridge

@pid_eins for a system like Crowdstrike, you'd want to extend that to cover data files the kernel loads. I wonder how well that'd work with the rate of updates they were pushing out?

20 Jul 2024 at 14:15 | Open on aus.social

Lennart Poettering

@jamesh I think everyone agrees you have to cover the kernel itself and the initrd with these assessment/fallback schemes. I personally would then also cover the rootfs you boot into with that, but people have different opinions on how far the coverage should reach, and how much you "pin" through a boot attempt.

20 Jul 2024 at 14:43 | Open on mastodon.social

vurpo 🏳️‍⚧️

@pid_eins unfortunately this wasn't the kind of issue that would be solved by falling back to old versions. The bug in the kernel module was there for a long time or possibly from the beginning, and falling back to an older version would still just have crashed in the same way

20 Jul 2024 at 14:47 | Open on mastodon.coffee

Lennart Poettering

@vurpo Nope, of course boot assessment would catch this. The key is just that you "pin" enough as part of an attempt, and thus can revert sufficient parts to get things working.

On Linux you'd pin kernel *and* initrd at the very least, and in the model I propose even the entire /usr for each attempt, to maximize coverage of the assessment logic.

20 Jul 2024 at 14:50 | Open on mastodon.social
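"Pinning" can be read as recording every component of an attempt as one unit, so that falling back restores all of them together, even when the kernel version itself did not change. `BootAttempt` and `revert_plan` below are hypothetical names for a minimal sketch of that idea, assuming each component can be identified by a version string.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BootAttempt:
    """Hypothetical record of everything pinned for one boot attempt."""
    kernel: str  # kernel image version
    initrd: str  # initrd image identifier
    usr: str     # identifier of the (read-only) /usr image


def revert_plan(current: BootAttempt, previous: BootAttempt) -> dict[str, tuple[str, str]]:
    """Map each pinned component that differs to (current, previous).

    Falling back to `previous` rolls back all of them together.
    """
    plan = {}
    for f in fields(BootAttempt):
        now, before = getattr(current, f.name), getattr(previous, f.name)
        if now != before:
            plan[f.name] = (now, before)
    return plan
```

If an update only changed content inside the initrd and /usr (say, a new driver data file) while keeping the same kernel, the plan still reverts both, which is why the coverage of what gets pinned matters more than the kernel version alone.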

bse

@pid_eins @vurpo I would assume you also have to pin /lib/modules, or better get rid of that relic completely and move modules inside the UKI?

21 Jul 2024 at 8:24 | Open on muenchen.social

Lennart Poettering

@bse @vurpo kernel modules are pinned by the kernel's version number, i.e. looked for in /usr/lib/modules/`uname -r`/.

21 Jul 2024 at 8:39 | Open on mastodon.social
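The lookup described above keys the module tree by the kernel release string alone. A minimal sketch, assuming the `/usr/lib/modules/<release>/` layout named in the reply (`module_dir` is a hypothetical helper):

```python
from pathlib import PurePosixPath

MODULES_ROOT = PurePosixPath("/usr/lib/modules")


def module_dir(release: str) -> PurePosixPath:
    """Module directory for a kernel release string (what `uname -r` prints)."""
    return MODULES_ROOT / release
```

On a running system, `module_dir(os.uname().release)` gives the directory for the booted kernel. Because the key is only the release string, two boot entries built for the same release resolve to the same directory, which is the gap raised in the reply about dkms.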

bse

@pid_eins @vurpo Yes, but what happens if you install a faulty out-of-tree module that gets built for all existing kernel versions, for example via dkms, and put into /lib/modules/*/?

21 Jul 2024 at 8:52 | Open on muenchen.social

Lars Marowsky-Brée 😷

@bse @pid_eins @vurpo openSUSE with snapper can reboot into a full older snapshot of the system (except user data), which has saved my butt a few times.

21 Jul 2024 at 10:26 | Open on mastodon.online

Gabe

@bse @pid_eins @vurpo If you deliberately bypass the system integrity and safety features, then they won't save you. It doesn't matter what those features are.

If you blindly sign initrds, checking won't help you. If you blindly mark a boot config as good, or if you replace your rollback image, or whatever...

The system won't save you from yourself infallibly. You'll still want staging, and monitoring, and disaster recovery.

21 Jul 2024 at 16:54 | Open on mendeddrum.org

Lennart Poettering

@bse @vurpo dkms really should synthesize separate menu items for its rebuilds. If it doesn't, it's simply broken and should be fixed.

21 Jul 2024 at 17:02 | Open on mastodon.social

bse

@pid_eins @vurpo Since both entries would be using the same kernel and hence the same /lib/modules/$(uname -r)/, you need a mechanism to have multiple versions of your modules folder. If you're serious about preventing older boot entries from breaking retroactively, I think full system snapshots are the only option. Short of that, there might be some compromises like bundling a kernel and all modules, which of course does not protect userspace, but might be easier for commonly used distros.

21 Jul 2024 at 17:50 | Open on muenchen.social

Bou

@pid_eins Wait, distros could just enable it and they don't? How come?

21 Jul 2024 at 9:44 | Open on liberdon.com

Lennart Poettering

@bou They love grub too much, and how things were done in 1999...

21 Jul 2024 at 17:03 | Open on mastodon.social

Lars Marowsky-Brée 😷

@pid_eins The shocking thing is that this was a requirement for Carrier Grade Linux two decades ago already.
When it comes to reliability and availability as part of dependable computing, our (distributed or not) systems have somewhat regressed as they were scaled up.

21 Jul 2024 at 10:24 | Open on mastodon.online

Anthk

@pid_eins

You mean, like keeping old grub/lilo entries and kernels since forever?

22 Jul 2024 at 15:56 | Open on paquita.masto.host
