Hector Martin

@koteisaev THAT BSOD was caused by a driver crashing, not a driver returning an error code, which is a very different thing because a crash is uncontrolled and cannot be safely handled, while an error code return is a safe and controlled condition.

Linux actually tries to prevent a full system panic, and only terminates the current process if the context is a user process. If you're lucky and the crash didn't corrupt memory, that means the machine keeps working as normal. More often than not, even in that case, the faulty driver had some mutexes locked, and your system will slowly deadlock into oblivion as other processes try to lock the same mutexes. There is no reasonable way around this. This is why uncontrolled crashes are bad and error returns are not.
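To make the difference concrete, here is a minimal userspace sketch in Python, not kernel code. The names `driver_returns_error`, `driver_crashes`, and `run_then_try_lock` are hypothetical, and a `threading.Lock` only stands in for a kernel mutex. It assumes the "crash" kills the thread without any cleanup running.

```python
import threading

lock = threading.Lock()  # stands in for a mutex used inside a driver

def driver_returns_error():
    """Controlled failure: release the lock, then report an error code."""
    lock.acquire()
    try:
        return -1  # hypothetical error code
    finally:
        lock.release()

def driver_crashes():
    """Uncontrolled failure: the thread dies while still holding the lock."""
    lock.acquire()
    raise RuntimeError("simulated driver crash")  # no cleanup runs

def run_then_try_lock(target):
    """Run target in its own thread, then check whether the lock is still usable."""
    def body():
        try:
            target()
        except RuntimeError:
            pass  # the thread just ends; nothing releases the lock
    t = threading.Thread(target=body)
    t.start()
    t.join()
    got_it = lock.acquire(timeout=0.2)  # a real waiter would block forever
    if got_it:
        lock.release()
    return got_it

after_error = run_then_try_lock(driver_returns_error)
after_crash = run_then_try_lock(driver_crashes)
print(after_error, after_crash)  # True False
```

After the error return, everyone else can still take the lock. After the simulated crash, every later attempt to take it times out, which is the slow deadlock described above.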

Kote Isaev replied to Hector

@marcan Sounds like an argument against a big kernel and in favor of more isolated drivers, and against "hyper-privileged" software in general...
The kernel could unlock all mutexes on process death (even if the process leaked mutex locks without crashing), the same way file handles are freed even if you use the kill command on a process...
In userspace, it resembles how Node.js domains were used to intercept errors and prevent an ungraceful process crash.

Hector Martin replied to Kote

@koteisaev

Sounds like an argument against a big kernel and in favor of more isolated drivers, and against "hyper-privileged" software in general...

Which is what macOS did, and why this can't happen on the macOS version of CrowdStrike (it uses userspace drivers).

Linux has similar mechanisms, but can't discourage kernel drivers by policy like macOS did since it's not as tightly controlled, so CrowdStrike on Linux still uses a kernel driver even though it could choose not to, because they suck.
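As a rough illustration of why isolation helps, here is a Python sketch where a hypothetical "driver" runs in its own process and crashes hard. This is only an analogy for a userspace driver, not how any real driver framework works. `DRIVER_CODE` is a made-up stand-in.

```python
import subprocess
import sys

# Hypothetical "driver" that crashes hard: os.abort() kills its own process.
DRIVER_CODE = "import os; os.abort()"

# Run the driver in a separate process, the way a userspace driver is isolated.
result = subprocess.run(
    [sys.executable, "-c", DRIVER_CODE],
    stderr=subprocess.DEVNULL,
)

# The crash is reported to the supervisor as a non-zero exit status.
driver_failed = result.returncode != 0
print("driver failed:", driver_failed)
print("supervisor still running")
```

The supervisor sees the failure as an ordinary status value and keeps going, because none of its own memory or locks were touched by the crashing code.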

The kernel could unlock all mutexes on process death (even if the process leaked mutex locks without crashing), the same way file handles are freed even if you use the kill command on a process...

No. If a mutex is locked then there is no guarantee that the data protected by it is in a consistent state. You can't just "unlock all mutexes"; then you just get data corruption, which is worse than the partial deadlocks. Mutexes are low-level constructs. The whole point/job of the kernel is to keep track of resources in a safe manner so this can be done for userspace handles like file descriptors. The buck stops somewhere, and within the kernel it is impossible to do this, because at the end of the day there has to be some code in charge of atomicity/consistency for resource state, and that code itself cannot be freely interruptible.
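A small Python sketch of that failure mode, again in userspace and purely illustrative. The `accounts` dictionary and `transfer_and_crash` are hypothetical, and the forced `lock.release()` stands in for an imagined "unlock everything on death" policy.

```python
import threading

# Invariant protected by the lock: the two balances always sum to 100.
accounts = {"a": 100, "b": 0}
lock = threading.Lock()

def transfer_and_crash(amount):
    """Start a two-step update and die between the steps, lock still held."""
    lock.acquire()
    accounts["a"] -= amount
    raise RuntimeError("simulated crash between the two updates")
    accounts["b"] += amount  # never reached

def worker():
    try:
        transfer_and_crash(30)
    except RuntimeError:
        pass  # the thread ends with the update half done

t = threading.Thread(target=worker)
t.start()
t.join()

# Imagined policy: forcibly unlock whatever the dead thread was holding.
lock.release()

# Other code can now take the lock, but the data behind it is inconsistent.
with lock:
    total = accounts["a"] + accounts["b"]
print(total)  # 70, not 100
```

Nothing deadlocks here, but every later reader trusts a balance that is silently wrong, which is the data corruption case.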

In userspace, it resembles how Node.js domains were used to intercept errors and prevent an ungraceful process crash.

... and this works because JavaScript is a high-level, memory-safe language. You can't do this with C.

Kote Isaev replied to Hector

@marcan Thanks for the detailed explanations. I think I understand some things better now.
