Lux under a pile of leaves 🍂🦊🍂

@lina this brings back memories of a couple of us early Rust netdev folks trying to push for a branch where we would be allowed to experiment and share abstractions that just did not have any user yet, but could be improved and reworked by people who eventually picked them up for use. It took multiple hours-long meetings to finally end up agreeing that we would just all share links to our respective code and pray that anyone who wasn't aware of it wouldn't have to reinvent the wheel again abstraction-wise.

I don't want to fall into tribalism, or paint all C (or even netdev) developers as stuck-up assh*les, far from that, but it's clear that requiring Rust developers to adhere to the way things have always been done in C, and effectively treating Rust as a guest who should be grateful to be in the project rather than a system capable of making informed decisions about other systems*, places a ton of constraints on us that just make our development harder and slow the project's progress toward maturity. Meanwhile, a lot of time has to be spent by the heads of the project (bless them) doing diplomacy because otherwise patches don't get reviewed, and C devs don't see our point.

Plus the random stubborn and very opinionated dude every now and then of course, but it's the linux dev community, ofc it'll happen.

* (e.g. Rust being used must not motivate large changes on the C side, Rust has to reproduce the way things are done in C however bad they are, etc.)

17 comments
Sobex

@SharpLimefox @lina Isn't there a Rust-in-Linux GitHub repo that could serve as a central point to keep track of all those abstractions and shared work? (Possibly with more experimental branches.)

Lux under a pile of leaves 🍂🦊🍂

@Sobex @lina there is a repo, it's where the previous work resides in the rust branch from before the merge in 6.1. That was where we would have wanted a branch, but review rules are what they are and we would have needed to go through LKMLs with netdev people and netdev people will not review code that is not aiming to be merged (which i understand, but then why force us through LKMLs?)

And it was all just diplomacy and sticking to how things are done

Asahi Linya (朝日りにゃ〜)

@SharpLimefox @Sobex The kernel dev model is very hostile to major changes like this, because the upstream devs refuse to review anything out of tree but then expect submissions to always come with users. That's why I had to submit my DRM abstractions with one huge commit at the end of the series adding my whole GPU driver, even though as expected that got zero reviews...

It causes a lot of friction any time we need to touch any core code because "show us how you're going to use it in the series" is a very effective stalling tactic when that means adding 50 patches of extra stuff just to get to the point where you can merge the intended user...

Lux under a pile of leaves 🍂🦊🍂

@lina @Sobex the current model is fine for C developers trying to add simple things or fix bugs and so on, but when large architectural changes are needed (aka you welcome a new language in the codebase) it's definitely a major point of friction and literal slowdown

Regarding reviews, IIRC, the first time a network driver was sent to netdev, the first reply arrived 48h later at least and it was an apology that nobody was reviewing (i could pull the exact lore links when i'm back at a computer): there was just nobody available to review Rust. Concurring with that (and I'm sorry i'm putting my academic hat on because this is exactly what I'm focused on atm), Hongyu Li et al. found that "RFL is bottlenecked by code review but not by code development" (insight 3), and it's pretty easy to see anytime you get an actual driver sent to the LKMLs for reviews, which is somewhat rare these days

i remember seeing your patchsets dying and your presence in the LKMLs diminishing, and it was quite sad, because of all the work put in it :( i'm also very interested in making abstractions, but finding an end user without paying for obscure or brand new hardware is so hard in net, so i just can't contribute anything, and i'm not alone...

Hopefully things fare better in the future but it's going to need a lot of change of mind from a lot of people..

Asahi Linya (朝日りにゃ〜)

@SharpLimefox @Sobex Honestly the thing that has burned me out the most has been the scheduler saga and seeing how slow the core Device/platform driver stuff (+ the device ID pain) is moving (due to pushback from upstream mostly). I don't want to take on that work myself because I didn't write the original code and I don't fully understand the nuances of the device binding cycle/lifetimes and how that was originally designed for the Rust abstractions (+ my GPU driver literally can't support rebind due to firmware limitations so it's a terrible test/demo case of how to do all that properly). But I depend on that to get anything done...

Once platform devices are supported I'll go back to sending my stuff and I hope it'll be a lot less blocked after that. For better or worse the approach ends up being "don't depend on subsystems where upstream doesn't want to cooperate and will stall, if you can help it"... I think I have a pretty good idea of what those are so things should move forward. I hope...

Sobex

@lina @SharpLimefox Is it possible to avoid the subsystems that are a pain in the ass when writing drivers?

Or is this more a case of not upstreaming those drivers and upstreaming stuff that goes in Rust-friendly subsystems first?

Asahi Linya (朝日りにゃ〜)

@Sobex @SharpLimefox It's possible in some cases, not others. You can't avoid the subsystems "above" you providing your services (DRM for me) and "below" you interacting with other devices, but you can avoid "helper" subsystems and parts of them.

The DRM scheduler is a helper. The other thing I'm going to RiiR is the GPU page table management, which is currently using the IOMMU page table helpers (both because the maintainer stopped replying to my emails... and because since the original submission my needs have diverged enough from the standard code that I think it stopped making sense to reuse it and try to fight to shoehorn those features in).

Lux under a pile of leaves 🍂🦊🍂

@Sobex @lina IMO regardless of subsystem you'll have to endure the structural issues plaguing RFL, but a subsystem full of people willing and motivated to help should alleviate those significantly

And there's also the problem that if your interest is in a specific area of kernel dev (drm, net, whatever), you've only got so much wiggle room to avoid maintainers unwilling to help more.

Someone i talked about this with mentioned forking as well, but that comes with the human cost of maintaining the upstream source (mainline), people having to decide whether to leave the main project, stay, or invest twice as much energy, and the risk that nobody will really use your fork (Asahi Linux being the huge exception for that point lmao)

Asahi Linya (朝日りにゃ〜)

@SharpLimefox @Sobex Hard forks aren't sustainable. Asahi Linux tracks upstream and with that comes a maintenance cost proportional to how much it diverges. The plan was always to upstream almost everything... the project doesn't have a long term future if we get stuck maintaining large amounts out of tree forever (a few patches is OK and we already have a couple "upstream is never going to take this and that's fine, we can live with that" cases... but it can't grow without bound)

Sobex

@lina @SharpLimefox Urgh, what falls in the "upstream won't ever take that" case?

Asahi Linya (朝日りにゃ〜) replied to Sobex

@Sobex @SharpLimefox TSO support for fast x86 emulation and the M1 cpuidle driver, both are NAKed by all the ARM employees (for political reasons) and they own the arm64 platform subsystem.

The actual patches are pretty small and not intrusive though so it's not a big maintenance burden.

In theory the cpuidle stuff is supposed to be replaced by some PSCI transport alternative that doesn't exist, but that idea has been floated for years now and not gone anywhere...

Sobex replied to Asahi Linya (朝日りにゃ〜)

@lina @SharpLimefox Hmm, what are the political reasons (rather than undocumented, and not required to make the hardware run)?

Asahi Linya (朝日りにゃ〜) replied to Sobex

@Sobex @SharpLimefox For TSO they fear "fragmentation" as in developers using TSO to work around arm64 porting issues instead of fixing the code, even though there is no evidence of anyone ever even thinking to do that.

For cpuidle, the maintainers want PSCI (a firmware interface) to be the only cpuidle driver to avoid driver proliferation. Unfortunately the PSCI spec was written with assumptions about the platform that make it impossible to implement on Apple Silicon despite it being a conformant arm64 implementation (they require optional features).

mort

@lina This makes me wonder, what does the future actually look like for your driver? Will the M1 GPU driver just always live out-of-tree because upstream is institutionally incapable of accepting GPU drivers written in Rust? Or are there signs of progress?

(Also, thank you for all your work, I'm a happy user of your driver and it has been nothing but great for me :3)

Asahi Linya (朝日りにゃ〜)

@mort The DRM people are actually very nice in general except for that one scheduler guy. So I don't expect any major trouble merging the driver itself once the dependencies are in. ^^ (and I might seriously just rewrite the scheduler in Rust as driver-internal code just so I don't have to deal with that guy and that mess of buggy C code... it's not just the lifetime stuff, I keep finding outright memory safety bugs.)

The good news is Nova (the new Nvidia driver) is also written in Rust with the same abstractions, so now we have two GPU drivers to put some pressure on the people blocking things...

mort

@lina Oh, that's great news! Being the odd one out is always more challenging, I wasn't aware that Nova was also Rust.

Imikoy

@lina@vt.social

there is a rust driver for the gpu that I have
I can touch gpu rust code in the kernel and see it do things... *appends to the list of remember-later topics*
