Hector Martin

The Linux modular subsystem approach falls flat on its face here. The Type-C, USB, PHY, etc. drivers are too loosely coupled. Even if we get this to work, it's liable to break any time anyone changes common code and subtly changes the order of operations. It is not practically possible to build an overarching, top-down, abstracted environment for tying drivers together like this when the underlying hardware has much more tightly coupled requirements for how it is driven. And don't get me started on adding in PCIe into the mix once we need to get Thunderbolt to work...

The "PHY" concept is the most egregious offender. PHY hardware is as diverse as mailbox hardware, yet Linux thinks it's reasonable to abstract it behind a bunch of rigid operations like "init", "power_on", "set_mode", "set_media", "set_speed", "validate", "calibrate". Then you end up with unions like PHY configuration operations for MIPI DPHY, DP, LVDS, .... And all this starts becoming an intractable mess when you have stuff like a Type-C phy that talks USB2, USB3, DisplayPort, and Thunderbolt on one side, and has interfaces for a USB2+USB3 host/device controller (talking to the USB2 side and either nothing, the USB3 side, or a USB4 tunnel on its SuperSpeed side), a PCIe controller, and a bridge to a DisplayPort mux that then connects to an array of display controllers, and which has fun dependencies like having the USB3 controller hard-reset register as part of the PHY, and init sequences that are tightly coupled to how the other peripherals are driven, and if you get it wrong everything just breaks.

When there is no universal hardware spec for what interface a given type of PHY (never mind all types of PHYs) should have, in terms of operations and state machines, trying to abstract it out under such an interface in software is a path doomed to failure. It might work for simple PHYs for simple protocols of a decade or two ago, but vendors are shipping "do everything USB Type C can do including multiple protocols in parallel and tunneling all in one PHY" now and good luck with that.

What we really need is a bottom-up approach where a vendor-specific driver ties everything together. But Linux hates that approach with a passion, because it's what hacky vendor kernel forks do all the time. And yes, when done poorly, as they do, it sucks. But sometimes it's the only way to get anything sane. Linux does support this concept in at least some places: the ALSA ASoC subsystem supports machine drivers into which different codec/DMA/PCM drivers plug in, and that's how we tie together all the audio hardware for Asahi Linux.

But there's nothing like a "Type C top-level port management driver", nor are the underlying driver interfaces structured to allow this, nor fine-grained enough to compose properly. It would be a major new undertaking.

And then we come to the final problem: We don't have time to fight for a huge refactoring of kernel subsystems. If the kernel community were in general friendlier, discussion and patch submission weren't saddled with a ton of friction, etc., we could entertain the notion. But given past experiences with LKML threads... sorry, I'm not going there.

End result: this is highly annoying and demotivating to work on. That's why the person who started working on it mostly lost interest. Even the USB3 support we've shipped so far partially worked by accident (it has the 5-second delay bug and other issues); we just didn't dare touch it for fear of breaking it. And now we need to get all the other features to work without regressing anything.

Best I can hope right now is I make this barely work with the existing approach and a bunch of hacks, that upstream won't block any of it, and that nobody will touch the code and break it. Wish me luck.

And if you think you can help and are willing to fight the bureaucracy to make this less insane in Linux, please get in touch.

23 comments
gaytabase

@marcan friendship ended with everything is a file, only everything is a PHY matters now

Paul Barker

@marcan Would a multi-function driver (drivers/mfd) help?

For a certain SoC our team are working on at Renesas there are muxes, common configuration registers and ordering dependencies between different Ethernet subsystems. This can be handled by an MFD driver which has functions that can be called by the subsystem-specific drivers.

Hector Martin

@pbarker As far as I know mfd is mostly just a convenient way to have subdevices. We actually already had a fight with that maintainer because the SMC driver we have is a good fit for that subsystem, but he kept saying it wasn't because SMC doesn't "look" like most register-based drivers and he didn't understand why it made sense to use it. Still haven't had a chance to refactor and resubmit all that into what we finally agreed on (after a long mailing list discussion and a further IRC discussion).

But I don't think it fits here, since in this case it's not only register-based devices that are involved (e.g. tipd is on an I²C bus). We can't realistically nest all the dependencies under a parent driver. In fact, for stuff like DCP, which is behind a mux, that doesn't work at all since the mapping isn't 1:1.

And then the issue is getting things like the existing dwc3 driver to talk to us the way we need. No matter how you organize the device tree, that's still going to be a point of contention because it's a shared driver used by many other systems with different requirements.

Janne Grunau

@marcan Even the USB3 support we ship broke already at least twice due to upstream changes and we had to fix it.

Dave Polaschek

@marcan I have nothing to offer but the hope you can sort it out.

Dirkjan Ochtman

@marcan sounds unfortunate. Would it be feasible to get buy-in from a high-level maintainer (maybe Torvalds or one level below that) before you embark on the refactoring, so that you have a mandate you can use to (somewhat) shout down the naysayers during actual review?

doragasu

@marcan Sorry to read all that. Wish I could help, unfortunately kernel development is out of my reach. It must be really demotivating when the biggest problem is not technical and has no sane solution in sight. Hope the situation improves somehow.

Wouter van Heyst

@marcan ai that sounds rather painful :( As a happy Asahi user I thank you for your work and wish I could contribute more.

TellowKrinkle

@marcan Kind of curious, how do other laptops' USB-C ports work on Linux?

Hector Martin

@TellowKrinkle I'm not even sure if a full unified PHY for all protocols including USB4 like Apple's exists for any other vendor, and if it does (recent AMD stuff maybe?), it's all managed by firmware/ACPI and built to work with less tightly coupled drivers on the OS side.

Simpler implementations that just do USB3 and nothing else have the PHY managed by the xHCI controller and there is no mode switching or PHY driver. A basic DP/USB3 implementation as might exist in many non-Thunderbolt laptops and some recent embedded platforms would just use a dumb external mux to switch the data lines (which is what the simplistic Linux model is designed to work with, if it isn't just managed by ACPI). Again no PHY drivers, just a trivial mux driver. Intel Thunderbolt stuff is managed by external Thunderbolt controllers with piles of firmware and ACPI glue on the OS side, and again just looks like discrete controllers to the OS.

We are certainly the first platform trying to shove full OS-managed Type-C USB4 PHY support into Linux. Also I'm pretty sure Apple is the only vendor in existence that does full USB4+TBT3+DP+dual-role USB2/3 (including device role / gadget mode). Don't think any other laptop can do that (plug in a Mac into another non-TBT/USB4 host and the Mac enumerates as a SuperSpeed ethernet interface and exposes network services like SMB).

It doesn't help that Apple is using DesignWare USB3 controllers which are one of the worst in terms of quirks and hacks required. The list of special case hack toggles and properties in the devicetree bindings for dwc3 is ridiculous.

Sawyer Bergeron

@marcan I'm sure this is pure naivety on my part, but is "embrace out of tree until they come to their senses" reasonable at this point? I know that's an easy way to piss off some maintainers, but I can't imagine the people who _would_ take offense haven't already over something else

Hector Martin

@sawyerbergeron Major refactors are not viable out-of-tree. It would increase our rebase workload significantly. The only things viable to carry out-of-tree for longer periods of time are small fixes and whole new drivers, not major subsystem changes.

Sawyer Bergeron

@marcan oh yeah, I don't mean dragging the entire subsystem and all of the leaves along, I meant treating this all as one big driver and pulling in what you need from those subsystems. Guessing still too big a lift?

Hector Martin

@sawyerbergeron I'm leaning towards that option if the current approach ends up completely nonviable, but it would still require at least a pretty horrible patchset to a few other drivers to make them play nicely. E.g. we will probably have to add some kind of ugly hooks to dwc3 so it can call into the PHY driver in more fine-grained places. And also get rid of its workqueue nonsense for role-switch actions, since running stuff in other threads is a recipe for disaster. The only way this is going to work and be robust is if all the mode changes and init/shutdown logic run within a single (logical) thread (per Type C port/controller). And then we need to do something about tipd, though I'm starting to lean towards "fork/rewrite it", given how much Apple's variant is already diverging from upstream tipd, and how the non-Apple tipd driver is very barebones and has no altmode support at all anyway.
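To illustrate the "single (logical) thread per port" idea, here is a minimal user-space sketch in Python. It is not kernel code: `PortSequencer` and the step names are hypothetical, invented purely for illustration. Every mode-change step for one port is pushed onto one queue and executed by one worker, so steps can never interleave or run out of order.

```python
import queue
import threading


class PortSequencer:
    """Hypothetical per-port sequencer: all steps run on one worker, in FIFO order."""

    def __init__(self):
        self._steps = queue.Queue()
        self.log = []  # names of the steps that ran, in execution order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name, action=lambda: None):
        # Callers (any thread) only enqueue; they never run the step themselves.
        self._steps.put((name, action))

    def _run(self):
        while True:
            item = self._steps.get()
            if item is None:  # sentinel: stop the worker
                break
            name, action = item
            action()
            self.log.append(name)

    def close(self):
        self._steps.put(None)
        self._worker.join()


def demo():
    # Step names are made up; they stand in for PHY, controller and display work.
    seq = PortSequencer()
    for name in ("phy_reconfigure", "usb_role_switch", "dp_enable"):
        seq.submit(name)
    seq.close()
    return seq.log
```

Calling `demo()` returns the steps in the order they were submitted. The point of the design is that ordering is decided in one place (the queue) rather than by whichever driver's work item happens to run first.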

But then the question remains, is this ever going to be upstreamable. Because if it isn't, we lose long term.

Raven667

@marcan @sawyerbergeron I'm no hardware engineer so I'm talking out of my ass here, but the way you describe it, it seems like forking the relevant driver(s) and making a hacky prototype would allow you to get something working and tested out the door. Then you would know more exactly what the scope of work would be to refactor/rewrite in a production/upstream-ready style, and what kind of glue would be needed to do all the work that ACPI/firmware provide on other systems. Trying to make it production ready from the jump seems like it's a lot harder than you'd like. I suppose the problem for you (and all those other vendors out there) is that if there is no time to rewrite it later, then the time you put in now is the only time you'll have to work on it with this intensity. Once it works you'll have other top priorities, so if you only end up with a hacky prototype, then maintaining it will become a burden that takes time away from developing a production-ready version.

spyke

@marcan High-level abstractions starting to leak -- nothing new there. Only a few companies (like Intel) actually make drivers for Linux, so no wonder that you have a lot to improve and implement.

Hector Martin

@spyke Yeah, because ~all the downstream embedded (non-x86) vendors suck and nobody else tries to upstream anything or attempt to engineer anything to be upstreamed...

Sobex

@marcan one question.

Is the issue you are encountering likely also a pain for other USB4 / Thunderbolt stack drivers?

Would there be a way to design a subsystem to deal with USB / Thunderbolt and alt mode independently from that stack, to actually make it possible to deal with those stacks as a whole?

(That would mean a lot more code, but probably avoid the nonsensical issues you currently have)

knuxify

@marcan reminds me of a fun case that I still haven't 100% figured out on the exynos 4 - how to get USB OTG working reliably.

the microUSB controller is a mostly-standard DWC2 controller... except the host mode bits have been forcibly ripped out. instead, USB host mode is handled by a separate block on the SoC (hsic). in order to switch between the two, a bit needs to be flipped in a system control register (0 means host mode, 1 means device mode, or something like that - would have to check the code). *both controllers cannot be active at the same time*; if dwc2 tries to do *anything* while the host mode is enabled, the driver freezes; the hsic driver is a bit nicer about this though.

the current implementation in the kernel does the necessary bit switch on the USB PHY driver; if the host mode PHY is enabled, it switches to host mode, if the device mode PHY is enabled it switches to device mode. this works fine if you only need one of the two, but if you need both, you get this fun interaction:

- USB device mode gets initialized first, PHY and dwc2
- USB host mode gets initialized second, flips the switch to host mode, inits PHY and hsic
- DWC2 attempts wake-up and fails, causing peripheral mode to not work; so only host mode works.

i ended up hacking around this by making the mode switch happen based on the state of an extcon connector (that is a whole mess in and of itself - there's at least 2 or 3 ways to declare such a dependency in DTS...), and it mostly works but still has the caveat that dwc2 is still technically running, so if that driver tries to do any funny reset/resume/etc business and we're in host mode, it will freeze up and break device mode until reboot
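The init-order interaction described above can be reproduced with a toy model in Python. This is only a sketch of the described behaviour, not the actual driver logic: the class and function names are hypothetical, and the mode bit is modelled as a string rather than the real register value.

```python
HOST, DEVICE = "host", "device"


class SharedMux:
    """Stands in for the system control register bit that selects host or device mode."""

    def __init__(self):
        self.mode = None


class Controller:
    def __init__(self, name, mode, mux):
        self.name = name
        self.mode = mode
        self.mux = mux
        self.working = False

    def phy_power_on(self):
        # Behaviour described above: enabling a PHY flips the shared bit to its mode.
        self.mux.mode = self.mode

    def wake_up(self):
        # Simplification: a controller only comes up if the mux points at it.
        self.working = self.mux.mode == self.mode
        return self.working


def probe_both():
    mux = SharedMux()
    dwc2 = Controller("dwc2", DEVICE, mux)
    hsic = Controller("hsic", HOST, mux)
    dwc2.phy_power_on()  # device mode is initialised first
    hsic.phy_power_on()  # host mode is initialised second and flips the mux
    hsic.wake_up()       # succeeds: mux points at host
    dwc2.wake_up()       # fails: mux no longer points at device
    return mux.mode, dwc2.working, hsic.working
```

In this model `probe_both()` ends with the mux in host mode, hsic working and dwc2 not working, which matches the "only host mode works" outcome: whichever PHY powers on last wins the shared bit, regardless of what the other driver expects.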

knuxify

@marcan (if any smart kernel dev reads this genuinely do give advice lol, i wrote down the details in gitlab.com/exynos4-mainline/li)

Hector Martin

Sent an email to lkml & friends about this, but I'm not holding my breath that it's going to lead to a clear solution...

lore.kernel.org/lkml/fda8b831-

Xerz! :blobcathearttrans:

@marcan well, at least they won’t blame you for complaining behind their backs

kloenk

@marcan just write the new subsystem in rust for more shits and giggles
