@bugaevc @janneke The https://github.com/fosslinux/live-bootstrap project has some initial code to bootstrap Linux. It can build Linux but we still need to kexec into it (which shouldn't be too hard).

@janneke @bugaevc The folks in #bootstrappable @liberachat are working towards resolving those questions. A POSIX kernel capable of building Linux, and a bootstrap from UEFI are some projects off the top of my head. They want to get to an FPGA softcore bootstrap, then a manually constructed CPU in TTL to bootstrap from. But yeah, there are many parts to work on that would improve our (collective) situation, such as bootstrapping GHC: @nomeata https://mastodon.online/@nomeata/110263917613134533

@theruran @janneke I was thinking something along these lines: find an "open source hardware" board where you can somehow verify the hardware isn't playing games on you (in particular not running all of your code in a nearly undetectable hypervisor, like we know Intel does...), probably some RISC-V board; run your bootstrapping code on it with no OS whatsoever; hopefully it doesn't need much from the OS; you'd have to build in a serial driver or something like that (blinking LEDs is cool but you can't input program source this way). Not that I have any idea about hardware.

@theruran @janneke the Hurd surely can run GCC and cross-compile Linux; but I'm not sure you would be winning much, for two reasons: 1. It's nowhere near as trivial to do "syscalls" as on Linux: on Linux you place some values into some registers and perform "int 0x80" or "syscall", and that's it, you've called write or exit. On the Hurd, these all are implemented in glibc on top of Mach IPC, and that needs quite a lot of code to happen.

@theruran @janneke Here's a project of mine where I simply print "Hello world" without relying on glibc: https://github.com/bugaevc/hello-hurd (but that too is written in C; imagine writing it all in hex). 2.
Linux is huge, but you can build it in a minimal configuration (see https://tiny.wiki.kernel.org/). Mach may be a microkernel, but it's minimal in functionality, not size. In fact it's a meme in the microkernel community just how large for a microkernel Mach is. But I don't have any numbers to quantify this.

@bugaevc Speaking of the role of the kernel, an interesting question is how to implement isolated builds on the #Hurd; see “Isolated build environments” at https://guix.gnu.org/en/blog/2020/childhurds-and-substitutes/ for an overview. I’m curious what you think of this!

@civodul hi! I'm probably not Guix-savvy enough to fully comprehend the issue here, but as I understand it, you want to be super explicit about what each package needs to be built. Do you include libc, cc, binutils into this list of dependencies? (I imagine you do, otherwise it wouldn't be reproducible.) Apparently you do include /bin/sh. So yeah, the Hurd servers aren't much different or any more "external" to the environment than /bin/sh. I don't think you should be firmlinking stuff from the host; you should probably just spawn a mini subhurd for each build. You want pipes and fork/exec, so you need pflocal, proc, and exec servers. (Also /servers/proc, mentioned in your mail, is not a thing, of course 🙂: the proc server is one of the two servers, the other one being auth, that are not accessible through the file system, but only through _hurd_ports.)

Your mail about /bin/sh also raises an interesting topic of paths. Do you want to change /dev/null and /servers/exec to some other (hash-derived I would imagine) paths? Sounds wild but you totally could! You could then either patch glibc (and everyone who expects to find /dev/null at its usual place), or provide symlinks. But then again I don't know enough about Guix to judge here.
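
For readers who have not seen it, the "place some values into some registers and perform syscall" point from earlier in the thread looks roughly like this on x86-64 Linux. This is a minimal sketch, not code from any of the projects mentioned: the helper names raw_syscall3, raw_write and raw_exit are made up for illustration, the inline assembly assumes GCC or Clang, and on any other platform it falls back to libc's syscall() wrapper, which defeats the purpose but keeps the example compilable.

```c
#define _GNU_SOURCE /* only needed for the libc fallback below */
#include <stddef.h>

#if defined(__linux__) && defined(__x86_64__)
/* x86-64 Linux syscall numbers for write(2) and exit(2). */
#define NR_WRITE 1
#define NR_EXIT 60

/* Hypothetical helper: number in rax, arguments in rdi, rsi, rdx.
   The kernel clobbers rcx and r11 and returns the result in rax
   (a negative errno value on failure). */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                     : "rcx", "r11", "memory");
    return ret;
}
#else
/* Fallback (assumption: a libc with syscall() is available). This goes
   through libc, so it only mimics the raw interface. */
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>
#define NR_WRITE SYS_write
#define NR_EXIT SYS_exit

long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret = syscall(nr, a1, a2, a3);
    return ret < 0 ? -(long)errno : ret;
}
#endif

/* write(2) without the libc wrapper: returns bytes written or -errno. */
long raw_write(int fd, const void *buf, size_t len)
{
    return raw_syscall3(NR_WRITE, fd, (long)buf, (long)len);
}

/* exit(2) without the libc wrapper: does not return on success. */
void raw_exit(int status)
{
    raw_syscall3(NR_EXIT, status, 0, 0);
}
```

A freestanding program would call raw_write (1, "Hello world\n", 12) and then raw_exit (0) from its entry point. On the Hurd, as discussed above, the same two operations go through Mach IPC instead.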

Unfortunately all this wouldn't help you too much with bootstrapping from source, since you cannot do I/O easily on the Hurd like you can on Linux with a few instructions; you need to do RPCs and all that (even to get your argv). This is of course hidden from you when you're using glibc.

> Also, one could argue that things like /dev/null have a well-defined interface that’s set in stone and that, consequently, how they’re implemented does not matter at all.

Yes, but also no: there certainly can be differences in behavior that are allowed by the interface (where it explicitly doesn't guarantee something), but (due to bugs) can influence the outcome. For instance, does every write to /dev/null always write the whole buffer, or can there be short writes? Or: can a signal interrupt a write to /dev/null? (On SerenityOS the answer used to be no, on the Hurd it's a resounding yes, dunno about Linux.) See https://github.com/SerenityOS/serenity/issues/797 for how this can break things.

@bugaevc Exactly! So the question becomes: assuming you have nothing but the Mach syscalls at your disposal, what chain of programs building on each other would eventually let you run a proc and an exec server so you have the beginning of a POSIX build environment? The whole stage0/M2/Mes story on Linux was quite a puzzle; its Hurd version would push it further. :-)

@bugaevc The Hurd code lives in /gnu/store/…-hurd-*, but the translation points in the build environment would remain /dev/* and /servers/*. Changing that would be impractical and bring nothing.

Here's a fun little problem: if you have lost your proc and auth ports, but still have your fs root dir port, how can you recover those two?

@bugaevc Possibly (but not necessarily) by looking up /servers/proc for the first one; as for auth, it’s forever lost?

Yes, /servers/proc is not it :) I was thinking of the following scheme, which I have not tried, so this is just a theory.
You create an executable (perhaps as an unnamed file) that is setuid to yourself, and then exec it (not over your own task, unless you want that), without passing auth or proc ports (as you have none). The translator notices this and creates a new auth handle based on its idea of your effective uids/gids (see libfshelp/exec-reauth.c); and then the exec server gives the new task a fresh proc port. You cannot access the new task because of setuid/EXEC_SECURE, but as you created the executable you still control what it does. In particular it may send its proc/auth ports back to the original task, and the original proc port may then be recovered by a simple proc_task2proc (other_proc, mach_task_self (), &my_proc). The exact auth port I don't think can be recovered, but at least you now have another auth port with your effective uids/gids.

@bugaevc The build environment includes nothing but the explicitly-declared userland dependencies. If a package depends on GCC and Binutils, it gets them; if not, it doesn’t. There’s no /bin/sh there: no /bin, no /usr, nothing. On Linux, there’s /dev and /proc, but for separate namespaces.

@civodul @bugaevc @janneke This now got me thinking... I found a Hurd post talking about how it adds POSIX compatibility to Mach: https://www.gnu.org/software/hurd/community/weblogs/ArneBab/technical-advantages-of-the-hurd.html And it says it still provides access to the capability-based permissions underneath, which sounds nice. But it also got me thinking: there's likely to be a lot more software targeting WASI soon, which is natively capability-based. Could it be possible for Hurd to have WASI compatibility too?

@jfred I have to admit I don’t know WASI… But overall, I’m not enthused by the idea of adding an extra interpretation layer like Wasm on top of my CPU. Capsicum or the Hurd’s native interfaces look more appealing to me though.

@janneke not to downplay the importance and coolness of this achievement, here's an uninformed question that you probably get a lot: how does this work wrt depending on a Linux kernel (which is tons of C), some basic userland (or can it run as PID 1-and-only?), and x86 hardware (which... who knows what it does) to run this 357-byte binary?
If you can't trust a compiler to build your program correctly, why can you trust a kernel and some hardware to run your binary correctly?