Adrian Cochrane

@millihertz When I thought through reengineering an OS/browser from the hardware on up what made sense to me:

Have large amount of memory with fast burst-transfers, processing the data in it linearly. So we can still process reasonably large files!

And a smaller amount of memory in which we can arbitrarily rearrange data.

9 comments
Vertigo #$FF

@alcinnz @millihertz That's what synchronous RAM is. Access times are still around the 14 to 20 MIPS range (so, figure, 70 to 50ns), so to go faster than that, an SDRAM will fetch a huge word of memory at once, and then parcel it out to the bus one smaller word at a time. A burst size of 8 to 16 isn't atypical these days, so with a 64-bit data path, you're looking at a (64*16)=1024-bit word inside the RAM chip itself.

And, yes, if you can arrange your data to be serialized in such a manner, you can get native throughputs without the need for caches. The problem is that, unless you deal almost exclusively with vectors, it is almost never the case that you can stream data for that long. Consider that research into compilers shows that nearly all programs have a "basic block" size of 8. Meaning, the computer will run at most up to 8 instructions before it needs to process a conditional or unconditional branch.

This is why loop unrolling and function inlining are so significant as optimizations, and it's a contributing factor in why code tends to get larger over time, even for the same input source listing.

BTW, when I raised my Kestrel Project's CPU and main bus speed from 16 to 25 MHz, it was largely due to the access speed of external memory. Going faster all but requires a cache, and those in turn are optimized for synchronous memory.

Adrian Cochrane

@vertigo @millihertz In my case... I was discussing a "string-centric" system primarily for decoding HTML, audio, images, video, HTTP, etc for display.

Though it probably helped that I described a hypothetical where I was rewriting everything!

Adrian Cochrane

@vertigo @millihertz Regarding that average basic-block size: I had an at-least-to-me interesting solution for this use case of parsing (which probably drags the average down), though I'm not sure how well it generalises.

What if we split the processor in 2, so one half executes machine code that's near-entirely branches, relying mainly on code density, while the other half primarily deals in straight-line code?

I saw a parser generator which included a tight-loop interpreter for such a machine.

Adrian Cochrane

@vertigo @millihertz In other words: Yes, my hypothetical did rely on a code cache.

Even if I toyed with an alternate way of handling it!

Vertigo #$FF

@alcinnz @millihertz It's funny that you mentioned that. I have on several occasions considered exactly this, but we can actually generalize it. We can have one processor whose job it is to control computations from various "thread units". Each thread unit processes instructions in a straight-ahead manner for as long as it can. The control processor then serves as a job coordinator. Performance can be enhanced by throwing more straight-ahead thread units into the mix.

If the control program tries to launch more threads than are available in hardware, it'll block until a thread has completed its task. In this way, the control processor itself always appears to be single threaded.

leah & asm & forth, oh my!

@vertigo @alcinnz there's also the problem that even if we do solve the memory speed problem, there's still the problem of signals only going so fast across a circuit board before they end up getting out of sync, corrupted by noise, etc. it'd be far better to have static RAM and a little processor on the same chip, where they could keep up with each other - but that limits the size of RAM (and also the complexity of the processor, but that's a good thing). it also means that adding more RAM would add more processing power... which could only be used if your system were sufficiently parallel to just start doing that already

leah & asm & forth, oh my!

@vertigo @alcinnz basically, the transputer is the biggest missed boat in the history of computing

Adrian Cochrane

@millihertz @vertigo What I can say is: I enjoyed thoroughly thinking through reengineering an app from the hardware-on-up, & found it quite educational to write! I'm getting the impression this imagination could be quite valuable to the future of computing!

I'm keen to do so again, & would love to see others' takes!

That said I don't consider myself a hardware designer...

Vertigo #$FF

@millihertz @alcinnz It is unfortunate that fabrication of logic and of RAM on the same die is exceptionally expensive to do.

But now that we've moved into the era of "chiplets", maybe we should revisit this architecture.
