@millihertz When I thought through reengineering an OS/browser from the hardware on up, this is what made sense to me:
Have a large amount of memory with fast burst transfers, and process the data in it linearly. That way we can still process reasonably large files!
And a smaller amount of memory in which we can arbitrarily rearrange data.
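A minimal Python sketch of that two-tier idea, purely as an illustration. It assumes a hypothetical 128-byte burst (a 64-bit bus times 16 beats), and `stream_through_scratchpad` and `reverse_in_place` are made-up names. The big buffer is only ever read front to back, one burst at a time. Each burst lands in a small scratchpad where the kernel may shuffle bytes however it likes, and the results are written out sequentially again.

```python
from typing import Callable

BURST_BYTES = 128  # hypothetical: 64-bit bus x 16-beat burst = 1024 bits


def stream_through_scratchpad(big: bytes,
                              kernel: Callable[[bytearray], None],
                              burst: int = BURST_BYTES) -> bytes:
    """Process `big` strictly linearly, one burst at a time."""
    out = bytearray()
    for start in range(0, len(big), burst):
        # One sequential burst read from the large, streaming-only memory.
        scratch = bytearray(big[start:start + burst])
        # Inside the small scratchpad, any access pattern is allowed.
        kernel(scratch)
        # Results go back out sequentially as well.
        out += scratch
    return bytes(out)


def reverse_in_place(buf: bytearray) -> None:
    """Example kernel: an arbitrary rearrangement done in the scratchpad."""
    buf.reverse()


print(stream_through_scratchpad(bytes(range(8)), reverse_in_place, burst=4).hex())
# 0302010007060504
```

The catch, as the reply below points out, is that real programs rarely fit this pattern for long.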
@alcinnz @millihertz That's what synchronous RAM is. Random access rates are still around the 14 to 20 MHz range (so, figure, 70 to 50 ns per access). To go faster than that, an SDRAM fetches a huge word of memory at once and then parcels it out to the bus one smaller word at a time. A burst length of 8 to 16 isn't atypical these days, so with a 64-bit data path you're looking at a 64 * 16 = 1024-bit word inside the RAM itself.
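The arithmetic in that reply, spelled out as a small Python check. The function names are hypothetical. It just converts an access rate in MHz to nanoseconds per access and multiplies bus width by burst length.

```python
def access_time_ns(rate_mhz: float) -> float:
    """Time per random access, given millions of accesses per second."""
    return 1000.0 / rate_mhz


def internal_word_bits(bus_bits: int, burst_length: int) -> int:
    """Width of the word fetched internally to feed one full burst."""
    return bus_bits * burst_length


for rate in (14, 20):
    print(f"{rate} MHz -> {access_time_ns(rate):.1f} ns")  # 71.4 ns, 50.0 ns
for burst in (8, 16):
    print(f"64-bit bus, burst {burst} -> {internal_word_bits(64, burst)} bits")  # 512, 1024
```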
And, yes, if you can arrange your data to be serialized in such a manner, you can get full native throughput without needing caches. The problem is that, unless you deal almost exclusively with vectors, you can almost never stream data for that long. Consider that compiler research shows nearly all programs have an average "basic block" size of about 8 instructions. Meaning, the computer typically runs only about 8 instructions before it has to process a conditional or unconditional branch, and every taken branch breaks the sequential stream.
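For anyone who hasn't met the term: a basic block is a straight-line run of instructions with one entry and one exit. Below is a rough Python sketch of the classic "leaders" method for splitting code into blocks. A new block starts at the first instruction, at every branch target, and right after every branch. The `Insn` type and the four branch opcodes form a hypothetical toy ISA, not any real instruction set.

```python
from dataclasses import dataclass
from typing import Optional

BRANCH_OPS = {"beq", "bne", "jmp", "ret"}  # hypothetical toy ISA


@dataclass
class Insn:
    op: str
    target: Optional[int] = None  # index of the branch target, if any


def basic_blocks(code: list[Insn]) -> list[range]:
    """Split `code` into basic blocks, returned as index ranges."""
    leaders = {0}
    for i, insn in enumerate(code):
        if insn.op in BRANCH_OPS:
            if insn.target is not None and 0 <= insn.target < len(code):
                leaders.add(insn.target)  # a branch target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)        # so does the instruction after a branch
    starts = sorted(leaders)
    return [range(s, e) for s, e in zip(starts, starts[1:] + [len(code)])]


prog = [
    Insn("li"), Insn("li"),      # set up loop counter and sum
    Insn("add"), Insn("addi"),   # loop body
    Insn("bne", target=2),       # branch back to the loop body
    Insn("ret"),
]
print([len(b) for b in basic_blocks(prog)])  # [2, 3, 1]
```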
This is why loop unrolling and function inlining are such significant optimizations, and it is a contributing factor in why code tends to get larger over time, even for the same input source listing.
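Here is a toy Python illustration of the unrolling trade-off. In practice a compiler does this on machine code, and the `branches` counter here is only a stand-in for loop-back tests. Unrolling by 4 makes each loop body (basic block) longer and cuts the number of loop-back branches. The cost is more code, including a cleanup loop for leftover elements.

```python
def sum_rolled(xs: list[int]) -> tuple[int, int]:
    total = branches = 0
    for x in xs:
        total += x
        branches += 1  # one loop-back test per element
    return total, branches


def sum_unrolled4(xs: list[int]) -> tuple[int, int]:
    total = branches = 0
    i, n = 0, len(xs)
    while i + 4 <= n:  # main body: four elements per loop-back test
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        branches += 1
        i += 4
    while i < n:       # cleanup for the remaining 0 to 3 elements
        total += xs[i]
        branches += 1
        i += 1
    return total, branches


print(sum_rolled(list(range(10))))     # (45, 10)
print(sum_unrolled4(list(range(10))))  # (45, 4)
```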
BTW, when I set my Kestrel Projects CPU and main bus speed from 16 to 25 MHz, it was largely due to the access speed of external memory. Going faster all but requires a cache, and caches in turn are optimized for synchronous memory.