next step is to optimize the interface speed. right now it takes 25us to read a sector from the SD card but ~5ms (ouch) to DMA it to the PC!
it's mostly an issue with the Teensy-to-FPGA interface, which is async and simple: 4 address lines, 16 data lines, a read control line, and a write control line. everything else is exposed as registers in the 4-bit (16-register) address space, including a flag register that holds status and the mailbox sync bits.
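to make the bus concrete, here's a minimal C sketch that simulates it on a host machine. everything in it is hypothetical: the register addresses (REG_FLAGS, REG_DATA), the FLAG_MBOX_FULL bit, and the bus_* / wait_flag helpers are made up for illustration. on the real Teensy the bus_* functions would toggle and sample GPIO pins instead of touching an array. the point is that every word, including every poll of the flag register, costs a full address-plus-strobe cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register map: 16 registers (4-bit address space) with a
   flag register for status and mailbox sync bits. Addresses and bit
   positions are invented for this sketch. */
#define REG_FLAGS      0x0u
#define REG_DATA       0x1u
#define FLAG_MBOX_FULL (1u << 0)

/* Simulated FPGA register file: 16 registers, 16 bits each. */
static uint16_t fpga_regs[16];

/* Simulated bus state: on the real Teensy these would be GPIO pins. */
static uint8_t  bus_addr;  /* 4 address lines */
static uint16_t bus_data;  /* 16 data lines */

static void bus_set_addr(uint8_t addr) {
    assert(addr < 16u && "only 4 address lines");
    bus_addr = addr;
}

/* Async read cycle: drive the address, assert RD, sample the data
   lines while the FPGA drives them, then release RD. */
static uint16_t bus_read(uint8_t addr) {
    bus_set_addr(addr);
    bus_data = fpga_regs[bus_addr];  /* FPGA output while RD is asserted */
    return bus_data;                 /* value latched before RD is released */
}

/* Async write cycle: drive the address and data, then pulse WR so the
   FPGA latches the data into the addressed register. */
static void bus_write(uint8_t addr, uint16_t value) {
    bus_set_addr(addr);
    bus_data = value;
    fpga_regs[bus_addr] = bus_data;  /* FPGA latches on the WR pulse */
}

/* Poll the flag register until `mask` is set or `max_polls` reads pass.
   Every poll is a full bus read cycle, so polling is not free. */
static bool wait_flag(uint16_t mask, unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; i++) {
        if (bus_read(REG_FLAGS) & mask) {
            return true;
        }
    }
    return false;
}
```

with an interface this simple, the per-word cost is dominated by how fast the Teensy can wiggle the address and strobe lines and how long it waits for the FPGA, which is where the time goes.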
why is it 5ms per sector for DMA? well, it's about 20us per word, and a 512-byte sector is 256 16-bit words, so 256 × 20us ≈ 5.1ms. most of that time is wasted by the slow Teensy-to-FPGA interface. I really should fix that.
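here's the same arithmetic as a small C sketch, assuming a standard 512-byte SD sector moved as 256 16-bit bus words. the helper names (sector_time_ns, word_budget_ns) are made up. it computes the sector time from the per-word cost, and the per-word budget needed to keep up with the 25us SD read.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a standard 512-byte SD sector moved over the 16-bit bus,
   one bus word (2 bytes) per transfer. */
#define SECTOR_BYTES     512u
#define BUS_WORD_BYTES   2u
#define WORDS_PER_SECTOR (SECTOR_BYTES / BUS_WORD_BYTES)  /* 256 */

/* Total time to move one sector, in ns, given the cost of one word. */
static uint32_t sector_time_ns(uint32_t ns_per_word) {
    /* guard against uint32_t overflow */
    assert(ns_per_word <= UINT32_MAX / WORDS_PER_SECTOR);
    return WORDS_PER_SECTOR * ns_per_word;
}

/* Largest whole-ns per-word cost that still fits in a target sector time. */
static uint32_t word_budget_ns(uint32_t target_sector_ns) {
    return target_sector_ns / WORDS_PER_SECTOR;
}
```

at 20us per word, sector_time_ns gives 5.12ms, in line with the ~5ms measured. to match the 25us SD read, word_budget_ns says each word would need to take under 100ns, roughly 200x faster than the current bus.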