now I'm reading up on accessing SD cards from the Teensy 4.1. looks like SdFat is the library? could it be so easy?
turns out it's easy but I had to reformat the SD card using the official sdcard.org utility. anyway, I've read the first sector from a real disk image!
the BIOS runs faster than the DIFDIAG utility, and so it seems like it is hitting a timing problem that i didn't hit before. my drive code seems to randomly hang up and not respond correctly. it's occasionally getting a spurious end-of-interrupt command which is really odd and points to an issue with the mailboxes (again, sigh).
but it's SO DARN CLOSE. it's transferring sectors from the IML region in the disk image.
@tubetime What’s the end goal here, boot the laptop using a SD card that’s emulating a Micro Channel hard drive interface, via Teensy?
@tubetime I know it’s not the point of your project, but didn’t this laptop have PCMCIA? Would one of these work as a boot drive? Digigear SD SDHC SDXC to PCMCIA PC Card, Adapter Supports, ATA Flash Memory https://a.co/d/agF7LO9
figured out one problem. the disk boot routines slam the drive with an ATN and the first command word in 5.5us. the Teensy code takes too long to see the ATN and clears the command register full flag, which drops the first word. oops.
so it *almost* boots now. in fact it successfully loads the IML sectors from the hidden partition on the drive, and no longer throws an I999... error code!
my drive doesn't implement this weird feature called pseudo RBAs--it's a way to artificially limit the maximum possible block address, presumably so they can hide the partition data. i suspect the BIOS checks this, so i'll have to implement it. ugh. that means i need to figure out this incomprehensible diagram.
well, it's working well enough to run qbasic. right now the drive is read-only.
i think i need to dig into the 01290200 cache error that has been coming up. i'm concerned that an issue with my DBA-ESDI card has caused it, but i'm not sure.
looks like the cache is inside the CPU. i can't find any cache chips on the motherboard.
see? no cache or memory chips.
the larger devices are probably semicustom gate array parts that IBM was fond of using. doubt they contain any cache memory.
@tubetime looks like the error is generated by an NMI that gets tripped when the cache is being set up. could be a number of causes but in general it is an issue with the internal CPU cache.
could also be this test of the DMA controller which is also included in the same set of tests and triggers the same error code, for some reason.
hmm, the error still comes up.
so i just tried what i *should have tried* at the start -- the 700 series diagnostic disk. when the diagnostic detects the cache error, it asks if you have replaced the CPU card. i *lied to it* and said that I had, so when it asked if i wanted to keep the cache disabled, i said "N". aaaand that fixed it! we're now booting to DOS off my DBA-ESDI disk replacement.
so here's what i think happened:
this is all very good because i know the root cause and it's not something terrible like data bus contention, and it's thankfully not permanent damage.
it boots windows 3.1 now. it was trying to run a weird hdd power saving mode command I hadn't implemented. it also complains about the swap file because the filesystem is still read-only.
so about that write issue: it's an off-by-two error somewhere. two bytes being a single 16-bit word, so it's really an off-by-one error.
@tubetime of course it is. 🙂 Thinking it was an off by two error was off by one. 😂
figured it out and fixed it. i forgot to set the "transfer request" flag to kick off DMA. in another routine, it sees that this flag is clear and assumes that a word has already been read using DMA, so it reads a crap value and then sets the transfer request flag again to start the next DMA transfer. that "crap value" pushes the valid data forward by one word.
on to the next issues: randomly the ATN register mailbox flag gets set but the data in it is stale. also, the status interface register will randomly get read by the host.
I think these are two facets of the same problem: the mailbox flags sometimes respond when you access a register that they are not supposed to be monitoring!
the mystery deepens. according to the logic analyzer, temp_atn_set never goes high. reg_atn_set (for crossing clock domains) is always 000. flag_atn is only set to 1 on this single line of code! and yet, somehow, it magically flips to a 1.
looking at the generated logic, i see no explanation either. temp_atn_set (aka sd_cmd, my test point) never goes high. no glitches, no nothing. to set the flop, EN must be high and R must be low, and a clock edge must occur.
there's a glitch! that's why I missed it before, it's only 2ns. this is the signal from the MCA bus clock domain, and it's getting picked up in my other clock domain's edge detector.
and i believe this is the cause. this line right here. each signal, la_*, is an output from a flip flop latched by the micro channel bus cmd line. however, this line of code creates some combinational logic--there's a timing hazard here...
the problem? the line (la_addr == REG_ATN) creates a bunch of gates that are slightly slower than the simple AND gates in the previous part of the line. so la_mca_op=1, ~la_s0_w_l=1, and (la_addr == REG_ATN) *is also a 1 for a very short time!!!* this is because the previous value of la_addr WAS a REG_ATN. what i need to do is take that entire wire and turn it into a latch (a reg) and clock it on cmd.
so here's the solution: all the signals in the MCA bus domain go to a latch clocked in that domain (the first "always" block). then *without any combinational logic* the output of that latch goes *directly* to another latch (the second "always" block) located in the main clock domain. (i have another flip flop in main clock domain just for detecting the edge)
next step is to optimize the interface speed. right now it takes 25us to read a sector from the SD card but ~5 milliseconds (ouch) to DMA it to the PC!
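for a rough sense of scale, here is a back-of-the-envelope sketch in Python that turns the ~5 ms sector time into a per-word cost. it assumes 512-byte sectors moved as 16-bit words, which the thread doesn't state outright; the names (SECTOR_BYTES, per_word_us, sector_time_ms) are made up for illustration and are not from the project's code.

```python
# Assumption: 512-byte sectors, transferred one 16-bit word at a time.
SECTOR_BYTES = 512
WORD_BYTES = 2
WORDS_PER_SECTOR = SECTOR_BYTES // WORD_BYTES  # 256 words


def per_word_us(sector_time_us: float) -> float:
    """Average time per 16-bit word, given the time to move one sector."""
    return sector_time_us / WORDS_PER_SECTOR


def sector_time_ms(word_time_us: float) -> float:
    """Time to move one sector, given an average per-word time."""
    return WORDS_PER_SECTOR * word_time_us / 1000.0


if __name__ == "__main__":
    # ~5 ms per sector (figure from the post) spread over 256 words
    print(f"{per_word_us(5000):.1f} us per word")
    # going the other way: what a given per-word time costs per sector
    print(f"{sector_time_ms(20):.2f} ms per sector at 20 us per word")
```

under those assumptions the DMA side costs a little under 20 us per word, so the per-word overhead dominates and the 25us SD card read is negligible by comparison.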
it's mostly an issue with the Teensy-to-FPGA interface, which is async and simple: 4 address lines, 16 data lines, a read control line, and a write control line. everything else is done as a register in the 4-bit address space. flag register for status and mailbox sync bits.
why is it 5ms per sector for DMA? well, it's about 20us per word. most of that time is wasted by the slow interface between the Teensy and the FPGA. I really should fix that.
got rid of some delays and now we're down to 6us per word. but there are some unexpected wide gaps in between transfers between the FPGA and the Teensy.
@tubetime This was really enjoyable to follow along with! Congrats on the progress
@tubetime "No command can not access" so.. uh.. commands can access? 🤯
@tubetime Your project is an ESDI drive emulator, right? (specific drive type, but ESDI interface) Would it work in another computer that had an ESDI controller and understood the IBM drive? (I assume so, but then you mentioned microchannel which confused me -- I'm assuming the "creaky old IBM laptop" interface is ESDI?)
@tubetime Your recent posts make me believe you’re a time traveller - perhaps the great-great-grandson of an IBM engineer who as a child found an old notebook complaining about this problem your grandpa just couldn’t solve and that he was fired for and altered the arc of his life. You studied your whole life as an engineer for this moment to come back to today when the hardware was still available, fix the problem, then go back and help your grandpa. Let us know how it turns out.
@amart not too far off! my grandfather worked there. i once reverse engineered a prototype floppy drive he worked on, and got it working again. https://twitter.com/TubeTimeUS/status/1617703291483467776
@tubetime I've used that library before, yes it really is easy. You probably won't set any throughput records, but it was great for writing diagnostic logs that were later read back and uploaded.