really cool how we've stored the sum of all human knowledge in a format that is near impossible to parse

i'm not aware of any wikitext parser in existence that isn't imperfect in some way; they're all just *slightly* divergent from one another as far as i can tell. and wikitext is so context-dependent that, without knowing the contents of every template used on the page (and every template used by said templates), you always have to just guess how it should be interpreted
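
like, here's a rough sketch of what i mean, using mwparserfromhell (a python wikitext parser library); the template name in the snippet is made up, not a real one. the parser can hand you the template as a node, but it can't tell you what the page actually *means* without expanding it:

```python
# a minimal sketch of the context-dependence problem, using the
# third-party mwparserfromhell library; {{mystery}} is a hypothetical
# template name used for illustration
import mwparserfromhell

text = "{{mystery}}\n| some cell\n|}"
wikicode = mwparserfromhell.parse(text)

# the parser can tell you a template is being transcluded...
for template in wikicode.filter_templates():
    print(template.name)  # -> "mystery"

# ...but whether the lines after it form a wikitable or are just stray
# text depends entirely on what {{mystery}} expands to -- if it expanded
# to "{|" (table-start markup), this whole snippet would be a table, and
# no parser can know that without fetching the template's wikitext
# (and the wikitext of every template *it* uses)
```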

this isn't even a case of "mediawiki's parser is the only correct parser that really exists"; it's a case of "mediawiki doesn't actually have a parser at all, it just incrementally translates wikitext to html, and does so imperfectly at that, since it's possible for valid wikitext (there is no such thing as invalid wikitext) to generate invalid html"

but the "imperfections" in mediawiki are actually just how wikitext works, since there isn't any specification; mediawiki's implementation just *is* the standard

and like, idk, i feel like this is kinda a big problem? that there's no entirely correct way to programmatically work with the format used by the majority of wikis, including the one which happens to be the largest and most up-to-date encyclopedia in the world? that there's no actual specification for interpreting wikitext outside of just trying to copy what mediawiki does?

and like, it's not *terrible*. well, ok, it is, but, like, not entirely. it's possible to make a parser that's close enough to what mediawiki does that in practice it works fine; many such parsers exist. but for such a ubiquitous format, i feel like there should be a much higher bar
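
for a sketch of the "works fine in practice" bit, mwparserfromhell again: its strip_code() helper gives you readable plain text without ever expanding templates, so it's an approximation by design. (the {{lang|en|...}} in the snippet is a real english wikipedia template that, iirc, renders its text argument; the point stands with any template)

```python
# "close enough in practice": strip_code() flattens wikitext to plain
# text without consulting any template definitions
import mwparserfromhell

text = (
    "'''Wikitext''' is the [[markup language]] used by "
    "{{lang|en|MediaWiki}} wikis."
)
wikicode = mwparserfromhell.parse(text)

print(wikicode.strip_code())
# the bold apostrophes and link brackets are stripped, but the template
# is simply dropped rather than expanded, so whatever mediawiki would
# have rendered for it never shows up -- good enough for search indexing
# or word counts, not a faithful reproduction of mediawiki's output
```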