@kcoyle @internetarchive That's why I prefer the scanned...

@kcoyle @internetarchive
That's why I prefer the scanned image to OCR'd text. (I had a job, for a while, fixing OCR'd text in insurance laws. Some of the output was actually funny, like "legal obligation" becoming "lethal obligation" and "District" getting scrambled into "Omelet".)

20 June at 17:19 | Open on mastodon.social

2 comments

karen coyle

@PJ_Evans @internetarchive The text can be really garbled. Note that it is offered to the visually impaired and I would like to hear how well it is working for them. When I look at it, it's a mess for things like tables of contents; plain text renders better, but there are still errors. Could something be set up where the text could be corrected by humans?

20 June at 17:27 | Open on mstdn.social

P J Evans

@kcoyle @internetarchive
That's Project Gutenberg.

20 June at 17:29 | Open on mastodon.social
