@TranslateScience @xldrkp I saw ocrmypdf (https://ocrmypdf.readthedocs.io/en/latest/index.html) in my timeline. It runs the Tesseract OCR engine and then adds the recognized text back into the PDF.
@ashwinvis @amael @TranslateScience Thanks for that hint, didn't know about this integration.
@amael
Tesseract works reasonably well with text, but not so much with equations, if I recall. It would need some manual cleanup.
Zotero-OCR plugin integrates with Tesseract.
@TranslateScience @xldrkp
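For anyone who wants to try the ocrmypdf route from the top of the thread, here is a rough sketch of how a run could be scripted. It only wraps the `ocrmypdf` command line tool. The helper names `build_ocrmypdf_command` and `run_ocr` and the file names are made up for illustration, and it assumes ocrmypdf and Tesseract are already installed. As noted above, equations will probably still need manual cleanup afterwards.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng", skip_text=True):
    """Build the argument list for one ocrmypdf run (hypothetical helper)."""
    # -l selects the Tesseract language pack used for recognition
    cmd = ["ocrmypdf", "-l", language]
    if skip_text:
        # --skip-text leaves pages that already carry text untouched
        cmd.append("--skip-text")
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf, output_pdf, **options):
    """Run ocrmypdf if it is on PATH; raise otherwise (hypothetical helper)."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH; install it first")
    # check=True raises CalledProcessError if ocrmypdf exits with an error
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, **options), check=True)


if __name__ == "__main__":
    # Placeholder file names; this only prints the command instead of running it
    print(" ".join(build_ocrmypdf_command("paper.pdf", "paper_ocr.pdf")))
```

Calling `run_ocr("paper.pdf", "paper_ocr.pdf")` would then write a searchable copy next to the original scan.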