Something I really like about this tool is that the entire thing is 226 lines of combined HTML, CSS and JavaScript (plus the PDF.js and Tesseract.js dependencies, loaded from a CDN)
The code is a little untidy but at 226 lines it honestly doesn't matter https://github.com/simonw/tools/blob/9fb049424f4ec8f8ffb91a59ab7111cad56088fc/ocr.html
Also neat is that the enabling libraries here - Tesseract.js and PDF.js - are both pretty old at this point:
First commit to Tesseract.js was Jun 26, 2015 https://github.com/naptha/tesseract.js/commit/906ce3cadbffaf5f7317a4418f282c4b78bf8385
First to PDF.js was Apr 25, 2011 https://github.com/mozilla/pdf.js/commit/6dc1770bba7a417ce5664c0305469e5bb7ea76bd