Simon Willison

Published some notes on Docling, a rather nice MIT-licensed Python library from IBM for PDF document and table extraction simonwillison.net/2024/Nov/3/d

4 comments
Matt Campbell

@simon How does the Markdown output from Docling compare with the HTML that you've gotten out of Gemini for PDF documents? Does Docling do a good job of recognizing headings, lists, etc.?

Simon Willison

@matt I tried it on two documents so far and it looked reasonable, but I've not done a remotely robust comparison of it yet

Xing Shi Cai

@simon Any comments on the quality of its output?

Simon Willison

@xsc I tried it on two PDFs and it looked OK, which isn't nearly enough testing for me to say anything useful!
