kepano

Microsoft just released a tool that lets you convert Office files to Markdown. Never thought I'd see the day.

Google also added Markdown export to Google Docs a few months ago.

github.com/microsoft/markitdown

24 comments
David Delgado Vendrell

@kepano Why is that on Microsoft's GitHub? Shouldn't it be a default export function?

kaiserkiwi :kiwibird:

@daviddelven @kepano It should, but it should also be a tool. I don't have Office. But now I can convert and read Word documents.

nickchomey

@kepano Very cool. Might this get integrated into Obsidian somehow?

Rian

@kepano If only @gruber got royalties, he’d own the Yankees by now

Brett Edmond Carlock

@kepano

Need something like this to liberate my OneNote, but without having to have a license and active install of Office.

@zachleat

The Observations

@kepano quickly looked at the source and from what I can see it mostly leaves the actual conversion to other python packages like mammoth and markdownify, so curious how it compares to pandoc.

bob.php :veritrek_gold:

@kepano they must have needed it for recall before shoving it in the unencrypted sqlite database lmao

Rastal

@kepano You can also use Pandoc to convert MS office docs to Markdown / Github flavoured markdown.

gaProgMan

@kepano Lots of interesting "thoughts" here as to why Microsoft did this.
The official reason is for folks who want to build RAG (retrieval augmented generation) systems. You can't (easily) tokenise MS Office formatted documents because they are (effectively) binary files. But since Markdown is plain text, it can be tokenised super trivially.
And I'm pretty sure that a lot of Microsoft's own docs will be Office formatted. Meaning that any internal RAG projects they're building will need this.
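
To illustrate the point about binary versus plain text, here is a minimal sketch using only the Python standard library. It assumes nothing about how markitdown itself works. `make_fake_docx` and `naive_tokens` are hypothetical helpers invented for this example. The "docx" is just a tiny ZIP archive shaped like one, not a valid Word file, and the regex split stands in for a real tokenizer.

```python
import io
import re
import zipfile


def make_fake_docx() -> bytes:
    """Build a tiny ZIP archive shaped like a .docx (hypothetical, not a valid Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Real .docx files keep their body text in word/document.xml, compressed.
        zf.writestr(
            "word/document.xml",
            "<w:document><w:t>Hello world</w:t></w:document>",
        )
    return buf.getvalue()


def naive_tokens(text: str) -> list[str]:
    """Split plain text into lowercase word tokens (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


docx_bytes = make_fake_docx()
markdown_text = "# Hello\n\nHello **world**, this is plain text."

# The container starts with the ZIP signature, so the raw bytes are not readable text.
print(docx_bytes[:2])

# Plain-text Markdown can be split into tokens directly.
print(naive_tokens(markdown_text))
```

To get at the text of the Office-style file you first have to open the archive and parse the XML inside it, whereas the Markdown string can go straight into whatever tokenizer you use.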

gaProgMan

@kepano Then again, I don't work for Microsoft. So everything I've said can be taken with a whole mountain of salt.

xbezdick

@kepano ai training? what else would motivate them?

Bill Bennett

@kepano Is that something that a non-coder can use on a PC or Mac? If so, where could I find instructions?

Nabil-Fareed Alikhan

@kepano looks like a great tool developed for the wrong reasons (AI nonsense)

mirek kratochvil

@kepano did anyone compare the output of this to pandoc? :D

fx dechaume-moncharmont

@kepano Pandoc is already doing an excellent job with that. This new move by MS toward the Markdown format looks like another attempt at "Embrace, extend and extinguish".

Alexander K‮kn‭li

@fxdm @kepano I think Occam’s Razor points at LLMs instead.

peterfr

@kepano yeah, but who wants two consecutive paragraphs to be in the same font?

arrbee

@kepano
Next year's announcement.. MS Markdown

pixelschubsi

@rogerb @kepano It's already here. They call it GitHub-flavoured markdown - which is only half the story because it allows for extensions, some of which are exclusively available at GitHub, so you see `README.md` files that can only be displayed correctly when on GitHub. GitHub is embrace, extend and extinguish on Git, Markdown and - to some degree - open source development as a whole.
