Tim Hutton

If you download your Twitter archive, it arrives wrapped as a static HTML page, which is not very useful for doing anything with. Worse, it requires the original account to still be active for useful things like enlarging the images, since they use t.co links.

So here's a Python script to convert a Twitter archive to markdown or other formats: github.com/timhutton/twitter-a

Now you can archive your tweets in any way you want.

Terence Eden

@timhutton Thanks - that'll be handy.

Am I reading it wrong though? Looking through my archive, I see the full image URL in the entities?

Tim Hutton

@Edent The archive also contains the full images, though the filenames aren't given in the JSON and have to be constructed as <tweet-id>-<media_url.filename>

My comment about the t.co links was referring to the behavior of the HTML page they give you, which takes you to a t.co link when you click on an image to see the uncropped version.
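
The filename rule described above, `<tweet-id>-<media_url.filename>`, could be sketched roughly like this. The function name and the example URL in the usage note are made up for illustration; only the naming pattern comes from the post.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_media_filename(tweet_id: str, media_url: str) -> str:
    """Build '<tweet-id>-<media_url.filename>' for a media file in the archive.

    Hypothetical helper: only the naming pattern is taken from the thread.
    """
    # Keep only the last path component of the URL, e.g. 'AbCd.jpg'.
    filename = PurePosixPath(urlparse(media_url).path).name
    return f"{tweet_id}-{filename}"
```

For example, `local_media_filename("123", "https://example.com/media/AbCd.jpg")` gives `"123-AbCd.jpg"`.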

Pepijn

@timhutton Thanks. Do you know if someone is hosting this (or something like it) in an accessible way?

Tim Hutton

@Pepijn I don't, sorry. I'm not sure it would work well as a webpage service, because it would need large data transfers.

If you just want a way to view your tweets then open the HTML file in your archive.

Pepijn

@timhutton No problem. I have my own tweets and images. Was more wondering about some others I know who would appreciate an online tool.

Adnan 🦙

@timhutton @aallan Some may want to import it into Mastodon

Tim Hutton

@adnan @aallan Seems like this isn't supported on the Mastodon side, for performance reasons: github.com/mastodon/mastodon/i

Adnan 🦙

@timhutton @aallan But you can export your archive of posts at any time, can’t you? I am asking for the possibility of importing this archive of posts into a brand new instance. I never tried it but spacebear.ee offer Pleroma and Mastodon hosting plans and they say you should contact support for importing data from another instance. I never tried it myself but I wonder what they actually do.

Tim Hutton

@adnan @aallan Yes, it must be possible to import toots if you are a Mastodon admin. Let me know if you find out how to do it.

irgendlink

@Bahnhofsoma @timhutton thank you. I'll try it out. Especially the shortlink decode function seems to be very useful.

Andrew Jon Thomson

@timhutton thank you so much for doing this unfortunately I thought I was very technically advanced 30 year Internet user who knows how to code basic html but since I’ve come to mastodon I realized I’m a complete ignorant noob who doesn’t understand computers.

Can you explain the following to me which I don’t understand it’s like it’s a whole new language I’ve never learned before?:
“Open a command-prompt in that folder
python parser.py”

Doug Holton

@HelloAndrew @timhutton here's some info on installing and running python scripts in Windows if it's helpful learn.microsoft.com/en-us/wind

Andrew Jon Thomson

@dougholton @timhutton awesome, thanks!
Do you know if this is something that can be done in mobile operating systems as well?
Or if not mobile then on macOS? I don’t have a Microsoft desktop machine available.

Armin Hanisch

@timhutton @dougholton @dorward @HelloAndrew
Starting with macOS 12.3, there is no Python installed anymore. You have to install Python 3.x yourself. Before that, macOS only shipped with the old Python 2.7, which cannot run the script above because it uses Python 3 language features.
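
If it is unclear which Python a machine runs, a small check along these lines can tell Python 2 from Python 3. This is only a sketch: the function name is hypothetical, and the thread does not say which 3.x version the script needs, so the check stops at the major version.

```python
import sys


def is_python3(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is Python 3 or newer."""
    # version_info behaves like a tuple: (major, minor, micro, ...)
    return version_info[0] >= 3


if __name__ == "__main__":
    if is_python3():
        print("Python 3 found:", sys.version.split()[0])
    else:
        print("This is Python 2; install Python 3 first.")
```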

David Dorward

@HelloAndrew Command Prompt

A command prompt is a way of interacting with a computer by typing (as opposed to pointing and clicking).

(More specifically it is the bit where you type commands in a Shell which is displayed by a Terminal).

There are different shells which have various differences but the basics of them are mostly the same. Bash and Zsh are common shells.

How you open one depends on your operating system.

David Dorward

@HelloAndrew On a Mac I use Spotlight and type Terminal. On Linux flavours there will usually be some Terminal app in the menu (although it might be a little deep by default). On Windows you can use the search feature of the Start menu and look for PowerShell (which has its own flavour of shell).

In That Directory

At the command prompt type “cd” (change directory) followed by a directory name to go down into that directory. Use “..” as the directory name to go up a level.

David Dorward

@HelloAndrew You can also use ls (list) (or sometimes dir (directory)) to list the files in the current directory.

python parser.py

This is the command to run python (a program that executes scripts written in the Python programming language) followed by the name of the script (which will be in the directory you cded into). You will need to install Python to use it.

missing.csail.mit.edu/2020/cou might also be useful.

Hope this helps.

Andrew Jon Thomson

@dorward thank you so much for spending so much time trying to help me understand this I think if I could find the time to go through everything you’re saying it would be pretty straightforward. my biggest disappointment with my experience in Mastodon so far has been that the community seems way more technically advanced than I am and I thought I was a very advanced computer user but apparently I’m not. maybe I’m too old to learn the stuff I need to wait for it to be turned into easy websites

David Dorward

@HelloAndrew You’re welcome, it’s helping pass my commute. I’m probably providing more information than you need (I learn better with more context) which might be a bit overwhelming. Don’t let that discourage you. You know your way around a computer, this is just typing your way around instead of pointing.

The minimum to get up and running is:

1 Open the Terminal
2 Type “cd” then drag the folder on top of the terminal window (to enter its name) then press return
3 Run the Python command

Tim Böttcher

@HelloAndrew @dorward All that needed to be said in technical terms already got said.

Let me just add that this is a universal experience. I think I'm a programming language polyglot who knows ~10 programming languages well; I'm a confident sysadmin; and yet I often have to fight a sense of impostor syndrome: I feel like I'm just cobbling together things other people have built and those other people are the real magicians.

Andrew Jon Thomson

@Tim_Boettcher @dorward i’m really surprised how much this is affecting me emotionally but for the first time in 40 years I feel like a future shock like I’m too old to learn the basics of the world I live in all of a sudden. It’s really quite emotionally upsetting. So I’m very grateful for people trying to help.

Andrew Jon Thomson

@Tim_Boettcher @dorward we all know Stephen Fry is no technophobe but even he expressed a similar feeling on his first day here overwhelmed by technical aspects that he did not understand.

David Dorward

@HelloAndrew @Tim_Boettcher Much truth here. The part of my team’s project which needs my specialty isn’t quite ready to have work started on it so I’m working on a different part of the project and have been feeling lost in a forest of new tools that I don’t understand. It’s starting to come together now though.

Andrew Jon Thomson

@dorward like I’m somewhat familiar with the terminal window in macOS but installing python now it seems like this is a project I don’t have time to dig into that I just have to hope and pray someone turns it into a website where I can just enter my credentials and it’ll do the work for me.

Rob

@timhutton Ah, interesting. I have some old archives which are .json. I wonder when they changed. Haven't yet looked at one I downloaded some time back. I wanted to mark old tweets in the .json using a Python script and automate deletion, but I wasn't convinced there was an easy way to do this.

Tim Hutton

@pocketapocketa Possible this will help: github.com/selfawaresoup/twitt

cc @selfawaresoup

I think the deletion can be done through the Twitter API but I haven't tried it.

Esther

@timhutton @pocketapocketa it can be done using only the api but due to their heavy rate limiting, it’s much faster to do it using the archive

Rob

@timhutton @selfawaresoup Ah, just took a glance over it. Looks like just the tool. Thanks!

Mathijs Booden

@timhutton Step 4 (python parser.py) is magic to me. What do I really do?

Steven Lawson Photography

@timhutton This sounds like it will be really useful but alas way beyond the ken of techno numpties like me 😂

Tim Hutton

@stevenlawson If you just want to see your tweets, the archive they provide does come with a very user-friendly HTML file that you can open.

Peter Mount

@timhutton @f4grx Thanks for that, it'll be useful.

Just raised github.com/timhutton/twitter-a as tweets.js gets split into multiple 101Mb parts if your archive is large
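
A rough sketch of reading a tweets file that has been split into parts. It rests on two assumptions the thread does not confirm: that the parts are named like `tweets.js`, `tweets-part1.js` and so on, and that each file is a JavaScript assignment of the form `window.YTD.tweets.part0 = [ ... ]`. The function name is hypothetical, and this is not how the linked script handles it.

```python
import json
import os
import re

# Assumed naming: tweet.js / tweets.js plus optional "-partN" suffixes.
PART_NAME = re.compile(r"^tweets?(-part\d+)?\.js$")


def load_tweet_parts(data_dir: str) -> list:
    """Load and concatenate every tweets part file found in data_dir."""
    tweets = []
    for name in sorted(os.listdir(data_dir)):
        if not PART_NAME.match(name):
            continue
        with open(os.path.join(data_dir, name), encoding="utf-8") as f:
            text = f.read()
        # Assumption: everything before the first '[' is the JS assignment
        # prefix, and the rest is a plain JSON array.
        tweets.extend(json.loads(text[text.index("["):]))
    return tweets
```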

Tommi 🤯

THANK YOU SO MUCH FOR SHARING THIS, @timhutton!

I had my Twitter export folder lying in my hard disk for so long that I almost ended up deleting 5 years of my activity on the platform just because it was so itchy to find the stuff I wanted.

I now have a solution, thanks again ❤️

Esther

@timhutton you might also be interested in this. It’ll extract threads from a Twitter archive, among other things. github.com/selfawaresoup/twitt

Dave W. :unverified:

@timhutton

Using pleroma-bot you can suck that twitter archive.zip directly into mastodon, and have them present as toots.

Tim Hutton

@Daviey Good to know, thanks! I'm guessing that requires you to be a Pleroma/Mastodon admin?

Dave W. :unverified:

@timhutton

I don't think so, just pull out an API token from your Mastodon profile.

I started doing it via Twitter APIs, until I realised that the API doesn't go back far enough. Then I saw you could put an `archive.zip` directly in, so I'm waiting for that to be prepared.

Markus Eicher

@timhutton Hello Tim. Thank you very much for posting this tip. Appreciate it!

Patrick Vanhoucke (no parody)

@timhutton “Open a command-prompt” > I'm not good at that, unfortunately. Isn't there just a piece of installable software for that? Preferably for macOS.

chicagonz

@timhutton
Thanks for the code, curious if that includes any bookmarked items so will have to check it out.

Tim Hutton

@chicagonz The Twitter archive contains all sorts of things like your 'lists' and 'moments'. My script doesn't try to parse any of that but you could likely extend it to do so.

chicagonz

@timhutton

Excellent, can dust off the Python coding and probably use Beautiful Soup or something. Thanks!

JJ Celery

@timhutton ah brilliant, I was gonna write one myself, you saved me lots of time 👏

Penfriend (Laura Kidd)

@timhutton thank you so much for this! I've saved it to puzzle through another day :)

David Dorward

@timhutton That does look great. I’m doing some updates to my blog backend when I have the time, but once those are done I’m definitely going to take a look at that script to export some threads into it.

Tim Hutton

@dorward Just be aware that the twitter archive doesn't contain entire threads, just your tweets and your replies (and DMs and some other things). I'm not sure that there is a way to extract all threads, even through the Twitter API.

David Dorward

@timhutton I meant thread in the sense of “replying to self multiple times because of size limits” rather than whole conversations so that shouldn’t be a barrier.

Thomas Fricke (he/him)

@timhutton

Many thanks. Can we turn this into #mastodon format and have a history instance somewhere?

We could create, e.g., a deadbluebird.social instance and everybody could upload their data for historical research.

@fuzzyleapfrog@gnusocial.de @Lambo could be interested

Tim Hutton

@thomasfricke @Lambo I'm told that 'pleroma-bot' can achieve this.

For historians, archive.org does seem to have public tweets archived.

Space Hobo Actual

@timhutton Nice. Almost a million lines of markdown, 55MB in total:

903892 5965393 54709985 output.md

Tim Hutton

@spacehobo Good to know, thanks! I only tested on my archive which makes 6000 lines of markdown.

(I can only imagine how the Mastodon devs feel right now watching their Ruby code handle the millions of new visitors.)

Andy Warburton ❌❌❌

@timhutton thanks for this. I literally JUST this minute downloaded my archive!

Tim Hutton

@SimonRoyHughes Oh wow. You could try extracting only data/tweet*.js and then maybe data/tweet_media/* for the images?
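
The partial extraction suggested above could look something like this with Python's standard `zipfile` module. The two patterns are the paths named in the post; the function name is hypothetical, and the sketch assumes those paths sit at the top level of the zip.

```python
import fnmatch
import zipfile

# Paths taken from the suggestion above.
WANTED = ("data/tweet*.js", "data/tweet_media/*")


def extract_tweets_only(zip_path, dest_dir, patterns=WANTED):
    """Extract only the archive members that match one of the patterns."""
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # fnmatchcase compares names the same way on every platform.
            if any(fnmatch.fnmatchcase(name, p) for p in patterns):
                zf.extract(name, dest_dir)
                extracted.append(name)
    return extracted
```

`zip_path` can be a filename or an open binary file object, so a very large archive does not have to be unpacked in full first.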

SimonRoyHughes 🇬🇧🇳🇴🇺🇦🏳️‍⚧️🏳️‍🌈

@timhutton Lucky for me, I have copies of all of the images I’ve posted to Twitter. I think I may just delete the archive. What I don’t remember never happened.

Nathan Pitman

@timhutton it would be lovely if someone could build a tool to import tweets as historic toots so we could pick up our data and drop it into an account here.

emilyriederer

@timhutton Great callout on account still needing to be active! Downloaded my archive but hadn't had a chance to look yet. Thank you for this!

helix2301

@timhutton that's awesome tool written in python which is great I am not sure I care enough though about my old tweets to go through the effort lol

⁨⁨index of /shitposts :verified_kirby:⁩

this might also interest you: i made a web app a while ago that lets you query tweets from the archive and also provides a script (under "tweet utils") that lets you bulk delete the queried tweets! sk22.github.io/twitter-archive

Ed Summers

@timhutton I know you didn't ask me, but I actually think Twitter's archive is one thing that they did a rather nice, thoughtful job on. So many platform archives are just static dumps of JSON (sometimes without media files at all) that only technical people can use. That being said, thanks for your tool!

Tim Hutton

@edsu I agree. For non-technical users the HTML is an excellent way to browse through the archive - other than the fact that expanding images doesn't work (it takes you back to Twitter).

64kb

@timhutton that's pretty handy; I noticed the HTML blob before and it isn't super useful, it would be nice to be able to run a local micro-twitter thingy to search for content etc (I guess it could even be a dockerised version of mastodon) - I have a similar thing for my old wordpress blog, which I can defreeze at will.

Andy Warburton ❌❌❌

@timhutton ha ha, I know the feeling, whenever I share my code and someone else uses it and it works, it blows my mind!

Stu

@timhutton I have been waiting for this!!!!

Stu

@timhutton any possibility to get the data converted into JSON format comparable to the normal Twitter standard?

(ง'̀-'́)ง TimmGleason🍖🍖

@timhutton Thank you! I grabbed my archive and opened it up and half of it just took me back to the birdsite.

ARCHIVE:

Femme Malheureuse

@timhutton Thank you so much for that! I was afraid I'd have to learn how to code to crack my archive open.

Ponder Stibbons 🇧🇷🇩🇪

@timhutton

For saving bookmarks this is extremely useful:

github.com/jarulsamy/Twitter-A

but unfortunately it is not usable for ordinary end users.

maraleia

@timhutton when I downloaded my archive the way twitter wanted me to, it stopped showing any results after a while. Any suggestions?

Tim Hutton

@maraleia Keep trying. I suspect millions of people are trying to download their archives at the moment...

Chris [list of emoji]

@timhutton

Relatedly: there's a fork of Twint (command-line Twitter scraper) that continues to be maintained:

github.com/woluxwolu/twint

It will retrieve public timelines without you needing to be logged in.

Wikinaut

@timhutton Didn't try yet but extremely important, thank you!

Unfortunately, there is currently no way to download one's bookmarks, and these are not included in the archive, which is the biggest problem.

Please let me know if you find a kind of scraper (nodejs solution?) with which one can locally download the bookmarks (URLs/status IDs would be sufficient to feed your script with, I suppose).

Diane Bruce

@timhutton Dang I was thinking about writing something to pull these apart. You just saved me the trouble! Thanks (Normal geekery as @DianeBruce )

Chris Taylor

@timhutton thanks for doing this, really useful. Might be worth noting that tweets are *not* in date order in the output Markdown file. That confused me as I thought I'd lost 10 years worth of birdsite drivel! 😄
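
If date order matters, the tweets could be sorted before writing them out. This sketch assumes each tweet record has a `created_at` string in the form `Wed Oct 10 20:19:24 +0000 2018`. That format is an assumption about the archive, and the function names are hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "Wed Oct 10 20:19:24 +0000 2018".
# %a and %b expect English day and month names (the default C locale).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_timestamp(tweet: dict) -> datetime:
    """Parse a tweet's 'created_at' field into a timezone-aware datetime."""
    return datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)


def sort_by_date(tweets: list) -> list:
    """Return a new list of tweets, oldest first."""
    return sorted(tweets, key=tweet_timestamp)
```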

Tobias

@timhutton
This is really great, thanks a lot! I wanted to write something like that on my own in future, not needed anymore. 🙂🥳

Fabio! 🐈

@timhutton It's great to know this exists! I was going to start a project to do exactly that but the archive format is not very developer-friendly to say the least.

Fabian ¯\_(ツ)_/¯

@timhutton is this even legal on Twitters behalf? From how I understand the GDPR, it must be possible to actually *download* the data in a usable and portable format, not just getting a bunch of links to their servers.

DELETED

@timhutton thankfully, I am not that important. But glad there’s something for people who are. 😁

Lari Lohikoski

@timhutton Has anybody been able to download the Twitter archive in recent days? My downloads of my 450M archive are always stopped at about 50-100M. I'm starting to suspect that this is intentional..

Michael T Babcock

@timhutton very nice, much appreciated as I'd considered doing something similar to import some of my old twitter feed. I didn't notice in my quick glance but are you expanding links for replies to the original messages as well? That would be handy.

nadin brzezinski

@timhutton @sopyer I will keep it static. With all the errors stacking…

Jenni

@timhutton I'm not a super tech and it worked for me. The only things I had to do that were not in your instructions were:
1. download python from the Microsoft web store (because I don't use it generally, and when you enter "python" in the command prompt it will prompt you to download), and
2. drop the script from GitHub into the root folder of the unzipped Twitter archive.

jaclyn

@timhutton does the python script also require the original twitter account to be active to get the images from the t.co links? or does it work for archives of accounts that have been deleted?

@fascinatorfun@mastodon.green

@timhutton

Just got notification of my archive zip file and guessed it would be in a format pretty impossible to use with ease..

If I download it as is in a zip file to my iPad , would I then be able to run the script against it. And do I need particular devices and software? I’ve only got iPad? But have created zillions of Twitter threads, some of which took days of work.

Or would I need to pay someone to convert it for me?…and then save the file that way?

James Brace

@timhutton I’ve only used mine to CircleBoom my digital footprint

tenor.com/VDto.gif

anlomedad

@timhutton
Awesome! Was hoping for something like your script.
Very sad that you can't re-construct threads, tho.

But replacing tiny.URL with originals is so good, thank you very much!

But... it doesn't work. Have put parser.py into the unzipped archive folder and called from terminal. Output:

python parser.py
File "parser.py", line 8
<!DOCTYPE html>
^
SyntaxError: invalid syntax

What am I doing wrong?
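
A `<!DOCTYPE html>` line inside `parser.py` suggests the saved file may be a web page (for example the GitHub page that displays the script) rather than the raw Python source. That is an inference from the error message, not something confirmed in the thread. A quick check along these lines, with a hypothetical function name, would show whether the downloaded file is HTML:

```python
def looks_like_html(path: str, max_lines: int = 20) -> bool:
    """Return True if one of the first lines of the file starts an HTML page."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for _ in range(max_lines):
            line = f.readline()
            if not line:  # reached the end of the file
                break
            start = line.strip().lower()
            if start.startswith("<!doctype html") or start.startswith("<html"):
                return True
    return False
```

If it returns `True` for `parser.py`, downloading the raw file from the repository again would be the next thing to try.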

Woody :breadified:

@timhutton Could you cross-post that on Twitter please? Would be useful for them especially.

Deadly Headshot

@timhutton Improved it to work as a Markdown input to Pandoc: pastebin.com/NiFUrHgU
Pandoc command: `pandoc --from markdown_github --to html5 --standalone output2.md --output output2.html --toc`

MateRhyu

@timhutton
So I suppose this is no longer possible if I've deleted my account already, right? :/ I have the archive though...

Hannah Kolbeck

@timhutton They also don't include alt text for images. I made a site that will take your archive (or just the tweets.js inside it) and scrape the Twitter API for alt text. It's been lightly tested, but I think it's about ready for prime time: archive.alt-text.org

The result is a big json blob, if you felt up to including an option to fold it in, that would be amazing.

Chris Levesque

@timhutton I miss when Twitter just gave us a .csv file.

Emelia 👸🏻

@timhutton probably by end of this weekend I'll have a script to convert to activitypub

Aurin Azadî

@timhutton I just downloaded it and gave it my Twitter archive. I found that in the beginning (2008-12) I made (accidentally?) use of tinyurl.com links, so that the original link is not included in the archive. I'm going to try to write a module that requests the original links from tinyurl and replaces them in the output.md.

And: Do you plan to make it possible to write the data into a database? That would be a good start for a flexible way to republish one's own tweets on a website. If not, I'll try that, too, but I don't know whether I will really find the time.
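
The tinyurl idea above could be sketched roughly as follows. The function names and the regular expression are hypothetical. The sketch assumes that following the short link's redirect yields the original URL, and it leaves a link unchanged when the request fails.

```python
import re
import urllib.request

# Assumed shape of the short links found in the markdown output.
TINYURL_RE = re.compile(r"https?://tinyurl\.com/[A-Za-z0-9_-]+")


def resolve_redirect(url: str, timeout: float = 10) -> str:
    """Follow redirects for url and return the final address."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def expand_tinyurls(text: str, resolver=resolve_redirect) -> str:
    """Replace each tinyurl.com link in text with whatever resolver returns."""
    cache = {}

    def replace(match):
        short = match.group(0)
        if short not in cache:
            try:
                cache[short] = resolver(short)
            except OSError:  # covers urllib's URLError and HTTPError
                cache[short] = short  # keep the short link on failure
        return cache[short]

    return TINYURL_RE.sub(replace, text)
```

Passing the resolver as a parameter lets the replacement logic be tried offline with a stand-in function before any network requests are made.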

