Eugen Rochko

Our search index over here got out of sync a few weeks ago. It was the first time I've needed to re-import data into it in a long time. The command estimated a runtime of over 30 days to complete. So I've been working on optimizing the command.

16 comments
bengo

@Gargron what operates the search index? Would a distributed cluster of elasticsearch (or similar) nodes take that long?

Eugen Rochko

@bengo It's a part of Mastodon's command line utility that has to iterate over all data in the database and submit it to Elasticsearch.
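For readers wondering what "iterate over all data in the database" can look like, here is a minimal sketch, not Mastodon's actual code. It walks a table in id-ordered batches (keyset pagination), so each query resumes after the last id seen instead of scanning from the start. The `statuses` table, its columns, and the use of in-memory SQLite are assumptions for illustration only.

```python
import sqlite3


def iter_batches(conn, batch_size):
    """Yield lists of (id, text) rows in ascending id order, batch_size at a time."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, text FROM statuses WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Resume after the highest id seen so far.
        last_id = rows[-1][0]


# Hypothetical stand-in data: seven rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statuses (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO statuses (id, text) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 8)],
)

batch_sizes = [len(batch) for batch in iter_batches(conn, 3)]
```

Each yielded batch would then be handed to whatever submits documents to the search index.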

bengo

@Gargron mmmm cool. Maybe GNU parallel could help. Thanks for explaining to me though. Love this app :)

G 🇮🇹

@Gargron 30 days... more or less what's needed for an iOS update 😄

Mackaj

@gargron Ouch. If that's a synchronous command better run it inside tmux or screen, just in case.

jeffeb3

@Gargron it's a race! Will the command finish first, or the optimized command?

Ben Zanin

@Gargron oh I was wondering about that!! Thanks for the heads up, and for the maintenance work.

Eugen Rochko

Looks like I brought it down to 17 hours for 58,380,465 posts.
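As a back-of-the-envelope check on those figures, the sketch below works out the implied throughput and the speedup relative to the earlier 30-day estimate. It assumes the original estimate was exactly 30 days (the thread says "over 30 days") and that the post count was the same for both runs, so the speedup is a rough lower bound.

```python
posts = 58_380_465          # post count quoted in the thread
new_runtime_hours = 17      # optimized runtime quoted in the thread
old_runtime_days = 30       # assumption: the thread says "over 30 days"

# Average posts indexed per second in the optimized run.
new_rate = posts / (new_runtime_hours * 3600)

# Average posts per second implied by the original estimate.
old_rate = posts / (old_runtime_days * 24 * 3600)

# Ratio of the two runtimes.
speedup = (old_runtime_days * 24) / new_runtime_hours

print(f"optimized: about {new_rate:.0f} posts/s")
print(f"original estimate: about {old_rate:.1f} posts/s")
print(f"speedup: at least about {speedup:.0f}x")
```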

Mx Autumn :elephpant:

@Gargron nice. Always a good feeling when code is made more performant, but an improvement as big as this is cloud nine territory.

Human2022

@Gargron That's a lot of data that has stopped going to Facebook/Twitter 😎

Stefan Rother-Stübs

@Gargron wow. Could you provide more info (maybe an article)?

Dawn Tåke 🏳️‍⚧️

*Keeps posting a bunch of crazy to lengthen the process.*

Seriously though, awesome job!

Mitch Spradlin

@Gargron I think I’ve heard that switching to bulk ES operations from single-item operations can provide a dramatic speedup, is that part of what you did?
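To illustrate the bulk idea raised here (the thread does not confirm whether it was part of the optimization), this is a minimal sketch of grouping documents into batches and building request bodies in the newline-delimited JSON format that Elasticsearch's `_bulk` endpoint accepts: an action line followed by the document source, one pair per document. The index name `statuses`, the document shape, and the batch size are hypothetical, and nothing is sent over the network.

```python
import json
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(index_name, docs):
    """Build one NDJSON body: an action line, then the source, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical documents standing in for posts.
posts = [{"id": i, "text": f"post {i}"} for i in range(1, 6)]

# One request body per batch instead of one request per post.
payloads = [bulk_body("statuses", chunk) for chunk in batched(posts, 2)]
```

With five posts and a batch size of two, this produces three request bodies rather than five separate requests. The saving comes from fewer round trips and less per-request overhead.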
