dansup

After some digging, I was able to debug the issue and work on a fix, thanks to ChatGPT.

Seriously, ChatGPT is a life saver, I'm not an ops person but I was able to use the tips it provided to diagnose the issue, and after learning a bit about GRUB and shit I was able to fix it properly.

I'll be paying more attention to kernel updates and implementing a new update procedure + adding another standby app server to prevent this in the future.

Running prod services is fun, until it isn't.

5 comments
Aprazeth

@dansup Amazing stuff, thank you for sharing the "behind the scenes".

Perhaps I am missing a piece of information (and that's on me), but do you perhaps have a staging or test server/environment set up? As in, a separate server/instance that gets OS/docker/whatever underlying system updates first, before they go onto the live one?

If not, it might be something to look into. Having a staging environment can help catch these kinds of things. That said, it will cost some time and thus money :-/

Aprazeth

@dansup Alternatively, the standby server is also a pretty good idea (but keeping them in sync on everything apart from data can be a handful). You'd also have to decide how long after changes are applied to the live server they should also go onto the standby server (say, 1 or 2 days).

That all said, I'm just some rando on the internet and you're the one in the trenches there fixing stuff. Hope my ramblings might be of some use, if not, I still appreciate your time and the openness :)

Radieschen

@aprazeth @dansup not having a staging environment also costs time and money.

Aprazeth

@radieschen @dansup Absolutely, but having worked in environments/organisations where time/money/resources are tight, I know it's unfortunately a decision that sometimes gets made.

Which is why I'd rather mention as many options as possible, so the right solution for the situation/budget can be chosen. Designing the ultimate solution is far simpler when you don't have those restrictions, but "we gotta do the best we can with what we have".
