Week long downtime

Nerd02@lemmy.basedcount.com · edit-2 8 months ago

zeazide@lemmy.basedcount.com · 8 months ago

Glad to see it back up, I was wondering if something was wrong with my computer

Happy Thanksgiving (U.S.) belatedly to posters

tohuw@lemmy.basedcount.com · 7 months ago

Way to finally make the thing do the thing, finally. Or whatever.

Last@lemmy.world · 8 months ago

deleted by creator

Nerd02@lemmy.basedcount.com · edit-2 8 months ago

post the reason here

I literally wrote it up there:

The TL;DR is that the instance ran out of disk space, so the database crashed. No database, no Lemmy.

If you care about some more details you can read what I wrote on Discord. If you don’t, carry on. But whatever, for the sake of completeness here’s the wall of text.

Context: the following was written on 26/11/2023 18:23:52 UTC, before I started working on the problem. The migration to S3 took about 4 days.

What the hell is going on?

The instance is currently offline because we ran out of space. You lot shitposted too much and filled the database. Expanding the storage isn’t trivial and is quite expensive, so we are using this outage to do some much-needed maintenance and make the instance more affordable for us to run in the future, as well as more resilient to similar kinds of issues.

Yeah very cool but what does it mean? In English?

The problem

Right now the instance lives on two drives: a 15GB one containing the database and a 30GB one containing images uploaded by other users or synced from other instances. The former filled up, so the Lemmy backend is crashing on startup. The server is now stuck in a loop of starting, realizing it can’t write to the database, crashing and restarting. The only way to exit this loop is to either spend money and buy extra space or to delete some old data.
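For anyone wondering how you would catch this before the database falls over, here is a minimal sketch, assuming Python 3.9+ and nothing beyond the standard library. The function names, the 85% threshold and the `/` mount path are all made up for illustration; this is not what the instance actually runs.

```python
import shutil


def usage_fraction(total_bytes: int, used_bytes: int) -> float:
    """Return the used share of a volume as a number between 0 and 1."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def check_volume(path: str, warn_at: float = 0.85) -> tuple[float, bool]:
    """Return (used fraction, should_warn) for the filesystem holding `path`.

    `warn_at` is a hypothetical threshold; pick whatever leaves enough
    headroom to react before the drive is actually full.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction = usage_fraction(usage.total, usage.used)
    return fraction, fraction >= warn_at


if __name__ == "__main__":
    # Hypothetical mount point; a real setup would list the database
    # drive and the image drive separately.
    for mount in ("/",):
        fraction, warn = check_volume(mount)
        status = "WARNING" if warn else "ok"
        print(f"{mount}: {fraction:.0%} used [{status}]")
```

Run from cron or a systemd timer, something like this would at least give a heads-up while there is still room to delete old data or plan a migration.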

Our solution

The more intuitive solution would be expanding the database drive, however that’s just too expensive and not really sustainable in the long run. Instead, we are going to transfer the image host to a significantly cheaper host, free up the 30GB drive and move the database there. This solution is, however, a bit more complicated and takes some time.

If you speak computer shit, what I just said means we are going to move the image host (pictrs) to a remote S3 object store. Right now it’s located in the same filesystem as the Lemmy server.
