Lemmy.world had to shut down the front page and put up a message about the load, along with a graph. They seem to chalk it up to the nature of social media sites to attract attacks.
I’d hack up the Rust code to be aware of how many concurrent requests it has in flight against PostgreSQL, and return a new busy error when that number gets too high.
Federation connections, the RSS feed, the API, and any other path that hits the database need a concurrency count in the Rust code and an error message system for busy.
I’d probably build a class to help with this. Once concurrency for an API goes over 5, mark the high-water point with a timestamp and start doing logic based on elapsed time: if concurrency is still > 5 and the elapsed time exceeds a threshold (say 1 minute), return the busy error.
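To make that concrete, here is a rough sketch of the helper I have in mind, using only the standard library. Everything named here (`ConcurrencyGate`, `Permit`, `Busy`, `try_enter_at`) is made up for illustration and is not existing Lemmy code. It assumes one gate per entry point and takes the current time as a parameter so the elapsed-time logic can be tested.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical error for "the server is overloaded, try again later".
#[derive(Debug, PartialEq)]
pub struct Busy;

#[derive(Debug)]
struct State {
    in_flight: usize,
    /// When in-flight work first went over the limit; None while at or below it.
    over_since: Option<Instant>,
}

/// Hypothetical per-entry-point gate (one each for API, RSS, federation, ...).
#[derive(Clone, Debug)]
pub struct ConcurrencyGate {
    limit: usize,
    grace: Duration,
    state: Arc<Mutex<State>>,
}

/// Held for the duration of one database-backed request.
#[derive(Debug)]
pub struct Permit {
    limit: usize,
    state: Arc<Mutex<State>>,
}

impl ConcurrencyGate {
    pub fn new(limit: usize, grace: Duration) -> Self {
        let state = State { in_flight: 0, over_since: None };
        Self { limit, grace, state: Arc::new(Mutex::new(state)) }
    }

    /// Admit a request unless we have been over the limit for longer than `grace`.
    pub fn try_enter_at(&self, now: Instant) -> Result<Permit, Busy> {
        let mut s = self.state.lock().unwrap();
        if let Some(since) = s.over_since {
            if now.saturating_duration_since(since) > self.grace {
                return Err(Busy);
            }
        }
        s.in_flight += 1;
        if s.in_flight > self.limit && s.over_since.is_none() {
            s.over_since = Some(now); // mark the high-water timestamp
        }
        Ok(Permit { limit: self.limit, state: Arc::clone(&self.state) })
    }

    pub fn try_enter(&self) -> Result<Permit, Busy> {
        self.try_enter_at(Instant::now())
    }

    pub fn in_flight(&self) -> usize {
        self.state.lock().unwrap().in_flight
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        if let Ok(mut s) = self.state.lock() {
            s.in_flight -= 1;
            if s.in_flight <= self.limit {
                s.over_since = None; // back under the limit, reset the clock
            }
        }
    }
}
```

A handler would call `gate.try_enter()` before touching the database, keep the `Permit` alive until the query finishes, and map `Err(Busy)` to whatever busy response the API ends up using. A single `Mutex` is the simplest thing that works here; atomics would be an optimization if the lock ever showed up in profiles.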
Is Prometheus the right way to expose these numbers for operators who want to know about the thresholds? I’d probably also add a dedicated log file to track concurrency thresholds and busy errors.
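If Prometheus does turn out to be the answer, the numbers could be served in its plain-text exposition format. This is only a sketch of what that output might look like, written with the standard library rather than a metrics crate. The metric names (`lemmy_db_in_flight`, `lemmy_busy_errors_total`), the `endpoint` label, and the `EndpointStats` struct are my own placeholders, and it assumes label values are simple names that need no escaping.

```rust
use std::fmt::Write;

/// Hypothetical snapshot of one entry point's counters.
pub struct EndpointStats {
    pub endpoint: &'static str,
    pub in_flight: u64,
    pub busy_errors: u64,
}

/// Render the snapshots as Prometheus text exposition format.
pub fn render_metrics(stats: &[EndpointStats]) -> String {
    let mut out = String::new();

    // Gauge: goes up and down with current load.
    out.push_str("# HELP lemmy_db_in_flight Requests currently using the database.\n");
    out.push_str("# TYPE lemmy_db_in_flight gauge\n");
    for s in stats {
        // Writing to a String cannot fail, so the result is ignored.
        let _ = writeln!(out, "lemmy_db_in_flight{{endpoint=\"{}\"}} {}", s.endpoint, s.in_flight);
    }

    // Counter: only ever increases.
    out.push_str("# HELP lemmy_busy_errors_total Busy errors returned to clients.\n");
    out.push_str("# TYPE lemmy_busy_errors_total counter\n");
    for s in stats {
        let _ = writeln!(out, "lemmy_busy_errors_total{{endpoint=\"{}\"}} {}", s.endpoint, s.busy_errors);
    }

    out
}
```

An operator could then alert on the gauge sitting above the threshold, or on the rate of the busy counter, without having to read the log file.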
The front-end apps also need to be caching “Trending communities”. I think lemmy-ui is still pulling that live from PostgreSQL on every refresh of the page; I need to check whether anyone has added caching for it.
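The caching itself does not need to be fancy. Here is a minimal sketch of a time-based cache for a single value like the trending list, written in Rust to match the rest of the thread even though lemmy-ui is a separate front-end. `TtlCache` and `get_or_refresh_at` are names I made up, and the `load` closure stands in for whatever query actually fetches the list.

```rust
use std::time::{Duration, Instant};

/// Hypothetical single-value cache that reloads after `ttl` has passed.
pub struct TtlCache<T> {
    ttl: Duration,
    entry: Option<(Instant, T)>,
}

impl<T: Clone> TtlCache<T> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    /// Return the cached value if it is younger than `ttl`; otherwise call
    /// `load` (the expensive query), store the result, and return it.
    pub fn get_or_refresh_at(&mut self, now: Instant, load: impl FnOnce() -> T) -> T {
        if let Some((stored_at, value)) = &self.entry {
            if now.saturating_duration_since(*stored_at) < self.ttl {
                return value.clone();
            }
        }
        let fresh = load();
        self.entry = Some((now, fresh.clone()));
        fresh
    }
}
```

With a TTL of a minute or so, a busy front page would hit PostgreSQL for this list about once a minute instead of once per page refresh.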