My goal is to get you to the same point I’m at, so we can hopefully work on solving the remaining problems together.
First, check htop. Are you actually using all your RAM and CPU? If not, you’re gonna want to tune things until you are.
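If you’d rather script that check than eyeball htop, here’s a rough Python sketch. Python is an arbitrary choice here, and Pleroma doesn’t ship anything like this. It assumes a Linux box. RAM use is read from /proc/meminfo. The CPU figure is the 1-minute load average divided by the core count, which is only a proxy for what htop shows. Both function names are hypothetical.

```python
import os


def mem_used_fraction(meminfo_text: str) -> float:
    """Fraction of RAM in use, given the text of Linux's /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    # MemAvailable (Linux 3.14+) counts reclaimable page cache as free,
    # which is closer to "unused RAM" than MemFree is.
    return 1 - fields["MemAvailable"] / fields["MemTotal"]


def cpu_load_fraction() -> float:
    """1-minute load average per core (Unix only); a rough proxy for CPU use."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(f"RAM in use: {mem_used_fraction(f.read()):.0%}")
    print(f"Load per core: {cpu_load_fraction():.0%}")
```

Numbers well below 100% on both lines are the “not using all your hardware” case.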
Here are some diagnostics:
```sql
select * from data_migrations;
```

You should see a “delete_context_objects” row with state “2”. That migration is deleting useless context objects from the database. Re-run the query and you can watch the number go up. So far 3 million are done on Gleasonator, out of 33 million total objects. They say to expect about 30% of your database to be affected by this, so I’m afraid I have like… 5 more days of this.
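A straight-line estimate is enough to turn two readings of that counter into a rough finish time. This is a minimal Python sketch that assumes the migration’s progress counter grows at a steady rate. `migration_eta` and its parameters are hypothetical names, and you supply the readings yourself by re-running the query above a few hours apart.

```python
from datetime import datetime, timedelta
from typing import Optional


def migration_eta(first_count: int, first_at: datetime,
                  second_count: int, second_at: datetime,
                  target_count: int) -> Optional[timedelta]:
    """Linear estimate of time left, from two readings of a progress counter.

    Returns None when the counter did not advance, since no rate can be inferred.
    """
    elapsed = (second_at - first_at).total_seconds()
    progressed = second_count - first_count
    if elapsed <= 0 or progressed <= 0:
        return None
    remaining = max(target_count - second_count, 0)
    # time_left = remaining / rate, where rate = progressed / elapsed
    return timedelta(seconds=remaining * elapsed / progressed)
```

The finish line is the part you have to guess. Suppose the migration only needs to touch about 30% of 33 million objects. Then the target is roughly 9.9 million. If the counter runs across all 33 million, use 33 million instead. Treat either value as an assumption.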
Next check:
```sql
select count(*) from oban_jobs;
```

Is the number increasing? Is it ever going down?
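Both of those questions are really about the trend across several readings. Here is a minimal Python sketch that assumes you note the count every few minutes. It fits a straight line to the readings to get a growth rate, and it also reports whether the count ever dipped. `queue_trend` is a hypothetical helper, not part of Pleroma or Oban.

```python
from typing import Sequence, Tuple


def queue_trend(samples: Sequence[Tuple[float, int]]) -> Tuple[float, bool]:
    """Summarize repeated readings of the oban_jobs count.

    samples: (minutes_since_first_reading, job_count) pairs, in time order.
    Returns (slope in jobs per minute from a least-squares line fit,
             whether the count ever went down between consecutive readings).
    """
    if len(samples) < 2:
        raise ValueError("need at least two readings")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    cov = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("readings need distinct timestamps")
    ever_decreased = any(later < earlier
                         for (_, earlier), (_, later) in zip(samples, samples[1:]))
    return cov / var, ever_decreased
```

A slope that stays positive with no dips means jobs are arriving faster than workers finish them. One caveat: depending on how job pruning is set up, the raw count(*) can include completed jobs that haven’t been cleaned out yet. A slow climb isn’t automatically a backlog.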
I woke up this morning to 17,000 jobs stuck in the queue. My timeline was showing posts from 8 hours in the past. I finally managed to drain it. Here’s how.
Check htop again. Before, I was using only about 40% of my RAM and 30% of my CPU. After the changes, I was using 90% of my RAM and 100% of my CPU, and the queue cleared.
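Extra CPU and RAM cleared the queue for a simple reason. A backlog only shrinks when workers finish jobs faster than new ones arrive, and the time to drain it is the backlog divided by that difference. Here is a minimal Python sketch of the arithmetic, assuming both rates hold steady. `drain_time_minutes` is a hypothetical name, and the rates are yours to measure.

```python
from typing import Optional


def drain_time_minutes(backlog: int, arrivals_per_min: float,
                       completions_per_min: float) -> Optional[float]:
    """Minutes until a job backlog reaches zero, assuming steady rates.

    Returns None if completions don't outpace arrivals: the queue never drains.
    """
    net = completions_per_min - arrivals_per_min
    if net <= 0:
        return None
    return backlog / net
```

Take a 17,000-job backlog with a hypothetical net gain of 50 jobs per minute. It would take 340 minutes to clear, a bit under six hours. If the difference is zero or negative, no amount of waiting helps, so raising throughput is what matters.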
Here’s what I changed. First, I cranked up PostgreSQL’s max_connections to 400 and work_mem to 16MB. It took some guess-and-check.
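Be careful multiplying those two settings. work_mem is a limit per sort or hash operation, not per connection, so one query can use several multiples of it. Hash operations can use more still via hash_mem_multiplier (PostgreSQL 13+). Here is a rough upper-bound sketch in Python. `ops_per_query` is an assumption you’d have to estimate for your own workload, and the function name is hypothetical. A new max_connections value only takes effect after a PostgreSQL restart, while work_mem can be changed with a reload.

```python
def worst_case_work_mem_gib(max_connections: int, work_mem_mb: int,
                            ops_per_query: int = 1) -> float:
    """Upper-bound estimate of memory PostgreSQL could spend on work_mem.

    Each sort or hash node in a plan gets its own work_mem allowance, so
    ops_per_query (an estimate) scales the total. Ignores hash_mem_multiplier.
    """
    return max_connections * work_mem_mb * ops_per_query / 1024
```

Suppose 400 connections each run a single 16MB operation. That comes to 6.25 GiB, which has to fit alongside shared_buffers and the OS page cache.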
I then cranked up the pool_size in my Ecto config to 280:
```elixir
config :pleroma, Pleroma.Repo,
  pool_size: 280,
  timeout: 10_000
```

At first I set the timeout to 300_000 (5 minutes), but I think that actually made things worse. Turning it down helped.
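pool_size has to fit under max_connections. Every running Pleroma node holds its own pool of pool_size connections, and so does anything else that starts Pleroma.Repo. PostgreSQL also keeps superuser_reserved_connections (3 by default) back for superusers. Here is a quick Python sketch of that budget. `app_instances` and `other_clients` are hypothetical knobs covering whatever else connects to your database, such as psql sessions, backups, or monitoring.

```python
def connection_headroom(max_connections: int, pool_size: int,
                        app_instances: int = 1,
                        superuser_reserved: int = 3,
                        other_clients: int = 0) -> int:
    """Connections left over once every Ecto pool is full.

    A negative result means the pools alone can exhaust max_connections,
    and PostgreSQL will refuse new connections.
    """
    usable = max_connections - superuser_reserved
    return usable - app_instances * pool_size - other_clients
```

With the numbers above, 400 - 3 - 280 leaves 117 connections for everything else. A second app node with the same pool_size would push it negative.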
Finally, increase the number of workers in the federator_incoming queue:
```elixir
config :pleroma, Oban,
  queues: [
    federator_incoming: 150
  ]
```

After all this, my logs are still full of errors, and the background migration is still running. But my site is mostly performing fine, except that certain endpoints are returning 500 errors. I think they’re using the wrong index or something. In particular, /api/v1/relationships frequently has issues.
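About that federator_incoming number: Oban jobs in Pleroma run their queries through Pleroma.Repo (an assumption worth checking: your Oban config should say `repo: Pleroma.Repo`). That means busy workers and web requests compete for the same pool. Here is a rough Python sketch of that budget, assuming each running job holds at most one connection at a time. Any queue names besides federator_incoming are placeholders for whatever your config lists, and the function name is hypothetical.

```python
from typing import Mapping


def pool_left_for_web(pool_size: int, queue_limits: Mapping[str, int]) -> int:
    """Pool connections left for web requests if every Oban worker is busy.

    Assumes each running job holds at most one Repo connection at a time.
    That is a simplification: jobs that spawn concurrent tasks may hold more,
    and a job between queries holds none.
    """
    return pool_size - sum(queue_limits.values())
```

Take a pool_size of 280 and 150 federator_incoming workers. At most 130 connections are left for web requests and every other queue. If total queue concurrency approaches pool_size, requests end up waiting for a connection, which can show up as errors.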
I forget whether upstream Pleroma ever got it, but also check whether you have /phoenix/live_dashboard on your server. If so, go to “Ecto Stats > Long running queries”. I’m seeing a lot of this:
```sql
SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at"
FROM "activities" AS a0
WHERE (a0."actor" = $1)
  AND (associated_object_id((a0."data")) = $2)
  AND ((a0."data")->>'type' = $3)
LIMIT 1
```

That’s the point I’m at.