Conversation
Notices
-
sjw (sjw@poa.st)'s status on Wednesday, 07-Sep-2022 10:15:36 JST sjw I think this will take a while... -
Alex Gleason (alex@gleasonator.com)'s status on Wednesday, 07-Sep-2022 10:15:34 JST Alex Gleason If your server shits the bed hit me up, I've been dealing with issues. -
sjw (sjw@poa.st)'s status on Wednesday, 07-Sep-2022 10:15:35 JST sjw Sorry for the downtime -
Alex Gleason (alex@gleasonator.com)'s status on Wednesday, 07-Sep-2022 11:33:09 JST Alex Gleason My goal is to get you to the same point I’m at, so we can hopefully work on solving the remaining problems together.
First check htop. Are you actually using all your RAM and CPU? If not you’re gonna want to tune it until you are.
Here are some diagnostics:
select * from data_migrations;
You should see a "delete_context_objects" with state "2". That thing is deleting useless context objects from the database. Re-run the query and you can see the number increase. So far 3 million are done on Gleasonator, out of 33 million total objects. They say to expect about 30% of your database to be affected by this, so I'm afraid I have like… 5 more days of this.
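If you just want the progress counters, something along these lines should pull them out directly. This is a sketch that assumes the counters live in a jsonb data column on data_migrations (the same keys that show up later in this thread); adjust the column names if your schema differs:
select name, state,
       data->>'max_processed_id' as max_processed_id,
       data->>'affected_count'   as affected_count
from data_migrations
where name = 'delete_context_objects';
-- re-run to watch max_processed_id climb; state 2 means it's still in progress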
Next check:
select count(*) from oban_jobs;
Is the number increasing? Is it ever going down?
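If it only ever goes up, it helps to know which queue is piling up. Oban's jobs table has queue and state columns, so a breakdown along these lines shows where the backlog lives:
select queue, state, count(*)
from oban_jobs
group by queue, state
order by count(*) desc;
-- a huge pile of 'available' jobs under federator_incoming means the workers aren't keeping up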
I woke up this morning and had 17,000 jobs stuck in the queue. My timeline was showing posts from 8 hours in the past. I finally managed to drain it, here’s how.
Check htop again. I was using only about 40% of my RAM and 30% of my CPU. After changes, I was using 90% of my ram and 100% of my CPU, and the queue cleared.
To do this I cranked up postgres max_connections to 400 and work_mem to 16MB. It took some guess and check.
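If you'd rather not hand-edit postgresql.conf, the same two settings can also be changed from psql; the values here are just the ones above, and note that max_connections only takes effect after a full Postgres restart:
alter system set work_mem = '16MB';
alter system set max_connections = 400;
select pg_reload_conf();  -- picks up work_mem; max_connections still needs the restart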
I then cranked up the pool_size in my Ecto config to 280:
config :pleroma, Pleroma.Repo,
  pool_size: 280,
  timeout: 10_000

At first I set the timeout to 300_000 (5 minutes) but I think that actually made it worse. Turning it down helped.
Now finally increase the number of workers in the federator_incoming queue:
config :pleroma, Oban,
  queues: [
    federator_incoming: 150
  ]

After all this, my logs are still full of errors. The background migration is still running. But my site is mostly performing fine, except that certain endpoints are returning 500 errors. I think they're using the wrong index or something. In particular /api/v1/relationships is frequently having issues.
I forget if upstream pleroma ever got it, but also see if you have /phoenix/live_dashboard on your server. If so, go under “Ecto Stats > Long running queries”. I’m seeing a lot of this:
SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at"
FROM "activities" AS a0
WHERE (a0."actor" = $1) AND (associated_object_id((a0."data")) = $2) AND ((a0."data")->>'type' = $3)
LIMIT 1

That's the point I'm at.
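To check whether that query is actually hitting the right index, you can EXPLAIN it with representative values substituted for the bind parameters (the values below are placeholders, not real rows):
explain (analyze, buffers)
select id, data, local, actor, recipients, inserted_at, updated_at
from activities
where actor = 'https://example.com/users/alice'                      -- placeholder actor
  and associated_object_id(data) = 'https://example.com/objects/1'   -- placeholder object id
  and data->>'type' = 'Create'
limit 1;
-- an Index Scan in the plan is what you want; a Seq Scan over activities explains the slowness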
-
sjw (sjw@poa.st)'s status on Wednesday, 07-Sep-2022 11:33:10 JST sjw @alex cc @graf @lanodan -
sjw (sjw@poa.st)'s status on Wednesday, 07-Sep-2022 11:33:11 JST sjw @alex Well fuck....
files.shittyurl.org/raw/baest-error.log -
sjw (sjw@poa.st)'s status on Wednesday, 07-Sep-2022 11:33:11 JST sjw @alex Any ideas? -
ew (e@masochi.st)'s status on Wednesday, 07-Sep-2022 11:41:13 JST ew @alex @graf @sjw @lanodan lmao -
Alex Gleason (alex@gleasonator.com)'s status on Wednesday, 07-Sep-2022 11:41:13 JST Alex Gleason Please help me -
Alex Gleason (alex@gleasonator.com)'s status on Wednesday, 07-Sep-2022 11:44:01 JST Alex Gleason Oh yeah I have so many ideas. Let me get back to my keyboard, one min -
Alex Gleason (alex@gleasonator.com)'s status on Wednesday, 07-Sep-2022 13:04:10 JST Alex Gleason Yeah I might just restore the old index. Something is clearly wrong with this. -
ew (e@masochi.st)'s status on Wednesday, 07-Sep-2022 13:04:11 JST ew @alex @graf @lanodan @sjw lul. so what it's doing is generating an index based on a function that just returns the object.id? lmao. if bleroma weren't working with jsonb objects this would be a simple "create index on object (id)" -
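For illustration only, with made-up index names (not the actual migration), the difference being described is roughly a plain column index versus a functional index over a value that has to be dug out of the jsonb:
-- trivial case: the id is a real column
create index objects_id_index on objects (id);
-- what you end up with when the value is buried in jsonb and a function extracts it
create index activities_associated_object_id_index
    on activities (associated_object_id(data));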
ew (e@masochi.st)'s status on Wednesday, 07-Sep-2022 13:04:12 JST ew @alex @graf @lanodan @sjw I mean, you can also just mix ecto.rollback the migration -
ew (e@masochi.st)'s status on Wednesday, 07-Sep-2022 13:04:13 JST ew @alex @graf @sjw @lanodan why don't you just run the migration while the server is down -
Alex Gleason (alex@gleasonator.com)'s status on Wednesday, 07-Sep-2022 13:23:48 JST Alex Gleason I tried your suggestion, but it's still spewing errors. I think the background migration is only clogging up the system, but the real coding error is somewhere in this: https://git.pleroma.social/pleroma/pleroma/-/merge_requests/3692
Look at my top offending queries (it's all about "associated_object_id"): -
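One way to pull a list like that yourself, assuming the pg_stat_statements extension is enabled (column names as of PostgreSQL 13+):
select calls,
       round(total_exec_time::numeric, 1) as total_ms,
       round(mean_exec_time::numeric, 2)  as mean_ms,
       left(query, 120)                   as query
from pg_stat_statements
order by total_exec_time desc
limit 10;
-- the associated_object_id() lookups should show up near the top if they're the offenders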
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Wednesday, 07-Sep-2022 13:23:49 JST Your New Marijuana Injecting Waifu :weed: @alex @e @graf @sjw @lanodan This is actually unrelated AFAIK. This migration we're having trouble with is from this:
https://git.pleroma.social/pleroma/pleroma/-/merge_requests/3717
-
Alex Gleason (alex@gleasonator.com)'s status on Wednesday, 07-Sep-2022 13:24:41 JST Alex Gleason The problem is that reverting this is gonna be a bitch. -
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Friday, 09-Sep-2022 02:36:11 JST Your New Marijuana Injecting Waifu :weed: @alex @e @graf @lanodan @mint @sjw
>"max_processed_id": 109470614
Only about 6 million left! :oyvey_right:
Attachment: Screenshot_20220908-121434_Termux.jpg
Alex Gleason likes this. -
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Friday, 09-Sep-2022 02:36:12 JST Your New Marijuana Injecting Waifu :weed: @alex @e @graf @lanodan @sjw @mint
>"max_processed_id": 54000480
Whoo! Over 54 million objects processed so far! I'm now over halfway done!
>"affected_count": 16181100
Huh, yeah that's about 30%. -
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Friday, 09-Sep-2022 02:36:13 JST Your New Marijuana Injecting Waifu :weed: @alex @e @graf @lanodan @sjw
>"max_processed_id": 21888612
I've almost hit 33M!
These settings seem to be working well for me. It might be the higher work_mem. -
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Friday, 09-Sep-2022 02:36:14 JST Your New Marijuana Injecting Waifu :weed: @alex @e @graf @lanodan @sjw About 10% done now! -
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Friday, 09-Sep-2022 02:36:14 JST Your New Marijuana Injecting Waifu :weed: @alex @e @graf @lanodan @sjw 20% done -
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Friday, 09-Sep-2022 02:36:15 JST Your New Marijuana Injecting Waifu :weed: @alex @e @graf @sjw @lanodan Fix didn't work for me either so I disabled it a while ago but mine just randomly stopped spewing errors about DB timeouts and it's still averaging around 200 records a second. -
Alex Gleason (alex@gleasonator.com)'s status on Friday, 09-Sep-2022 02:36:38 JST Alex Gleason Did your logs at least stop screaming about dropped connections? Upgrading to Postgres 14 fixed that for me. -
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Friday, 09-Sep-2022 02:45:07 JST Your New Marijuana Injecting Waifu :weed: @alex @e @mint @graf @sjw @lanodan I've been on 14 for many months now.
It seems the beginning of the migration is hell until it just magically fixes itself.
I think both of our "fixes" were just coincidence.
Alex Gleason likes this. -
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Friday, 09-Sep-2022 02:45:43 JST Your New Marijuana Injecting Waifu :weed: @alex @e @mint @graf @sjw @lanodan They did, and CPU usage went down a tonne. Only slightly above normal.
See the settings I'm using. I think it had to do with the higher work_mem.
100 connections was enough to keep pleroma from crashing.
After about an hour or two it stopped bitching in the logs, CPU went way down, and it started chugging along.
I also ran a vacuum analyse on the database while it was running. Not sure if that played a part or not.
Alex Gleason likes this. -
Your New Marijuana Injecting Waifu :weed: (sjw@bae.st)'s status on Friday, 09-Sep-2022 23:59:29 JST Your New Marijuana Injecting Waifu :weed: @alex @e @graf @lanodan @mint @sjw we're done!
Attachment: Screenshot_20220909-090040_Termux.jpg
Alex Gleason likes this.
-