@olivia how do you have so many stuck jobs lol, a normal Oban install should hover at a couple thousand jobs; if you don't mind losing the unfederated backlog, you can just delete from the table freely
mind showing what pleroma=> select state, queue, attempt, count(*) from oban_jobs group by state, queue, worker, attempt order by count(*) desc; prints?
@i@olivia Here's another report of the same shit happening: https://git.pleroma.social/pleroma/pleroma/-/issues/3335 20 retries might indeed be too much (I have around 1500 jobs right now, most of which are completed; a good half of those that aren't are retries for DRC since verita's :cloudflare: rules keep blocking me or something), but I'm more interested in how the fuck they manage to pile up so hard. Force-fetched the mentioned post without any issues.
@olivia forgot feld made most things a background job; also removed worker from the select but not the group by. Probably as mint says: remote fetch workers stuck with 20(!) retries, working through spam
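for reference, the same diagnostic with worker in both the select and the group by (so the grouping actually matches the output columns) would look something like:

```sql
-- same diagnostic as above, but with worker selected as well,
-- so you can see which workers the stuck jobs belong to
SELECT state, queue, worker, attempt, count(*)
FROM oban_jobs
GROUP BY state, queue, worker, attempt
ORDER BY count(*) DESC;
```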
run watch 'sudo -Hu postgres psql pleroma -c "delete from oban_jobs where attempt > 3;"' for a few hours and it should clear up
@mint@olivia i wonder if @feld would ever consider adding a dedup MRF to pleroma via simhash_ex or ex_lsh; it would also require a cachex table, and those have to be defined ahead of time, at least until we eventually switch to nebulex
@feld@olivia@mint the PASTA WITH EXTRA SPAM, like almost all the previous nuisances, would have been discarded if posts that were a 99% match for the exact same text were ignored
@feld@i@olivia There was; I wasn't affected, some used AntiMentionSpam, keyword, or reject policies. That said, it isn't related to the current issue with RemoteFetcherWorkers piling up into the millions (which I believe are only spawned by the pinned post fetching pipeline in vanilla pleromer).
@feld@i@olivia Indeed, the three posts mentioned in the issue are the same three posts that are pinned on the affected actor's profile. Don't notice anything out of the ordinary in his collection aside from said posts having a shitton of emojis.
@mint@i@olivia weird, why would it keep fetching them? can you confirm for me the profile so I can take a closer look?
also the dupes shouldn't happen with the latest develop branch; at least if it tried, inserting the job would be cancelled every time because a duplicate one already exists (until the pruner kicks in and clears out old Oban jobs)
@feld@i@olivia The exact error might be irrelevant since they might also have some geoblocks or other :cloudflare: shenanigans going on. What's more concerning is the pileup happening in the first place; now that I think about it, it might be recursion:
1. pleromer receives an activity referencing that guy's profile/post
2. it fetches them
3. the fetch pipeline kicks in
4. pinned post fetching happens as part of the pipeline
5. pleromer inserts RemoteFetcherWorker jobs for those posts
6. said jobs try to fetch the pinned posts again
If that's the case (too lazy to confirm, sorry) and the fetcher jobs start erroring out, the queue grows exponentially. Hopefully not?
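if that recursion theory holds, the same object IDs should show up in the args of many jobs at once. A rough way to check (assuming the worker is stored as Pleroma.Workers.RemoteFetcherWorker in your oban_jobs table; adjust the name if yours differs):

```sql
-- count how many jobs share identical args; large counts for the same
-- object id would support the "pipeline re-queues the same pinned posts" theory
SELECT args, count(*)
FROM oban_jobs
WHERE worker = 'Pleroma.Workers.RemoteFetcherWorker'
GROUP BY args
ORDER BY count(*) DESC
LIMIT 10;
```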
@feld@i@olivia Indeed, but that's more of a last-resort measure. There would still be some friction around checking whether such a job exists and raising an exception if it does.
@mint@i@olivia you don't want to raise an exception on a duplicate job in Oban; that would break a lot of stuff needlessly. It just drops the job silently. It's not an error scenario that needs to raise / cause the process to abort.
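conceptually the uniqueness check is just a lookup before insert, roughly along these lines (a simplified sketch of the idea, not Oban's actual internal query; the args value is a made-up example):

```sql
-- simplified sketch of a pre-insert duplicate lookup: if a matching
-- job already exists in an incomplete state, skip the new insert silently
SELECT id
FROM oban_jobs
WHERE worker = 'Pleroma.Workers.RemoteFetcherWorker'
  AND args = '{"id": "https://example.com/objects/1234"}'::jsonb
  AND state IN ('available', 'scheduled', 'executing', 'retryable')
LIMIT 1;
```

if this returns a row, the duplicate insert is dropped instead of raising, which is exactly why it's not an error scenario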