@feld Makes me wonder how it works for normal ones lol
Like scrolling up I got [error] Object rejected while fetching https://poa.st/objects/[redacted-uuid] {:fetch, {:error, {:reject, "Filtered by id"}}} but filter-by-id isn't merged yet :D
@phnt AIUI the logging is completely unified, so I don't think we can have some subset of logs go one place and another elsewhere; we can only use the levels. Unless we intercept certain events and do it ourselves, which would be possible by ferrying the log events through Telemetry. I've been thinking about this too, but don't have a certain path forward yet.
Leveraging the logger metadata to hold the massive debug error dumps could be a good solution. And if you enable structured logging you can make it spit out JSON and it will then be easily viewable.
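Rough sketch of the shape I mean (the :reason key is just a placeholder, nothing settled):

require Logger

# keep the message short; stash the big error term in metadata
Logger.error("remote fetch failed", reason: error)

With a JSON formatter (e.g. the logger_json package on Hex), :reason then comes out as its own field instead of being glued into the message text.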
We'll see where this goes, but getting better logs and metrics will really help mature the administration story
@feld If possible and easy to do, I would like split logs: syslog would get simplified messages for federation issues that don't dump the whole error and object (like the one pictured in your post), and would only log internal errors like process termination, timeouts, etc. with the full error; a separate log file would get the full messages for debugging and more thorough monitoring. That would make the difference between the file and syslog less confusing.
It is sort of possible now with different log levels for syslog and console, but I don't know how hard it would be to effectively send two different messages through Elixir's Logger.
Also it would be nice to attach a status to success messages, like {:fetch, :ok}, instead of only showing a status on errors (as pictured above).
EDIT: Looks like the error could be passed as metadata to the logging system instead of the current way of dumping it into the message. That way it can be easily formatted in the config (omitting the metadata in the syslog format and including it in the file log).
As for logging to a file, I don't know how hard that is in Elixir. There's a module for it on Hex, but I don't know what stance Pleroma has regarding 3rd-party modules for stuff like this. Or we could completely sidestep the issue by logging to console and redirecting stdout to a file in a start script/service.
IMO the Phoenix web logs are 99% useless and should just be disabled by default unless you need to debug something. If you want web logs it's easier to just access the logs of your reverse proxy anyway (nginx, whatever)
As part of this work to start jamming important logs through the Telemetry framework, I will be able to build prom_ex modules so we can export more statistics in Prometheus format to build Grafana dashboards with.
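Rough sketch of what such a prom_ex plugin could look like (module name, telemetry event name, and metric are all made up here, just to show the shape):

defmodule Pleroma.PromEx.FederationPlugin do
  use PromEx.Plugin

  @impl true
  def event_metrics(_opts) do
    Event.build(:pleroma_federation_event_metrics, [
      # counts every [:pleroma, :object, :fetch, :failure] telemetry event
      counter("pleroma.object.fetch.failure.count",
        event_name: [:pleroma, :object, :fetch, :failure],
        description: "Failed remote object fetches"
      )
    ])
  end
end

prom_ex then exposes that counter on its /metrics endpoint for Prometheus to scrape.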
@phnt @feld Which is just ugly as fuck; you'd end up with two log files to read for no reason, and console should be purely for runtime, like when launching it manually. For example:
$ mix run
Hello world from Logger asdf=with metadata
$ tail -n 1 /var/log/everything.log
Elixir[31282]: Hello world from Logger

config :logger, :console,
  format: "\n$message $metadata\n",
  metadata: [:asdf]

config :logger, :ex_syslogger,
  format: "\n$message\n"
This way the syslog is less noisy, without the error dumps, and the console log goes to stdout, which can then be redirected to a file in the service declaration. systemd already has a config option for that, and OpenRC and BSD inits can probably also redirect output with >some_log.txt.
The default Pleroma config.exs already has some metadata configured like that for actors, paths and users.
@phnt @lanodan This is essentially what I was thinking about too. You can do it two ways:
Logger.error("message here", reason: error)
or simply
# attach this error metadata to ALL logs emitted by this process
Logger.metadata(reason: error)
Logger.error("message here")
the nice part about using Logger.metadata is that we could probably write a helper function that will just keep appending errors to a list so we can accumulate them
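Hedged sketch of what that helper could look like (the name is made up):

# accumulate errors in this process's Logger metadata
def append_error(error) do
  errors = Keyword.get(Logger.metadata(), :errors, [])
  Logger.metadata(errors: [error | errors])
end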
@lanodan @feld I've looked through the telemetry branch, and implementing this would require rewrites, so it's not worth doing it this way anyway.
On develop it would only require rewriting the calls that dump the full error into something like this: Logger.error("Object rejected while fetching #{id}", reason: error) and editing the config.
The structured data in syslog looks nice, if it's actually supported by Elixir and the syslog daemons.
@feld Trying to tackle that; the first (actually second, the first one was a brainfart where I tried to handle it in transmogrifier) instinct was to handle it in Federator, but that would only work for users that already exist in the local DB and thus would do nothing here. The problem is that the very first thing Pleroma does upon receiving any activity from an unknown actor is fetch and save the actor in ReceiverWorker in order to verify its signature, so I don't see a way to preemptively filter them without tacking shit onto it.
@feld Something like that. Wonder if there's any further demand for preemptive activity filtering before ever resolving the pubkeys (which results in the actor getting fetched); if so, it should be expanded into a proper facility. https://git.pleroma.social/mint/pleroma/-/commit/1fbe696c011fea497828fb16f0fd73bc1cfeff6e Haven't made any tests yet, partially because I'm not sure how to handle them, partially because rebar throws some cryptic error on my workstation when compiling syslogger.
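Something along these lines, as a simplified sketch rather than the actual diff (Pleroma.User.get_cached_by_ap_id is assumed as the local lookup):

# drop a self-Delete from an actor we've never seen,
# before any fetch can create a user row
defp delete_from_unknown_actor?(%{"type" => "Delete", "actor" => actor, "object" => object})
     when actor == object do
  is_nil(Pleroma.User.get_cached_by_ap_id(actor))
end

defp delete_from_unknown_actor?(_activity), do: false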
@feld I'd prefer not to touch Repo. With this it should just behave like any other canceled job in the queue, except getting canceled much earlier than others. Anyway, analysis tests are also failing for whatever reason.
@mint I don't know how much work it would be to refactor things, but you could wrap a stage in a Repo.transaction; then you could just error, and the rollback would happen automatically.
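Roughly like this (a sketch; process_stage/1 is a made-up placeholder for whatever the stage does):

# any {:error, reason} inside the fun rolls the whole transaction back
Repo.transaction(fn ->
  case process_stage(activity) do
    {:ok, result} -> result
    {:error, reason} -> Repo.rollback(reason)
  end
end)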
@feld In case of receiving an activity from an unknown user, Federator is called after ReceiverWorker, which first has to fetch the actor to verify its signature. If you put it there, Pleroma would still create the entry for the freshly deleted remote user.
@mint However, this means that if there are any side effects of the failure that we need to keep in the database, like tracking the reachability of the instance, we would need to change that side effect into a process we can message, so that its database work doesn't get rolled back. This is super easy to do and I'll help with it.
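Sketch of the shape (module name made up; Pleroma.Instances.set_unreachable is assumed to be the existing reachability call):

# the write happens in another process, so the caller's transaction
# rolling back won't undo it
defmodule Pleroma.ReachabilityTracker do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # fire-and-forget; safe to call from inside a Repo.transaction
  def mark_unreachable(url), do: GenServer.cast(__MODULE__, {:unreachable, url})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:unreachable, url}, state) do
    Pleroma.Instances.set_unreachable(url)
    {:noreply, state}
  end
end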
@mint Ahh that sounds correct, dang. I'll keep thinking about this. There has to be a way we can stop people from being able to fill the database with entries. This is a bug and will be treated as such
@feld I'm trying to handle #3311, which bloats the users table thanks to Mastodon's Delete behavior. Filling the Oban queue is a different issue, as it can be done with any other activity type as long as you send a request with a validly structured signature header; that has to be solved with more advanced rate limiting than we already have.
@mint We should be able to capture the domain in the JSON and the IP it's being sent from, and have two Hammer rate limit queues or something similar in a Plug filter on the inbox, right? We don't have to do any validation or anything at this step. If either hits a threshold we can start erroring there, and come up with a way to exfiltrate useful metadata so people can automate firewalls or something too. Lots of options here.
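Sketch of that shape (module name and limits are made up; Hammer.check_rate/3 is the real Hammer call):

# assumes the JSON body was already parsed by the time this plug runs
defmodule Pleroma.Web.Plugs.InboxRateLimiter do
  import Plug.Conn

  def init(opts), do: opts

  def call(%{body_params: %{"actor" => actor}} = conn, _opts) do
    domain = URI.parse(actor).host
    ip = conn.remote_ip |> :inet.ntoa() |> to_string()

    # 100 requests per minute per domain and per IP; numbers are arbitrary
    with {:allow, _} <- Hammer.check_rate("inbox:domain:#{domain}", 60_000, 100),
         {:allow, _} <- Hammer.check_rate("inbox:ip:#{ip}", 60_000, 100) do
      conn
    else
      # {:deny, _} or a backend error -> reject the request
      _ -> conn |> send_resp(429, "Too Many Requests") |> halt()
    end
  end

  def call(conn, _opts), do: conn
end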
@feld Returning a boolean makes more sense, yeah. Not sure why you deleted the {:ok, %User{}} assertion from get_or_fetch_by_ap_id that's called after that.
@mint I'm gonna move this into the controller and check it at the inbox endpoint, actually. I don't want them to be able to write to the database at all
@mint I just cleaned up all my dead accounts with this
DELETE FROM users WHERE note_count = 0 AND actor_type = 'Person' AND nickname != 'internal.fetch';
I'll just be patient now.
edit: that's potentially dangerous, as e.g. users who only follow others or have only boosted will get caught by that query, so don't run it on a server you actually care about
edit: no, that's a bad example; you can't fetch their profile if their account is deleted. So I need to watch more closely for Deletes of activities that cause a user to be created
@feld In case of follows, one can just check the follow relationships table. With reports, having a bunch of orphaned ones in the DB likely won't hurt anything, but you should also skip relays and other service actors.