Conversation
jb55 (32e1827635450ebb3c5a7d12c1f8e7b2b514439ac10a67eef3d9fd9c5c68e245@mostr.pub)'s status on Monday, 13-Mar-2023 00:22:48 JST
Here are the frequencies for the spammiest tokens on the damus relay, if anyone wants to use these for spam filtering: https://cdn.jb55.com/nostr/spammy.txt
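As a rough illustration of how such a list could feed a filter, here is a minimal sketch that counts how many words of a note appear among the N most frequent tokens. It assumes spammy.txt keeps the `uniq -c | sort -n` layout used to build it (a right-aligned count, a space, then the token, least frequent first). The `spam_score` helper, the sample file and the "hits" metric are hypothetical and not part of the original post; a real filter would still need a threshold, and probably an allow-list of ordinary words that are frequent but not spammy.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not from the original post):
# count word occurrences in a note that appear among the top_n most
# frequent tokens of a spammy.txt-style frequency file.
# Assumed file format (from `uniq -c | sort -n`): "<count> <token>" per line,
# sorted ascending by count, so the most frequent tokens are the last lines.
spam_score() {
  local freq_file=$1 top_n=$2 note=$3
  tail -n "$top_n" "$freq_file" |
    NOTE="$note" awk '
      # strip the leading padding and the count, keep the token as a lookup key
      { sub(/^ *[0-9]+ /, ""); spammy[$0] = 1 }
      END {
        # " " as separator means awk default splitting on runs of whitespace
        n = split(ENVIRON["NOTE"], words, " ")
        hits = 0
        for (i = 1; i <= n; i++) if (words[i] in spammy) hits++
        print hits
      }'
}

# Usage with a tiny made-up frequency file
sample=$(mktemp)
printf '      1 hello\n      7 free\n     42 airdrop\n' > "$sample"
spam_score "$sample" 2 "claim your free airdrop now"   # prints 2
```

Passing the note through the environment (`ENVIRON`) rather than `awk -v` avoids awk interpreting backslash escapes in the note text.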
jb55 (32e1827635450ebb3c5a7d12c1f8e7b2b514439ac10a67eef3d9fd9c5c68e245@mostr.pub)'s status on Monday, 13-Mar-2023 00:23:15 JST
Sorting is the slowest part; I ended up doing:

    mkdir -p tokenstore
    # sort ~371 MB blocks of tokens.txt in parallel, one output file per job ({#} = job number)
    parallel -a tokens.txt --block 370899363 --pipepart 'sort > tokenstore/{#}'
    # merge the pre-sorted blocks, count duplicate tokens, order by frequency
    sort -m tokenstore/* > tokens-sorted.txt
    uniq -c tokens-sorted.txt | sort -S 80% -n > spammy.txt

Alex Gleason likes this.
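The pipeline relies on two properties of the coreutils tools: `sort -m` merges inputs that are already sorted without re-sorting them, and `uniq -c` only collapses adjacent duplicates. The following toy reproduction shows the same three steps on a handful of made-up tokens; `split` plus a plain loop stand in for GNU parallel, and `LC_ALL=C` is an assumption added so every sort step uses the same byte-order collation.

```shell
#!/usr/bin/env bash
# Toy version of the chunk-sort / merge / count pipeline.
# split + a plain loop stand in for GNU parallel; the tokens are made up.
set -euo pipefail
export LC_ALL=C   # byte-order collation, identical for every sort step

work=$(mktemp -d)
printf '%s\n' gm zap nostr gm zap gm > "$work/tokens.txt"

# 1. Split into 2-line chunks and sort each chunk on its own
mkdir "$work/tokenstore"
split -l 2 "$work/tokens.txt" "$work/tokenstore/chunk."
for f in "$work"/tokenstore/chunk.*; do
  sort -o "$f" "$f"
done

# 2. Merge the already-sorted chunks (sort -m does not re-sort them)
sort -m "$work"/tokenstore/* > "$work/tokens-sorted.txt"

# 3. uniq -c only collapses adjacent duplicates, which the merge guarantees;
#    sort -n then orders tokens by count, most frequent last
uniq -c "$work/tokens-sorted.txt" | sort -n > "$work/spammy.txt"
cat "$work/spammy.txt"   # one "count token" line each: 1 nostr, 2 zap, 3 gm
```

The merge step is what makes the chunked approach pay off: each chunk sort can run on its own core, and the final `sort -m` is a single linear pass over the pre-sorted files.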
jb55 (32e1827635450ebb3c5a7d12c1f8e7b2b514439ac10a67eef3d9fd9c5c68e245@mostr.pub)'s status on Monday, 13-Mar-2023 00:23:16 JST
It’s super advanced:

    # 15 parallel jq jobs over equal slices of dump.json; text notes (kind 1) only, one space-separated token per line
    parallel -a dump.json --pipepart --block $(bc <<<"$(stat -c %s dump.json) / 15") -j15 "jq -r 'select(.kind == 1) | .content' | tr ' ' '\n'" > tokens.txt
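To make the tokenizer explicit: `jq -r` prints each kind-1 note's content as raw text (embedded newlines stay literal line breaks), and `tr ' ' '\n'` then turns every single space into a line break, so a token is whatever sits between spaces or newlines. The sketch below runs only the `tr` step on two made-up note bodies standing in for jq's output, and checks the `--block` arithmetic with shell integer division instead of `bc`; the 1500-byte file is an arbitrary example, and `stat -c` assumes GNU coreutils, as in the original command.

```shell
#!/usr/bin/env bash
# Illustrates the whitespace tokenizer used above on made-up note text
# (standing in for the output of the jq step).
set -euo pipefail

# Only single spaces become line breaks: a double space yields an empty
# token, and punctuation or tabs stay attached to the word ("gm," != "gm").
printf '%s\n' 'gm  nostr' 'zap me' | tr ' ' '\n'
# prints: gm, an empty line, nostr, zap, me (one per line)

# --block is the file size divided by the job count (15); integer shell
# arithmetic matches bc's default truncating division (scale=0).
f=$(mktemp)
head -c 1500 /dev/zero > "$f"
echo $(( $(stat -c %s "$f") / 15 ))   # prints 100
```

Because no normalization happens (no lowercasing, no punctuation stripping), "GM", "gm" and "gm," are counted as three different tokens in the resulting frequency list.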
Semisol (52b4a076bcbbbdc3a1aefa3735816cf74993b1b8db202b01c883c58be7fad8bd@mostr.pub)'s status on Monday, 13-Mar-2023 00:23:17 JST
Tokenizer used?