:p: (p@bae.st)'s status on Wednesday, 03-Apr-2024 14:01:59 JST
:denton::hacker_f::hacker_s::hacker_e:::captcrunch:

SPC is down! I don't know for how long, and accounts apparently moved to shitposter.world, so I am also at @p@shitposter.world. But because I don't know if people know about that server (I didn't know about it until yesterday, and I just stumbled across it on SPC), Imma be on @p@bae.st for a minute until FSE's bugout zone comes online (the machine is assembled and booting things, prerequisites are installed and whatnot; it is a lot of fun to type `make -j64` and really *mean* it, though I built most of the packages before the hardware arrived). I hope to finish the configuration in time to drive the box down there tonight or tomorrow night, although it may take a minute or two for some things.

Once the machine is back online, some pretty easy stuff will be back in a hurry: things like git.fse and blog.fse and bloat and whatnot. As I said a while ago, I will probably set up a bugout zone instance and just keep the users table so FSE refugees can log in with their regular username/password: same version of Balormo and everything, basically a minimal-effort bugout zone. I'll be on the bugout zone server and FSE refugees are welcome (registrations will be closed, so you'll only be able to get in if you already had an FSE account when the machine exploded), but it will be unstable because its primary purpose is to help with Revolver development (there are things that would be much easier to do with current, live data coming in). It will probably usually be slow or weird, and I will not be doing normal server things. Some of the slowness will be a result of a bunch of Revolver stuff getting shoved around through the system, so it will be busy; some of the slowness will be because the machine is spec'd to do Revolver-style things rather than Pleroma-style things like the previous machine (e.g., the dead box had a really massive amount of RAM for Postgres, whereas the way resources are partitioned in this machine, there's less RAM available in one place). But it'll be there if you want it and you don't mind being subject to technical whimsy.

The bugout zone will be on a subdomain, or maybe I grab a different domain for a minute, I don't know. I'll announce it. There will be something up at freespeechextremist.com, but it will not be quite usable for a while, as it's software under active development: a lot of parts are missing, some things are stubbed, etc. The media needs to move and that'll be trickier than it would have been if the box hadn't died prematurely, but I expect the old stuff will be able to come back; the newer uploads (and whatever stuff was fetched on demand after the media server moved to Revolver in December) should be exactly where they were, so they'll be visible again when the media.fse redirect is back in place.

The Postgres/Pleroma to Revolver ingestor is slow because I've been using a dumb on-disk storage scheme (and partly because, while I'm debugging it, it crashes and dumps the failed data instead of continuing and then storing the rejects somewhere, and everything stopping and waiting for me to fix a bug hampers throughput); I was wondering how long I could get away with that storage mechanism, and the answer is "until about now". So far, all of the rejected data comes from in-development software with bugs that were fixed years ago, so data coming out of the big three servers (Pleroma, Mastodon, Misskey) should all be handled correctly. Once an entity is ingested, it's got a normalized representation, but for the sake of the /objects/ proxy feature, currently what we keep around is the original representation, so that what gets returned is exactly what the server reported. This is less than ideal. The current indexing method is really simple and really fast (under 20ms for timelines: TWKN, tags, profiles, etc.) but there's a lot of disk overhead, so it needs to be reworked. Part of that might be done by reworking the storage scheme and part of it might be done by just persisting the normalized objects, which are much more compact (there's a lot of redundant information in most activities, like "content" and "contentMap"; some fields have a fixed value and we don't need to store that; some are repetitive and can be broken up for storage; things like that). Although the indexing method is very fast, it does consume a lot of blocks (which is a kind of advantage because it makes hunting through the slush harder by introducing noise, though if it bloats storage overhead too much, there's not much point).
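
For illustration, the "continue and store the rejects somewhere" behavior mentioned above could look roughly like the sketch below. This is not the actual ingestor: the method name `ingest_all`, the reject file format (one JSON object per line), and the idea of passing the real work in as a block are all assumptions made for the example.

```ruby
require 'json'

# Hypothetical ingest loop (names and file format are assumptions).
# `records` is any enumerable of raw JSON strings, assumed valid UTF-8.
# `reject_path` is a file that collects inputs that failed.
# The block does the real work and may raise on bad data.
def ingest_all(records, reject_path)
  accepted = 0
  rejected = 0
  File.open(reject_path, 'a') do |rejects|
    records.each do |raw|
      begin
        yield JSON.parse(raw)
        accepted += 1
      rescue StandardError => e
        # Keep the failed input and the reason, one JSON object per line,
        # then move on to the next record instead of halting the run.
        rejects.puts(JSON.generate('error' => e.message, 'raw' => raw))
        rejected += 1
      end
    end
  end
  { accepted: accepted, rejected: rejected }
end
```

The trade-off is the one described in the post: halting on the first failure makes a bug impossible to miss while debugging, while setting rejects aside keeps throughput up and leaves them available to replay after a fix.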

I'm also doing it with the IPFS support disabled, as IPFS makes ingestion take about a hundred (SERIOUSLY) times as long. IPFS has been nothing but pain so far; I don't know what they are doing that makes their code behave this way. Just using flat files with split directories would be faster. The idea was initially that we could use IPFS as an extra channel: things could move through Revolver's protocol, or they could use IPFS, or ActivityPub, or stack those on top of Tor, possibly other means.
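
As a rough picture of what "flat files with split directories" can mean, here is a minimal sketch. It assumes objects are keyed by some string (an object URL, say) and that the key's SHA-256 digest picks the directory; the helper names `split_path`, `store`, and `fetch` and the two-level layout are made up for the example and are not Revolver's storage scheme.

```ruby
require 'digest/sha2'
require 'fileutils'

# Hypothetical helper: map a key (e.g. an object URL) to a sharded path.
# The first four hex digits of the SHA-256 digest become two directory
# levels, so no single directory ends up holding every file.
def split_path(root, key)
  digest = Digest::SHA256.hexdigest(key)
  File.join(root, digest[0, 2], digest[2, 2], digest)
end

# Write a blob under its sharded path, creating directories as needed.
def store(root, key, data)
  path = split_path(root, key)
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, data)
  path
end

# Read a blob back, or return nil if it was never stored.
def fetch(root, key)
  path = split_path(root, key)
  File.exist?(path) ? File.binread(path) : nil
end
```

With two hex digits per level, files spread over up to 65,536 leaf directories, and a lookup is one hash computation plus one file open.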

Also, I would like to mention that I am continuing my trend of using technology that makes people frustrated with me: the ingestor, being glue code between Postgres and Revolver and having to parse JSON, is written in Ruby. (The reason it has to parse JSON is so that it can scan for the string "https://www.w3.org/ns/activitystreams#Public" in the to/cc fields and avoid dumping people's DMs to a public network; it also doesn't ingest followers-only posts, but that is just the ingestor.)
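
A minimal sketch of the to/cc check described above, assuming each activity arrives as a JSON string with ActivityPub-style `to` and `cc` fields. The method name `public_activity?` is hypothetical, and this is not the actual ingestor code.

```ruby
require 'json'

# The ActivityStreams "Public" collection; posts addressed to it are public.
PUBLIC = 'https://www.w3.org/ns/activitystreams#Public'

# Hypothetical helper, not the real ingestor's API.
# Returns true only if the public collection appears in "to" or "cc".
def public_activity?(raw_json)
  activity = JSON.parse(raw_json)
  return false unless activity.is_a?(Hash)

  # "to" and "cc" may each be a single string, an array, or absent.
  recipients = Array(activity['to']) + Array(activity['cc'])
  recipients.include?(PUBLIC)
rescue JSON::ParserError
  # Unparseable input is treated as non-public so it is never published.
  false
end
```

A DM or a followers-only post lists no public collection in either field, so both come back `false` here and would be skipped. Failing closed on a parse error is a deliberate choice in the sketch: when in doubt, nothing goes to the public network.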

At this point in the update, more things are popping into my head, "Oh, I should include that", and then "Oh, wait, did I account for $x?" and I keep falling down a rabbit hole, at the end of which I produce another paragraph, and it's either more or less information than anyone wants.