right now it's built as a normalizer — taking API responses, scraped page contents, and Service Migration Downloads from all those sources and normalizing them into huge piles of JSON files that can be loaded/parsed/wrangled consistently.
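For a rough idea of what that normalizing step could look like, here is a minimal sketch. The schema (`NormalizedPost`) and the input field names (`id`, `user.handle`, `body`, `permalink`, and so on) are all made up for illustration and are not the actual format the project uses.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class NormalizedPost:
    # Hypothetical common schema; the real field set is an assumption.
    service: str
    post_id: str
    author: str
    text: str
    created_at: str


def normalize_api(service: str, payload: dict) -> NormalizedPost:
    """Map a (hypothetical) API response shape onto the common schema."""
    return NormalizedPost(
        service=service,
        post_id=str(payload["id"]),
        author=payload["user"]["handle"],
        text=payload["body"],
        created_at=payload["created_at"],
    )


def normalize_scraped(service: str, page: dict) -> NormalizedPost:
    """Map a (hypothetical) scraped-page shape onto the same schema."""
    return NormalizedPost(
        service=service,
        # Assume the last path segment of the permalink is the post id.
        post_id=page["permalink"].rstrip("/").split("/")[-1],
        author=page["byline"],
        text=page["content"],
        created_at=page["timestamp"],
    )


def to_json(post: NormalizedPost) -> str:
    """Serialize with sorted keys so every file has the same layout."""
    return json.dumps(asdict(post), sort_keys=True)
```

Whatever the source looks like, everything downstream only ever sees one shape, which is what makes the pile of JSON files consistent to load.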
Next step is moving it into an ArangoDB instance, where crosslinks and connections (like 'post X is a repost of post Y, from service Z') get turned into a knowledge graph with embeddings because ~ wheeee ~
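As a sketch of how the 'post X is a repost of post Y, from service Z' kind of link might be shaped for ArangoDB: edge documents there carry `_from` and `_to` attributes pointing at `collection/key` handles. The `posts` collection name, the key format, and the `repost_of` field are assumptions for illustration only, and this just builds plain dicts without talking to a database.

```python
from typing import Optional


def doc_key(service: str, post_id: str) -> str:
    # Hypothetical key scheme: service plus post id, joined by an underscore.
    return f"{service}_{post_id}"


def post_document(post: dict) -> dict:
    """Build a vertex document for a (hypothetical) 'posts' collection."""
    return {"_key": doc_key(post["service"], post["post_id"]), "text": post["text"]}


def repost_edge(post: dict) -> Optional[dict]:
    """Build an edge document for a repost, or None if the post is original."""
    original = post.get("repost_of")
    if original is None:
        return None
    return {
        "_from": "posts/" + doc_key(post["service"], post["post_id"]),
        "_to": "posts/" + doc_key(original["service"], original["post_id"]),
        "type": "repost_of",
        "from_service": original["service"],
    }
```

Embeddings would presumably hang off the vertex documents as an extra attribute, but that part is left out here.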