Conversation

Yukkuri (iamtakingiteasy@eientei.org)'s status on Sunday, 02-Mar-2025 06:40:27 JST Yukkuri
(1/2)

Migrating pleroma database to partitioned objects/activities tables seems to be possible with some schema and query alterations.

There are two main problems that could prevent meaningful partitioning:
- P1. object ids are not time-ordered
- P2. postgresql does not support global cross-partition indexes, and scanning every index of every partition slows cross-lookups on joins to a crawl
Countering P1 is easy: change the type of object ids from bigint to uuid and update all relations.

Pleroma uses time-ordered 128-bit flake ids, which are by default structured as
- [64 bits of timestamp in millisecond resolution]
- [48 bits of device identifier]
- [16 bits of sequential counter within the same millisecond]
So inserted_at can supply the timestamp half and the original bigint id can be smeared across the remaining 64 bits, which keeps a backward migration possible.
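
To make the layout concrete, here is a minimal round trip on literal values, as a sketch rather than part of the actual migration. The sample id 42 and the timestamp are made up; the second query uses the common `('x' || hex)::bit(64)::bigint` idiom to read the low 64 bits back as a bigint.

```sql
-- compose: 64 bits of millisecond timestamp, then 64 bits of the old bigint id
select (lpad(to_hex((extract(epoch from timestamp '2020-01-01 00:00:00') * 1000)::bigint), 16, '0')
        || lpad(to_hex(42::bigint), 16, '0'))::uuid as new_id;
-- 0000016f-5e66-e800-0000-00000000002a

-- recover: take the last 16 hex digits and reinterpret them as a bigint
select ('x' || right(replace('0000016f-5e66-e800-0000-00000000002a'::uuid::text, '-', ''), 16))::bit(64)::bigint as old_id;
-- 42
```

Because the old id survives intact in the low half, a down migration only needs the second expression.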

After dropping all foreign keys on objects-related tables and the default sequence generator, the conversion is a single alter type ... using, like
alter table objects alter column id type uuid using (lpad(to_hex((extract(epoch from inserted_at)*1000)::bigint), 16, '0') || lpad(to_hex(id), 16, '0'))::uuid;
Then every objects-related table needs a temporary field that holds the new uuid alongside its old bigint foreign key; the foreign key column is then migrated the same way, with alter type ... using that temporary field. After that the temporary fields can be dropped and the foreign key constraints re-added.
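
The sequence described above can be sketched on a miniature schema. The tables `parent` and `child` and the scratch schema `partition_sketch` are hypothetical stand-ins, not Pleroma's real tables, and the constraint name is simply the one PostgreSQL generates by default for this layout.

```sql
-- scratch schema so the sketch does not touch real tables
drop schema if exists partition_sketch cascade;
create schema partition_sketch;
set search_path = partition_sketch;

create table parent (id bigint primary key, inserted_at timestamp(0) not null);
create table child (id bigint primary key,
                    parent_id bigint not null references parent (id));
insert into parent values (42, '2020-01-01 00:00:00');
insert into child values (1, 42);

-- 1. drop the foreign key so the referenced column can change type
alter table child drop constraint child_parent_id_fkey;

-- 2. stage the new uuid next to the old bigint key while both are still bigint
alter table child add column parent_uuid uuid;
update child c
   set parent_uuid = (lpad(to_hex((extract(epoch from p.inserted_at) * 1000)::bigint), 16, '0')
                      || lpad(to_hex(p.id), 16, '0'))::uuid
  from parent p
 where p.id = c.parent_id;

-- 3. convert the referenced id in place
alter table parent alter column id type uuid
  using (lpad(to_hex((extract(epoch from inserted_at) * 1000)::bigint), 16, '0')
         || lpad(to_hex(id), 16, '0'))::uuid;

-- 4. convert the referencing column from the staged value, then drop the stage
alter table child alter column parent_id type uuid using parent_uuid;
alter table child drop column parent_uuid;

-- 5. re-add the foreign key
alter table child add constraint child_parent_id_fkey
  foreign key (parent_id) references parent (id);
```

The staging column has to be filled before step 3, because after the parent id changes type the old bigint key can no longer be joined against it directly.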
Countering P2 is more complicated, since the existing activities/objects tables need to be split further in two: a partitioned data table and an unpartitioned denormalization/index table.

unpartitioned activities_attributes table must have:
- activity_id foreign key to partitioned activities_data
- object_id (uuid) and object_ap_id (text) in order to correlate with objects without jsonb data
- visibility with pre-computed activity_visibility()
- type with data->>'type'
- context with data->>'context'
- ap_id with data->>'id'
- actor, recipients and local -- currently denormalized fields
unpartitioned objects_attributes table must have:
- object_id foreign key to partitioned objects
- type with data->>'type'
- ap_id with data->>'id'
- actor with data->>'actor'
fts_content can actually be partitioned without much impact to search queries.
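
A rough DDL sketch of the two attribute tables listed above, assuming PostgreSQL 12 or later (needed for foreign keys that reference a partitioned table). Column types, the choice of indexes and the catch-all default partitions are assumptions made for illustration; the proof-of-concept migration is the authority on the real definitions.

```sql
drop schema if exists partition_sketch cascade;
create schema partition_sketch;
set search_path = partition_sketch;

-- partitioned data tables (yearly partitions omitted, one catch-all each)
create table objects (
  id uuid primary key,
  data jsonb not null
) partition by range (id);
create table objects_default partition of objects default;

create table activities_data (
  id uuid primary key,
  data jsonb not null
) partition by range (id);
create table activities_data_default partition of activities_data default;

-- unpartitioned attribute tables: each lookup column gets one global index
create table objects_attributes (
  object_id uuid primary key references objects (id),
  type text,   -- data->>'type'
  ap_id text,  -- data->>'id'
  actor text   -- data->>'actor'
);
create unique index on objects_attributes (ap_id);

create table activities_attributes (
  activity_id uuid primary key references activities_data (id),
  object_id uuid,
  object_ap_id text,
  visibility text,   -- pre-computed activity_visibility()
  type text,         -- data->>'type'
  context text,      -- data->>'context'
  ap_id text,        -- data->>'id'
  actor text,
  recipients text[],
  local boolean
);
create unique index on activities_attributes (ap_id);
create index on activities_attributes (object_id);
create index on activities_attributes (context);

-- lookup by ActivityPub id: one probe of the global index, then a fetch by
-- partition key instead of probing an ap_id index in every partition
select o.id, o.data
  from objects_attributes a
  join objects o on o.id = a.object_id
 where a.ap_id = 'https://example.org/objects/1';
```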

To match the current interface, activities_data and activities_attributes can be joined behind a view without much impact, since data from both is needed in virtually every query.

Objects can remain a plain table, since most of the time it is just a lookup by object id without any need for objects_attributes fields; queries that do need them can join manually.

Both the activities view and the objects table would need triggers after insert and before delete (or instead-of triggers in the case of a view) to populate the corresponding attributes. With these in place, the separate status_visibility_counter_cache_trigger is no longer needed and can be incorporated into the activity triggers, since both can benefit from shared logic.
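
A minimal sketch of the view plus instead-of triggers, assuming PostgreSQL 11 or later for `execute function`. The tables are cut down to a few columns and left unpartitioned for brevity, the function and trigger names are hypothetical, and the visibility and counter-cache logic is left out entirely.

```sql
drop schema if exists partition_sketch cascade;
create schema partition_sketch;
set search_path = partition_sketch;

create table activities_data (id uuid primary key, data jsonb not null);
create table activities_attributes (
  activity_id uuid primary key references activities_data (id),
  type text,
  ap_id text,
  actor text
);

-- the view restores the single-table interface the queries expect
create view activities as
  select d.id, d.data, a.type, a.ap_id, a.actor
    from activities_data d
    join activities_attributes a on a.activity_id = d.id;

-- route an insert on the view to both tables, deriving attributes from data
create function activities_insert() returns trigger language plpgsql as $$
begin
  insert into activities_data (id, data) values (new.id, new.data);
  insert into activities_attributes (activity_id, type, ap_id, actor)
  values (new.id, new.data->>'type', new.data->>'id', new.actor);
  return new;
end $$;

-- remove the attributes row first so the foreign key is never violated
create function activities_delete() returns trigger language plpgsql as $$
begin
  delete from activities_attributes where activity_id = old.id;
  delete from activities_data where id = old.id;
  return old;
end $$;

create trigger activities_insert instead of insert on activities
  for each row execute function activities_insert();
create trigger activities_delete instead of delete on activities
  for each row execute function activities_delete();

insert into activities (id, data, actor)
values ('0000016f-5e66-e800-0000-00000000002a',
        '{"id": "https://example.org/activities/1", "type": "Create"}',
        'https://example.org/users/a');
select type, ap_id from activities;  -- Create | https://example.org/activities/1
delete from activities;
```

A join view like this is not automatically updatable, which is why the triggers have to be instead-of triggers rather than after/before ones.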

A proof of concept based on upstream develop supporting schema above, including up/down migrations and necessary query/test changes:

https://gitlab.eientei.org/eientei/pleroma/-/tree/upstream-flake

migration: https://gitlab.eientei.org/eientei/pleroma/-/blob/upstream-flake/priv/repo/migrations/20250109104615_migrate_objects_to_flake_id.exs

On eientei DB snapshot (~230G combined activities and objects) it takes 3-6 hours.
Attachments
1. Untitled attachment
  https://eientei.org/media/77/72/d3/7772d33a973b9da2b2d2a05b11eb6427eed626a806a7f5d6656242864f66c5ac.jpg?name=72ff2466cc3cdf7e7196e173f9d7c444.jpg
- Yukkuri (iamtakingiteasy@eientei.org)'s status on Sunday, 02-Mar-2025 06:41:30 JST Yukkuri
  (2/2)
  
  With such schema/query changes it is possible now to partition objects_data/activities_data tables at any granularity.
  
  For example, for activities, on a yearly basis:
  alter table activities_data rename to activities_data_old;
  -- drop indexes, foreign keys on activities_data_old
  create table activities_data (
    id uuid not null primary key,
    data jsonb not null,
    inserted_at timestamp(0) without time zone not null,
    updated_at timestamp(0) without time zone not null
  ) partition by range (id);
  create table if not exists activities_data_2020 partition of activities_data
    for values from ('0000016f-5e66-e800-0000-000000000000') to ('00000176-bb3e-7000-0000-000000000000');
  create table if not exists activities_data_2021 partition of activities_data
    for values from ('00000176-bb3e-7000-0000-000000000000') to ('0000017e-12ef-9c00-0000-000000000000');
  create table if not exists activities_data_2022 partition of activities_data
    for values from ('0000017e-12ef-9c00-0000-000000000000') to ('00000185-6aa0-c800-0000-000000000000');
  create table if not exists activities_data_2023 partition of activities_data
    for values from ('00000185-6aa0-c800-0000-000000000000') to ('0000018c-c251-f400-0000-000000000000');
  create table if not exists activities_data_2024 partition of activities_data
    for values from ('0000018c-c251-f400-0000-000000000000') to ('00000194-1f29-7c00-0000-000000000000');
  create table if not exists activities_data_2025 partition of activities_data
    for values from ('00000194-1f29-7c00-0000-000000000000') to ('0000019b-76da-a800-0000-000000000000');
  create table if not exists activities_data_2026 partition of activities_data
    for values from ('0000019b-76da-a800-0000-000000000000') to ('000001a2-ce8b-d400-0000-000000000000');
  insert into activities_data select * from activities_data_old;
  -- recreate indexes, recreate foreign keys on activities_data
  And it would work transparently. Partitions can be either added manually for upcoming years, or added on-demand in insert trigger.
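
  The partition bounds above are the UTC start of each year in milliseconds, written as the upper 64 bits of a uuid. Adding partitions for upcoming years by hand could be sketched as follows; the helper `flake_lower_bound` and the scratch schema are hypothetical names, not something from the proof of concept.

  ```sql
  drop schema if exists partition_sketch cascade;
  create schema partition_sketch;
  set search_path = partition_sketch;

  create table activities_data (
    id uuid not null primary key,
    data jsonb not null,
    inserted_at timestamp(0) without time zone not null,
    updated_at timestamp(0) without time zone not null
  ) partition by range (id);

  -- hypothetical helper: smallest uuid whose first 64 bits encode ts in ms
  create function flake_lower_bound(ts timestamptz) returns uuid
  language sql stable as $$
    select (lpad(to_hex((extract(epoch from ts) * 1000)::bigint), 16, '0')
            || repeat('0', 16))::uuid
  $$;

  select flake_lower_bound('2020-01-01 00:00:00+00');
  -- 0000016f-5e66-e800-0000-000000000000 (the 2020 lower bound used above)

  -- create yearly partitions for 2027 and 2028
  do $$
  begin
    for y in 2027..2028 loop
      execute format(
        'create table if not exists activities_data_%s partition of activities_data for values from (%L) to (%L)',
        y,
        flake_lower_bound(make_timestamptz(y, 1, 1, 0, 0, 0, 'UTC')),
        flake_lower_bound(make_timestamptz(y + 1, 1, 1, 0, 0, 0, 'UTC')));
    end loop;
  end $$;
  ```

  Each upper bound is the next year's lower bound, so consecutive partitions meet without gaps or overlap.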
  
  On eientei DB snapshot such partitioning takes 1-3 hours.
  
  Naturally, due to attributes denormalization there will be some overhead, but a majority of old partitions could now be extracted to separate tablespace (on another disk/raid entirely, e.g. to hdd freeing up ssd space).
  
  For example using eientei numbers, in (data|indexes)
  
  before any schema changes:
  - objects (65G|22G)
  - activities (79G|67G)
  after schema change and partitioning:
  - objects_attributes (7G|8G)
  - objects_2020 (1M|1M)
  - objects_2021 (8G|2G)
  - objects_2022 (15G|4G)
  - objects_2023 (15G|4G)
  - objects_2024 (21G|6G)
  - objects_2025 (3G|1G)
  - activities_attributes (47G|53G)
  - activities_data_2020 (1M|1M)
  - activities_data_2021 (6G|500M)
  - activities_data_2022 (12G|1200M)
  - activities_data_2023 (15G|1500M)
  - activities_data_2024 (21G|6G)
  - activities_data_2025 (4G|300M)
  which is a ~15% total overhead, but ~60% of total size can be moved to hdd for cold online storage by doing:
  create tablespace archive location '/mnt/coldstorage/postgres/archive';
  alter table activities_data_2020 set tablespace archive;
  alter table activities_data_2021 set tablespace archive;
  alter table activities_data_2022 set tablespace archive;
  alter table activities_data_2023 set tablespace archive;
  alter table activities_data_2024 set tablespace archive;
  ...
  alter table objects_2020 set tablespace archive;
  alter table objects_2021 set tablespace archive;
  alter table objects_2022 set tablespace archive;
  alter table objects_2023 set tablespace archive;
  alter table objects_2024 set tablespace archive;
  ...
  while keeping only current year on fast ssd.
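
  One possible way to check the result is a catalog query that lists data size, index size and tablespace per table. This is a sketch that assumes the table names used in this thread; it only reads system catalogs and returns no rows on a database without those tables.

  ```sql
  select c.relname,
         pg_size_pretty(pg_table_size(c.oid))   as data,
         pg_size_pretty(pg_indexes_size(c.oid)) as indexes,
         coalesce(t.spcname, 'database default') as tablespace
    from pg_class c
    left join pg_tablespace t on t.oid = c.reltablespace
   where c.relkind = 'r'
     and (c.relname ~ '^(objects|activities_data)_[0-9]{4}$'
          or c.relname in ('objects_attributes', 'activities_attributes'))
   order by c.relname;
  ```

  Note that alter table ... set tablespace moves only the table itself; its indexes stay where they are unless each is moved with a separate alter index ... set tablespace.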