
Unreferenced state group cleanup job in v1.126.0rc2 caused explosion in number of state group state rows #18217

tulir opened this issue Mar 6, 2025


@tulir
Contributor

tulir commented Mar 6, 2025

Description

Not quite sure how/why, but something about the unreferenced state group cleanup (which was enabled in #18154) seems to have caused the number of state groups, especially in HQ, to explode and use the entire disk.

The _delete_state_groups_loop background job was at 40% CPU and 100% database usage according to Prometheus metrics the entire time. The number of rows in state_groups_state was growing by tens or hundreds of thousands per minute. The logs for _delete_state_groups_loop-0- didn't seem to mention HQ state groups specifically; it was cleaning up other groups the whole time.

I first downgraded to 1.125.0, but it didn't have any effect. Then I cleared state_groups_pending_deletion and upgraded back to 1.126.0rc2, which made the explosion stop.

[two attached graphs]

Steps to reproduce

  • be in HQ (may require being there for a long time prior)
  • upgrade to 1.126.0rc2
  • wait a few hours
  • find out that HQ's state groups have doubled in size

Homeserver

maunium.net

Synapse Version

1.126.0rc2

Installation Method

Docker (maunium/synapse)

Database

Postgres

Workers

Multiple workers

Platform

Docker on Ubuntu 22.04

@reivilibre
Contributor

reivilibre commented Mar 6, 2025

I guess not obvious from your graph: how did you get your disk space back? Did you roll back to an earlier database backup or did it actually self-recover afterwards?

@reivilibre
Contributor

The number of rows in state_groups_state was growing by tens or hundreds of thousands per minute.

ooi, how are you measuring this?

As far as I understand it, this mechanism should not insert any rows into that table; quite the opposite (it deletes them).

Hearing that the table grows is, therefore, perplexing.

@tulir
Contributor Author

tulir commented Mar 6, 2025

how did you get your disk space back? Did you roll back to an earlier database backup or did it actually self-recover afterwards?

It didn't recover; I deleted other stuff (and afterwards deleted HQ to try to free up more space). The part where it kept going down after recovering was Synapse continuing to add rows to state_groups_state. It flattened out when I cleared state_groups_pending_deletion and upgraded back to 1.126.

ooi, how are you measuring this?

Very manually by running SELECT COUNT(*) FROM state_groups_state WHERE room_id='!OGEhHVWSdvArJzumhm:matrix.org'; every couple of minutes
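
(For reference, a throwaway sketch that automates this manual check, assuming Postgres and the psycopg2 driver; the connection string and the two-minute interval are placeholders, not anything from this deployment.)

import time
import psycopg2

DSN = "dbname=synapse user=synapse host=localhost"  # placeholder DSN
ROOM_ID = "!OGEhHVWSdvArJzumhm:matrix.org"          # Matrix HQ, as in the query above

conn = psycopg2.connect(DSN)
try:
    while True:
        with conn.cursor() as cur:
            # Same count as the manual query, polled on a timer.
            cur.execute(
                "SELECT COUNT(*) FROM state_groups_state WHERE room_id = %s",
                (ROOM_ID,),
            )
            print(time.strftime("%H:%M:%S"), cur.fetchone()[0])
        conn.rollback()  # end the read-only transaction between polls
        time.sleep(120)
finally:
    conn.close()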

@reivilibre
Contributor

As far as I understand it, this mechanism should not insert any rows into that table; quite the opposite (it deletes them).

I have to contradict myself:

self.db_pool.simple_insert_many_txn(
    txn,
    table="state_groups_state",
    keys=("state_group", "room_id", "type", "state_key", "event_id"),
    values=[
        (sg, room_id, key[0], key[1], state_id)
        for key, state_id in curr_state.items()
    ],
)

This isn't meant to get triggered because the state groups should be unreferenced. But do you see any [purge] de-delta-ing remaining state group in your logs?

@tulir
Contributor Author

tulir commented Mar 6, 2025

But do you see any [purge] de-delta-ing remaining state group in your logs?

Yes, loads of those. I thought that log just means it's cleaning up state groups

$ sudo cat /var/log/synapse/homeserver.log.* | grep 'de-delta-ing remaining state group' | wc -l
61528

In case it's relevant, the state_groups_pending_deletion table had 47055 rows when I cleared it (I took a backup before clearing)

@reivilibre
Contributor

reivilibre commented Mar 7, 2025

Thank you for that; I think this points to this behaviour as the probable cause.

Based on what I can see:

  • when a state group is 'de-delta-ed', it is still needed in the database but we are deleting one of its previous state groups
  • when we de-delta a state group, we convert it to storing a full copy(!) of the state at that group — i.e. 1 state_groups_state row for every piece of active state at that point in time.
    • if you have 61k state groups being de-delta-ed, then you are getting 61k × S rows inserted, where S = the average number of state events active at those state groups. For Matrix HQ this is probably some tens of thousands (I'm not sure of the real number, but it's a sizeable room, I know that for sure). Not all of these de-delta-ings will necessarily be Matrix HQ, but arguing for the sake of ease that they are, 61k × 10k works out to some 600 million rows; we might have a problem with that. :p
  • instead, we should probably store it as a delta with a different base, essentially just merging the deleted state group into its child/successor state groups, instead of making the successors full snapshots.

#18219 is some code that crafts a situation that illustrates this. This is probably going to wait until Monday for someone who knows the intent behind this mechanism to weigh in / confirm I'm not making stuff up.
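
To make the arithmetic above concrete, here is a rough, hypothetical Python sketch — not Synapse's actual code or schema; the StateGroup class, resolve helper, and room sizes are all made up for illustration — of delta-stored state groups, what de-delta-ing a successor costs in rows, and what the re-basing alternative would cost instead.

# Hypothetical model: a state group is either a full snapshot (prev is None)
# or a delta on top of a previous group. Its "rows" correspond loosely to its
# entries in state_groups_state.
from typing import Optional


class StateGroup:
    def __init__(self, group_id: int, prev: Optional["StateGroup"], delta: dict):
        self.group_id = group_id
        self.prev = prev      # None => full snapshot
        self.delta = delta    # rows stored for this group

    def resolve(self) -> dict:
        """Walk the delta chain to compute the full state at this group."""
        state = dict(self.prev.resolve()) if self.prev else {}
        state.update(self.delta)
        return state

    def stored_rows(self) -> int:
        return len(self.delta)


# A large room: a ~10k-event snapshot, an unreferenced delta group on top of it
# (the one the cleanup job wants to delete), and a live successor delta.
base = StateGroup(1, None, {("m.room.member", f"@u{i}:example.org"): f"$ev{i}" for i in range(10_000)})
unreferenced = StateGroup(2, base, {("m.room.topic", ""): "$old_topic"})
successor = StateGroup(3, unreferenced, {("m.room.member", "@new:example.org"): "$join"})

# What the v1.126.0rc2 job effectively did when deleting `unreferenced`:
# de-delta the successor into a full snapshot, i.e. ~10k new rows for it.
de_deltaed = StateGroup(3, None, successor.resolve())

# The alternative suggested above: merge the deleted group's delta into the
# successor and re-base it onto `base` -- only a couple of rows.
merged = dict(unreferenced.delta)
merged.update(successor.delta)
rebased = StateGroup(3, base, merged)

assert de_deltaed.resolve() == rebased.resolve()  # same resolved state either way
print(de_deltaed.stored_rows())  # 10_002 -- a full copy of the room state
print(rebased.stored_rows())     # 2

Multiplied by the ~61k de-delta-ings seen in the logs, that per-successor cost is where the ~600 million row estimate above comes from.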

@reivilibre
Contributor

Out of interest: have you previously purged history, or had retention policies that would purge history?

reivilibre added a commit that referenced this issue Mar 7, 2025

This mechanism is suspected of inserting large numbers of rows into `state_groups_state`,
thus unreasonably increasing disk usage.

See: #18217

This reverts commit 5121f92.
@tulir
Contributor Author

tulir commented Mar 7, 2025

I don't think so, I've only deleted entire rooms and used the state compressor for disk space management

@reivilibre reivilibre changed the title v1.126.0rc2 caused explosion in number of state groups v1.126.0rc2 caused explosion in number of state group state rows Mar 7, 2025
@reivilibre reivilibre changed the title v1.126.0rc2 caused explosion in number of state group state rows Unreferenced state group cleanup job in v1.126.0rc2 caused explosion in number of state group state rows Mar 7, 2025
reivilibre added a commit that referenced this issue Mar 7, 2025
…s introduced in v1.126.0rc1), due to a suspected issue that causes increased disk usage. (#18222)

Revert "Add background job to clear unreferenced state groups (#18154)"

This mechanism is suspected of inserting large numbers of rows into
`state_groups_state`,
thus unreasonably increasing disk usage.

See: #18217

This reverts commit 5121f92 (#18154).

---------

Signed-off-by: Olivier 'reivilibre <[email protected]>
@reivilibre
Contributor

No longer a release blocker as this was rolled back in rc3.

@Maescool

I believe the retention policies also have a similar issue: at random times I get runaway rooms (it's been like this for a while), so I have to hard-delete rooms and rejoin them.

@devonh
Member

devonh commented Mar 24, 2025

Doing a VACUUM FULL should get rid of all the extra DB size and result in the database being smaller overall. This was an issue with the previous approach: it led to de-deltaed room state, which temporarily increased DB size until a VACUUM was done.

This problem should not happen again with the new approach to deleting unreferenced state groups merged in #18254, which avoids deleting anything that would lead to de-deltaing.
When that makes its way into an RC, I would appreciate it if you could monitor your setup to confirm.
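
(For anyone cleaning up afterwards, a minimal sketch of the VACUUM FULL step, assuming Postgres and the psycopg2 driver; the connection string is a placeholder, and running the same statement from psql works just as well. Note that VACUUM FULL takes an exclusive lock on the table and needs enough free disk for the rewritten copy.)

import psycopg2

# Placeholder connection string; adjust for your deployment.
conn = psycopg2.connect("dbname=synapse user=synapse host=localhost")
conn.autocommit = True  # VACUUM cannot run inside a transaction block

with conn.cursor() as cur:
    cur.execute("SELECT pg_size_pretty(pg_total_relation_size('state_groups_state'))")
    print("before:", cur.fetchone()[0])

    # Rewrites the whole table; holds an ACCESS EXCLUSIVE lock while it runs.
    cur.execute("VACUUM FULL state_groups_state")

    cur.execute("SELECT pg_size_pretty(pg_total_relation_size('state_groups_state'))")
    print("after:", cur.fetchone()[0])

conn.close()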
