
Unreferenced state group cleanup job in v1.126.0rc2 caused explosion in number of state group state rows #18217

tulir opened this issue Mar 6, 2025


@tulir
Contributor

tulir commented Mar 6, 2025

Description

Not quite sure how/why, but something about the unreferenced state group cleanup (which was enabled in #18154) seems to have caused the number of state groups, especially in HQ, to explode and use the entire disk.

The _delete_state_groups_loop background job was at 40% CPU and 100% database usage according to Prometheus metrics the entire time. The number of rows in state_groups_state was growing by tens or hundreds of thousands per minute. The logs for _delete_state_groups_loop-0- didn't seem to mention HQ state groups specifically; it was cleaning up other groups the whole time.

I first downgraded to 1.125.0, but it didn't have any effect. Then I cleared state_groups_pending_deletion and upgraded back to 1.126.0rc2, which made the explosion stop.

[two attached graphs]

Steps to reproduce

  • be in HQ (may require being there for a long time prior)
  • upgrade to 1.126.0rc2
  • wait a few hours
  • find out that HQ's state groups have doubled in size

Homeserver

maunium.net

Synapse Version

1.126.0rc2

Installation Method

Docker (maunium/synapse)

Database

Postgres

Workers

Multiple workers

Platform

Docker on Ubuntu 22.04

@reivilibre
Contributor

reivilibre commented Mar 6, 2025

I guess not obvious from your graph: how did you get your disk space back? Did you roll back to an earlier database backup or did it actually self-recover afterwards?

@reivilibre
Contributor

The number of rows in state_groups_state was growing by tens or hundreds of thousands per minute.

ooi, how are you measuring this?

As far as I understand it, this mechanism should not insert any rows into that table; quite the opposite (it deletes them).

Hearing that the table grows is, therefore, perplexing.

@tulir
Contributor Author

tulir commented Mar 6, 2025

how did you get your disk space back? Did you roll back to an earlier database backup or did it actually self-recover afterwards?

It didn't recover; I deleted other stuff (and afterwards deleted HQ to try to free up more space). The part where it kept going down after recovering was Synapse continuing to add rows to state_groups_state. It flattened out when I cleared state_groups_pending_deletion and upgraded back to 1.126.

ooi, how are you measuring this?

Very manually by running SELECT COUNT(*) FROM state_groups_state WHERE room_id='!OGEhHVWSdvArJzumhm:matrix.org'; every couple of minutes
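
(For reference, a throwaway sketch that automates this manual check, assuming Postgres and the psycopg2 driver; the connection string and the two-minute interval are placeholders, not anything from this deployment.)

import time
import psycopg2

DSN = "dbname=synapse user=synapse host=localhost"  # placeholder DSN
ROOM_ID = "!OGEhHVWSdvArJzumhm:matrix.org"          # Matrix HQ, as in the query above

conn = psycopg2.connect(DSN)
try:
    while True:
        with conn.cursor() as cur:
            # Same count as the manual query, polled on a timer.
            cur.execute(
                "SELECT COUNT(*) FROM state_groups_state WHERE room_id = %s",
                (ROOM_ID,),
            )
            print(time.strftime("%H:%M:%S"), cur.fetchone()[0])
        conn.rollback()  # end the read-only transaction between polls
        time.sleep(120)
finally:
    conn.close()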

@reivilibre
Contributor

As far as I understand it, this mechanism should not insert any rows into that table; quite the opposite (it deletes them).

I have to contradict myself:

self.db_pool.simple_insert_many_txn(
    txn,
    table="state_groups_state",
    keys=("state_group", "room_id", "type", "state_key", "event_id"),
    values=[
        (sg, room_id, key[0], key[1], state_id)
        for key, state_id in curr_state.items()
    ],
)

This isn't meant to get triggered because the state groups should be unreferenced. But do you see any [purge] de-delta-ing remaining state group in your logs?

@tulir
Contributor Author

tulir commented Mar 6, 2025

But do you see any [purge] de-delta-ing remaining state group in your logs?

Yes, loads of those. I thought that log just means it's cleaning up state groups

$ sudo cat /var/log/synapse/homeserver.log.* | grep 'de-delta-ing remaining state group' | wc -l
61528

In case it's relevant, the state_groups_pending_deletion table had 47055 rows when I cleared it (I took a backup before clearing)

@reivilibre
Contributor

reivilibre commented Mar 7, 2025

Thank you for that; I think this points to this behaviour as the probable cause.

Based on what I can see:

  • when a state group is 'de-delta-ed', it is still needed in the database but we are deleting one of its previous state groups
  • when we de-delta a state group, we convert it to storing a full copy(!) of the state at that group — i.e. 1 state_groups_state row for every piece of active state at that point in time.
    • if you have 61k state groups being de-delta-ed, then you are getting 61k × S rows inserted, where S = the average number of state events active at those state groups. For Matrix HQ this is probably some tens of thousands (I'm not sure of the real number, but it's a sizeable room, I know that for sure). Not all of these de-delta-ings will necessarily be Matrix HQ, but arguing for the sake of ease that they are, 61k × 10k works out to some 600 million rows; we might have a problem with that. :p
  • instead, we should probably store it as a delta with a different base, essentially just merging the deleted state group into its child/successor state groups, instead of making the successors full snapshots.

#18219 is some code that crafts a situation that illustrates this. This is probably going to wait until Monday for someone who knows the intent behind this mechanism to weigh in / confirm I'm not making stuff up.
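
To make the arithmetic above concrete, here is a rough, hypothetical Python sketch — not Synapse's actual code or schema; the StateGroup class, resolve helper, and room sizes are all made up for illustration — of delta-stored state groups, what de-delta-ing a successor costs in rows, and what the re-basing alternative would cost instead.

# Hypothetical model: a state group is either a full snapshot (prev is None)
# or a delta on top of a previous group. Its "rows" correspond loosely to its
# entries in state_groups_state.
from typing import Optional


class StateGroup:
    def __init__(self, group_id: int, prev: Optional["StateGroup"], delta: dict):
        self.group_id = group_id
        self.prev = prev      # None => full snapshot
        self.delta = delta    # rows stored for this group

    def resolve(self) -> dict:
        """Walk the delta chain to compute the full state at this group."""
        state = dict(self.prev.resolve()) if self.prev else {}
        state.update(self.delta)
        return state

    def stored_rows(self) -> int:
        return len(self.delta)


# A large room: a ~10k-event snapshot, an unreferenced delta group on top of it
# (the one the cleanup job wants to delete), and a live successor delta.
base = StateGroup(1, None, {("m.room.member", f"@u{i}:example.org"): f"$ev{i}" for i in range(10_000)})
unreferenced = StateGroup(2, base, {("m.room.topic", ""): "$old_topic"})
successor = StateGroup(3, unreferenced, {("m.room.member", "@new:example.org"): "$join"})

# What the v1.126.0rc2 job effectively did when deleting `unreferenced`:
# de-delta the successor into a full snapshot, i.e. ~10k new rows for it.
de_deltaed = StateGroup(3, None, successor.resolve())

# The alternative suggested above: merge the deleted group's delta into the
# successor and re-base it onto `base` -- only a couple of rows.
merged = dict(unreferenced.delta)
merged.update(successor.delta)
rebased = StateGroup(3, base, merged)

assert de_deltaed.resolve() == rebased.resolve()  # same resolved state either way
print(de_deltaed.stored_rows())  # 10_002 -- a full copy of the room state
print(rebased.stored_rows())     # 2

Multiplied by the ~61k de-delta-ings seen in the logs, that per-successor cost is where the ~600 million row estimate above comes from.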

@reivilibre
Contributor

Out of interest: have you previously purged history, or had retention policies that would purge history?

reivilibre added a commit that referenced this issue Mar 7, 2025

This mechanism is suspected of inserting large numbers of rows into `state_groups_state`,
thus unreasonably increasing disk usage.

See: #18217

This reverts commit 5121f92.
@tulir
Contributor Author

tulir commented Mar 7, 2025

I don't think so, I've only deleted entire rooms and used the state compressor for disk space management

@reivilibre reivilibre changed the title v1.126.0rc2 caused explosion in number of state groups v1.126.0rc2 caused explosion in number of state group state rows Mar 7, 2025
@reivilibre reivilibre changed the title v1.126.0rc2 caused explosion in number of state group state rows Unreferenced state group cleanup job in v1.126.0rc2 caused explosion in number of state group state rows Mar 7, 2025
reivilibre added a commit that referenced this issue Mar 7, 2025
…s introduced in v1.126.0rc1), due to a suspected issue that causes increased disk usage. (#18222)

Revert "Add background job to clear unreferenced state groups (#18154)"

This mechanism is suspected of inserting large numbers of rows into
`state_groups_state`,
thus unreasonably increasing disk usage.

See: #18217

This reverts commit 5121f92 (#18154).

---------

Signed-off-by: Olivier 'reivilibre <[email protected]>
@reivilibre
Contributor

No longer a release blocker as this was rolled back in rc3.

@Maescool

I believe the retention policies also have a similar issue: at random times I get runaway rooms (it's been like this for a while), so I have to hard-delete rooms and rejoin them.

@devonh
Member

devonh commented Mar 24, 2025

Doing a VACUUM FULL should get rid of all the extra DB size and result in the database being smaller overall. This was an issue with the previous approach: it led to de-deltaed room state, which temporarily increased DB size until a VACUUM was done.

This problem should not happen again with the new approach to deleting unreferenced state groups merged in #18254, which avoids deleting anything that would lead to de-deltaing.
When that makes its way into an RC, I would appreciate it if you could monitor your setup to confirm.
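
(For anyone cleaning up afterwards, a minimal sketch of the VACUUM FULL step, assuming Postgres and the psycopg2 driver; the connection string is a placeholder, and running the same statement from psql works just as well. Note that VACUUM FULL takes an exclusive lock on the table and needs enough free disk for the rewritten copy.)

import psycopg2

# Placeholder connection string; adjust for your deployment.
conn = psycopg2.connect("dbname=synapse user=synapse host=localhost")
conn.autocommit = True  # VACUUM cannot run inside a transaction block

with conn.cursor() as cur:
    cur.execute("SELECT pg_size_pretty(pg_total_relation_size('state_groups_state'))")
    print("before:", cur.fetchone()[0])

    # Rewrites the whole table; holds an ACCESS EXCLUSIVE lock while it runs.
    cur.execute("VACUUM FULL state_groups_state")

    cur.execute("SELECT pg_size_pretty(pg_total_relation_size('state_groups_state'))")
    print("after:", cur.fetchone()[0])

conn.close()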
