Add indexes for user_management_user #3054

Red-M · 2023-09-22T06:33:50Z

as larger instances struggle with slow queries

What this PR does

While running Grafana Oncall, I found that the select statements behind showing the current schedule of all users would generate sequential queries using regex to parse a JSON column and also using lower() on the email column, causing a high CPU load on the DB.

Which issue(s) this PR fixes

This doesn't have a github issue.
We found that a larger instance of Oncall (207 rows in user_management_user) could have this table take up to 30 seconds to complete the table scan and that these indexes reduced that down to 600ms-900ms (averaging at about 700ms)

Checklist

[-] Unit, integration, and e2e (if applicable) tests updated
[-] Documentation added (or pr:no public docs PR label added if not required)
CHANGELOG.md updated (or pr:no changelog PR label added if not required)

CLAassistant · 2023-09-22T06:33:57Z

All committers have signed the CLA.

…h slow queries

matiasb

Thanks for the contribution! A few minor comments added.

matiasb · 2023-09-25T13:35:14Z

engine/apps/user_management/models/user.py

+            models.Index(fields=["organization", "is_active", "username", "role"]),
+            models.Index(fields=["organization", "is_active", "email", "role"]),
+            models.Index(fields=["username", "organization", "is_active", "permissions"]),
+            models.Index(fields=["email", "organization", "is_active", "permissions"]),


Any reason for ordering fields differently? (ie. "organization", "is_active", "username" vs "username", "organization", "is_active")

+1 to add some indexing to the User model, wondering if keeping "role" and "permissions" out wouldn't be enough (thinking on space)? so we only add:
"organization", "is_active", "username"
"organization", "is_active", "email"

The order of the indexing matters for performance reasons depending on the SQL engine.
Cardinality matters as it reduces the search time for rows, which this is attempting to eliminate as many rows as possible to remove the very costly table scan.

All 4 indexes are required as they're effectively used as a set of joins to reduce the number of rows down to (preferably) 1 row (in the best case).

CHANGELOG.md

matiasb · 2023-10-02T20:53:40Z

By any chance did you check if the recent updates around schedules improved your loading times (before applying these changes)? (e.g. besides some indexing added, there was some refactoring on how users details are fetched when RBAC is enabled)

OTOH, creating an index including a JSON column (permissions) is not possible in MySQL, migration fails with:

django.db.utils.OperationalError: (3152, "JSON column 'permissions' supports indexing only via generated columns on a specified JSON path.")

So it will require some extra work to enable it (eventually we may need to re-think how permissions are stored/checked).

Red-M · 2023-10-03T00:16:30Z

By any chance did you check if the recent updates around schedules improved your loading times (before applying these changes)? (e.g. besides some indexing added, there was some refactoring on how users details are fetched when RBAC is enabled)

No, technically speaking these changes aren't exactly what I had to apply since django won't accept:

CREATE INDEX user_management_user_urn_org_id_is_active_role USING BTREE
    ON `user_management_user` (`organization_id`, `is_active`, `username`, `role`);
    
CREATE INDEX user_management_user_lower_email_org_id_is_active_role USING BTREE
    ON `user_management_user` (`organization_id`, `is_active`, (lower(`email`)), `role`);

CREATE INDEX user_management_user_lower_urn_org_id_is_active_perms USING BTREE
    ON `user_management_user` (`username`, `organization_id`, `is_active`, (cast(json_unquote(permissions) as CHAR(500))) );

CREATE INDEX user_management_user_lower_email_org_id_is_active_perms USING BTREE
    ON `user_management_user` ((lower(`email`)), `organization_id`, `is_active`, (cast(json_unquote(permissions) as CHAR(500))));

OTOH, creating an index including a JSON column (permissions) is not possible in MySQL, migration fails with:

django.db.utils.OperationalError: (3152, "JSON column 'permissions' supports indexing only via generated columns on a specified JSON path.")

So it will require some extra work to enable it (eventually we may need to re-think how permissions are stored/checked).

Yeah I can see that in the tests, I'm not sure Django's ORM is giving a good experience here.

Red-M · 2023-10-03T00:38:35Z

Also #3067 is pretty poor form to pull the rug out from this PR.

Red-M requested a review from a team September 22, 2023 06:33

Add indexes for user_management_user as larger instances struggle wit…

419619a

…h slow queries

Red-M force-pushed the dev branch from 2834756 to 419619a Compare September 22, 2023 06:38

matiasb reviewed Sep 25, 2023

View reviewed changes

Red-M added 2 commits September 29, 2023 18:48

Github actions wanted this below 120 characters

cab7fe7

Merge remote-tracking branch 'gh-upstream/dev' into dev

2844926

Red-M closed this Oct 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add indexes for user_management_user #3054

Add indexes for user_management_user #3054

Red-M commented Sep 22, 2023

CLAassistant commented Sep 22, 2023 •

edited

Loading

matiasb left a comment

matiasb Sep 25, 2023

Red-M Sep 25, 2023

matiasb commented Oct 2, 2023

Red-M commented Oct 3, 2023

Red-M commented Oct 3, 2023

Add indexes for user_management_user #3054

Add indexes for user_management_user #3054

Conversation

Red-M commented Sep 22, 2023

What this PR does

Which issue(s) this PR fixes

Checklist

CLAassistant commented Sep 22, 2023 • edited Loading

matiasb left a comment

Choose a reason for hiding this comment

matiasb Sep 25, 2023

Choose a reason for hiding this comment

Red-M Sep 25, 2023

Choose a reason for hiding this comment

matiasb commented Oct 2, 2023

Red-M commented Oct 3, 2023

Red-M commented Oct 3, 2023

CLAassistant commented Sep 22, 2023 •

edited

Loading