Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indexes for user_management_user #3054

Closed
wants to merge 3 commits into from
Closed

Add indexes for user_management_user #3054

wants to merge 3 commits into from

Conversation

Red-M
Copy link

@Red-M Red-M commented Sep 22, 2023

as larger instances struggle with slow queries

What this PR does

While running Grafana Oncall, I found that the select statements behind showing the current schedule of all users would generate sequential queries using regex to parse a JSON column and also using lower() on the email column, causing a high CPU load on the DB.

Which issue(s) this PR fixes

This doesn't have a github issue.
We found that a larger instance of Oncall (207 rows in user_management_user) could have this table take up to 30 seconds to complete the table scan and that these indexes reduced that down to 600ms-900ms (averaging at about 700ms)

Checklist

  • [-] Unit, integration, and e2e (if applicable) tests updated
  • [-] Documentation added (or pr:no public docs PR label added if not required)
  • CHANGELOG.md updated (or pr:no changelog PR label added if not required)

@Red-M Red-M requested a review from a team September 22, 2023 06:33
@CLAassistant
Copy link

CLAassistant commented Sep 22, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@matiasb matiasb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! A few minor comments added.

models.Index(fields=["organization", "is_active", "username", "role"]),
models.Index(fields=["organization", "is_active", "email", "role"]),
models.Index(fields=["username", "organization", "is_active", "permissions"]),
models.Index(fields=["email", "organization", "is_active", "permissions"]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for ordering fields differently? (ie. "organization", "is_active", "username" vs "username", "organization", "is_active")

+1 to add some indexing to the User model, wondering if keeping "role" and "permissions" out wouldn't be enough (thinking on space)? so we only add:
"organization", "is_active", "username"
"organization", "is_active", "email"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of the indexing matters for performance reasons depending on the SQL engine.
Cardinality matters as it reduces the search time for rows, which this is attempting to eliminate as many rows as possible to remove the very costly table scan.

All 4 indexes are required as they're effectively used as a set of joins to reduce the number of rows down to (preferably) 1 row (in the best case).

@matiasb
Copy link
Contributor

matiasb commented Oct 2, 2023

By any chance did you check if the recent updates around schedules improved your loading times (before applying these changes)? (e.g. besides some indexing added, there was some refactoring on how users details are fetched when RBAC is enabled)

OTOH, creating an index including a JSON column (permissions) is not possible in MySQL, migration fails with:

django.db.utils.OperationalError: (3152, "JSON column 'permissions' supports indexing only via generated columns on a specified JSON path.")

So it will require some extra work to enable it (eventually we may need to re-think how permissions are stored/checked).

@Red-M
Copy link
Author

Red-M commented Oct 3, 2023

By any chance did you check if the recent updates around schedules improved your loading times (before applying these changes)? (e.g. besides some indexing added, there was some refactoring on how users details are fetched when RBAC is enabled)

No, technically speaking these changes aren't exactly what I had to apply since django won't accept:

CREATE INDEX user_management_user_urn_org_id_is_active_role USING BTREE
    ON `user_management_user` (`organization_id`, `is_active`, `username`, `role`);
    
CREATE INDEX user_management_user_lower_email_org_id_is_active_role USING BTREE
    ON `user_management_user` (`organization_id`, `is_active`, (lower(`email`)), `role`);

CREATE INDEX user_management_user_lower_urn_org_id_is_active_perms USING BTREE
    ON `user_management_user` (`username`, `organization_id`, `is_active`, (cast(json_unquote(permissions) as CHAR(500))) );

CREATE INDEX user_management_user_lower_email_org_id_is_active_perms USING BTREE
    ON `user_management_user` ((lower(`email`)), `organization_id`, `is_active`, (cast(json_unquote(permissions) as CHAR(500))));

OTOH, creating an index including a JSON column (permissions) is not possible in MySQL, migration fails with:

django.db.utils.OperationalError: (3152, "JSON column 'permissions' supports indexing only via generated columns on a specified JSON path.")

So it will require some extra work to enable it (eventually we may need to re-think how permissions are stored/checked).

Yeah I can see that in the tests, I'm not sure Django's ORM is giving a good experience here.

@Red-M
Copy link
Author

Red-M commented Oct 3, 2023

Also #3067 is pretty poor form to pull the rug out from this PR.

@Red-M Red-M closed this Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants