-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization of labels handling in issue_search #26460
Conversation
Rewrote using new Slices.Sort([]int64) & Compact([]int64) functions from Go 1.21. |
slices.Sort(labelQuerySlice) | ||
labelQuerySlice = slices.Compact(labelQuerySlice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use code.gitea.io/gitea/modules/container.Set
. Then you don't need to sort and compact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review ! However, I used container.Set in my previous iteration and the code was less readable and the output's ordering was random (container.Set uses map).
I still use it in models/issues/issue_search.go where its Add()
method and the on-the-fly de-duplication are quite handy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't see that the output is not just used in the backend. Then you could just use the set with slices.Sort
at the end but that is only better if labelQuerySlice
is very large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think KN4CK3R 's suggestion is right. Managed to do some rewriting: #31497
This pull-request doesn't seem to draw interest so I guess I'll be closing it in a few weeks. However, while not a security issue strictly speaking, I believe it can become a problem on loosely configured instances, and even help for a DoS. For example, on the quite-strictly-configured try.gitea.io instance with a limit at ~60 joins (sqlite ?) :
I hope you'll review it once more, and even if not, thanks for this awesome forge. |
Sorry missed this one. It seems interesting. I will take a look at next week. |
Sorry, this PR got lost in the backlog. I'm currently going through the list of PRs still open and looking at getting them in. |
This PR enhances the labels handling in issue_search by optimizing the SQL query and de-duplicate the IDs when generating the query string. --------- Co-authored-by: techknowlogick <[email protected]>
But it seems that there are still some nits unresolved, including @KN4CK3R 's concern. #26460 (review) |
-> Refactor issue label selection #31497 |
* giteaofficial/main: Refactor issue label selection (go-gitea#31497) Refactor dropzone (go-gitea#31482) [skip ci] Updated translations via Crowdin Optimization of labels handling in issue_search (go-gitea#26460) use correct l10n string (go-gitea#31487) Fix overflow menu flickering on mobile (go-gitea#31484)
@xlii-chl I saw that you said "weird ? The PR was sleeping for a year". Since the hard-fork community is quite hostility to me, I could only answer your question here. Why this PR becomes stale?Because the review suggestion is not resolved and you didn't respond for long time, and the code style doesn't quite match Golang standard. Indeed it needs more work. If you could answer the review suggestion (either apply it or explain why not doing that), it could move on easily. What happened to the hard-fork?Well, that's a long story, in short:
And here are why many of theirs PRs to Gitea were not merged mainly because there are bugs and they do not respond. Let's see why some of them are not merged (I only picked some my reviews): Really suggest everyone to take a deep look ........
|
This PR optimizes the SQL query and de-duplicate the labels' ids when generating the query string, on the issue page. <hr/> ### Background Some time ago, BingBot and some other crawlers have been putting my instance on its knees with requests containing a lot of label ids, like this one : ``` [07/Aug/2023:11:28:37 +0200] "GET /Dolibarr/sendrecurringinvoicebymail/issues?q=&type=all&sort=&state=closed&labels=1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c10%2c2%2c1%2c1%2c10%2c10%2c7%2c6%2c10%2c10%2c3%2c2%2c1%2c5%2c10%2c1%2c6%2c2%2c7%2c3%2c7%2c6%2c10%2c1%2c10%2c1%2c1%2c7%2c7%2c1%2c1%2c1%2c1%2c10%2c10%2c1%2c2%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c1%2c12%2c6%2c6%2c10&milestone=0&project=-1&poster=0 HTTP/1.1" 499 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36" ``` Since each of the label ids implies a join, it grows exponentially expensive for the database engine (at least on PostgreSQL but SQLite suffers a little too). Thus, this PR proposes two enhancements: * rewrite the database query to use only one squashed condition, * deduplicate the label ids when generating the URL. ### Performance comparison Here are some timings on Postgresql-backed, Forgejo 7.0.4 instances : ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m10,491s user 0m0,017s sys 0m0,008s ``` ...and with the patch: ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m0,094s user 0m0,012s sys 0m0,013s ``` ### Annex This issue was originally proposed to [Gitea](go-gitea/gitea#26460) but didn't get much attention, and I switched to Forgejo in the meantime :) Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4228 Reviewed-by: Earl Warren <[email protected]> Co-authored-by: Chl <[email protected]> Co-committed-by: Chl <[email protected]>
Let's not backport this since it's not a bug fix. |
This PR enhances the labels handling in issue_search by optimizing the SQL query and de-duplicating the ids when generating the query string.
Those patches are backportable to v1.20.2 as is.
Background
For some time now, BingBot and some other crawlers have been putting my Gitea instance on its knees with requests containing a lot of label ids, like this one :
Since each of the label ids implies a join, it grows exponentially expensive for the database engine (at least on PostgreSQL but SQLite suffers a little too).
Thus this PR proposes two enhancements:
This is my first try at Golang so don't hesitate to point mistakes or inelegant code.
As a side note, I unfortunately don't have the start of this snowball in my webserver logs anymore, and couldn't really reproduce it (the labels in the above project don't have scopes, and the label id n°1 that we can see in the query string is linked to a deleted repository (but its labels still exist in database apparently)).