-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving search performance while using --full flag #8
Comments
Awesome, so we generally create multiple PRs that all point to one issue as the source of the "why", which you've provided. Leaving the individual PRs free to talk about the "how." One of the downsides is that our Specs repo change's pretty often. But if you're using search to find things at the same time then this makes sense. It's a shame that we can't offer the cocoapods.org search for your own specs repo. Otherwise I was going to say you could get similar performance gains + way more flexibility by switching to using our API. |
Hi @orta, I've also thought about the frequent refresh rate of the specs repo. Even though it won't bother user's pod install/update speed, making index update on every such command would be unnecessary for users that do not use command line search much. What is the index update policy for the cocoapods.org search? I've seen an issue on the repo about updating index periodically using a time interval, however it was not specifying current policy. We should think more about the index update policy. However, we can change index format now since it seems like it does not have a similar drawback. So it would be better to split changes to different PRs. |
The search can get incremental updates from trunk via web hooks for every pod, so it has a very different problem ( and a completely unique index - I expect, I don't know the details. ) I wonder if ideal answer is that we tie the index to the commit SHA, don't do a repo update but at the end of the results we check for if the HEAD SHA on the repo remote is different and if so, say " you might want to run |
Relating search index with the commit SHA would be very reasonable. Your suggestion works for most people, so we can go for it. What I wonder is if a more ideal scenario is possible. Would it be better to update search index incrementally using sth like Here is a sample output for git diff on master specs repo:
However, if it feels too hackie, your suggestion also fits great. |
That feels like the perfect answer @manuyavuz |
Great then. I need to mature solution to index update problem, so I will submit two different PRs, one for the index format(which will hopefully be done today), other for updating the index (soon). Thanks @orta! |
I’m loving this, superb work @manuyavuz! 👏 ✨ |
Hey @alloy, thanks! And there is more to come soon 😋 |
Hey, I have a question. What's the best way to get the name of the pod from the following string?
I don't want to do hard-coded splitting. I'm sure there is a convenient method for this inside somewhere in the projects :) cc @segiddins |
Where's the string coming from? I'm pretty sure Source has all of the necessary APIs, but I can't promise anything. |
I have this inside sources_manager.rb file. I also have the source object. |
But where are you getting that path from? Ideally, the source could yield the version and name when iterating, but if you're reading the spec anyways, you could read the name off of it. |
Oh sorry I misunderstood. I get it from the output of |
I'd add a method on source that can do the splitting, then. |
Umm, ok for me. Is there any sample that currently performs a similar kind of splitting inside the project? For instance which parts can I assume that it won't change? I see that 'Specs' string is hardcoded inside |
The filesystem layout is stable. Will search still work without a spec repo being inside a git repo? (Iirc, there is for instance a svn spec repo plugin.) |
Good question. I was planning to stick to the behavior inside |
It’s true that the built-in code for updating spec-repos only supports git, but here’s that plugin I was referring to that makes it work for svn as well https://github.com/clarkda/cocoapods-repo-svn. The goal should generally always be to have it work as scm-agnostic as possible. So will it still work, but not support partial updates? If so, that’s fine, someone like @clarkda can then add partial update support if they need it. |
I got the point. Then, we may still perform index updates for non-git sources while performing search operation. This way, we still gain from performance on git specs (which is the default), and |
I think that’s a fine trade-off, as other spec repos will probably much smaller than the master spec repo anyways 👍 |
Actually they will still gain from speed up resulted from the new index format. However, there will be a re-index for non-git sources whenever search command is run. Or maybe we could apply @orta's suggestion and perform a search on current search index and print a message directing to |
Nah, it doesn’t sound like the added complexity is worth it, simpler is better 👍 |
Hi again, I'm nearly done but need help for some decisions to be taken. If a user runs If a user runs According to suggestions, I will complete implementation and update the PRs. |
I don't think pod search should update the repos |
What about doing that on the first ‘search’ and print out what it’s doing? This has the added benefit of never spending time on updating the index if the user never actually uses search.
I agree with @segiddins. |
It sounds good.
|
Sounds good 👍 |
I've completed general implementation and updated PRs CocoaPods/CocoaPods#4249 and CocoaPods/Core#265. Need some thinking for specs. Feedbacks are welcome! |
I think we can close the issue because all changes have been merged. |
Hi,
I like this
pod search --full
command very much, however i think it has performance drawbacks which makes me hesitate while using it.First of all, it makes an update index operation whenever it's called, even if an index was generated before. This makes the command very slow to operate.
Secondly, it generates a hash index such that keys are the pod names and values are strings which are created by concatenating podspec information such as summary, description and authors names. While searching for a given query, it traverses whole these value strings and perform a
gsub()
operation on them.I have the following proposal to improve search speed for full text search scenario:
git pull --ff-only
). Operation should perform as a subprocess to not to block user forpod install
andpod update
commands.gsub()
on long strings repeatedly. We need to check query with keys only.I've actually implemented proposed scenario, and made some benchmarks. Results are as follows (_default_ means current implementation, _fast_ means proposed implementation):
Generating Index From Scratch
There is no important difference on indexing performance.
Searching 'afnetworking' Query
Here the proposed method offers a huge performance gain. This gain results from two improvements:
To measure the effects individually, I run one more benchmark, which disables default implementation's update index policy. In this case, it just gets index from the file and performs search on it. Here are the results:
I think the gain of the new index format is pretty obvious.
Conclusion
Proposed method can remarkably improve search performance for full text search scenario. I will generate a PR containing the implementation, but I have a concern for this. Change contains modifications on three classes each belonging to different repositories (CocoaPods/CocoaPods, CocoaPods/Core, CocoaPods/cocoapods-search). How do you perform PR reviews for these cases? Should I create PR's for each repo and give links to each other? Or are there any other suitable methods?
Thanks so much for giving your time to this long issue description!
The text was updated successfully, but these errors were encountered: