-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Install declared deps into a temp venv to infer mappings #252
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love where this is going. As usual I have some opinions on how to solve various parts of this 😅 , feel free to skip or disregard things that are irrelevant to what you want to focus on.
Some specifications and notes:
- Latest available versions of packages are installed.
- Uses the Python version currently running FawltyDeps.
I think this is perfectly fine to start with.
Changes I'm planning to add before this is ready to review:
- Create the cache mechanism (and an option to invalidate it).
I think this can also be done in a later PR if you prefer.
Changes I'm keeping for future PRs:
- This is not user facing yet and I don't plan to make it so in this PR until we have a better integration with other mappings, a better hold on how to set up the venv (versions and constraints) and a better understanding of the implications of installing deps into the user's system in the first place. This will furthermore require interacting with the user for permission so is best kept for another iteration.
I think as long as we have a single --use-pypi
option (default: off) that installs into a temp directory, then that is all the "permission" we should need from the user. The temp directory should not affect the user's system in unintended ways. It is only known to FD, and it goes away when FD is done with it.
If we cache things so that we avoid having to re-install everything on the next FD run, then I think we should not cache the venv itself, but rather cache the Lookup object that we extracted from the venv.
- Related to the above: is it possible to get imports exposed by a package without actually installing said package?
IMHO, given that we're currently fairly invested into importlib_metadata
, I think we do want to install the packages. At least for now. It is possible to extract details from many uninstalled packages by only extracting some of the files (e.g. top_level.txt
), but for some packages (specifically source archives) I suspect there is no general solution that always avoids installation.
So any optimized method we were to develop here would have to copy some of the logic from importlb_metadata
in any case, and would still need provide an ultimate fallback to full installation followed by lookup with importlb_metadata
. I would put this in the category of "later optimization".
- This will install the latest available version of a package, and will not solve for a set of packages versions that respects constraints. This can be tackled partially even without extracting the versions of declared deps but we will need to ultimately extract those for a better setup.
Yeah. I wonder this is a case where we can provide a 95% (or even 99%) solution very easily (use the latest version, and simply assume that import names do not vary between releases), but reaching a 100% Correct™️ solution can be very hard (e.g. an old project with sloppy requirements with constraints that might not even be solvable with today's packages versions).
- After Johan's multiple pyenvs integration has landed, explore the possibility to only install deps not available in the Python env(s).
Yes. As long as the PyPI resolver provides the same interface as LocalPackageLookup
this should not be a problem at all.
- This uses the Python version currently running FawltyDeps. Might want to consider inferring that from the project configuration.
Maybe. Definitely a P3/research-needed kind of issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how all those pieces of puzzle are coming together!
Kudos for an explicit explanation of what you do and will not do in this PR and how to structure further work.
Adding to the conversation sparkled from your description:
(from @jherland )
If we cache things so that we avoid having to re-install everything on the next FD run, then I think we should not cache the venv itself, but rather cache the Lookup object that we extracted from the venv.
Yet another idea is to use a simple format as we would have for user-defined mapping or even use json/toml representation of the Package
object.
Changes I'm planning to add before this is ready to review:
- Create the cache mechanism (and an option to invalidate it).
I think this can also be done in a later PR if you prefer.
👍
- Related to the above: is it possible to get imports exposed by a package without actually installing said package?
IMHO, given that we're currently fairly invested into
importlib_metadata
, I think we do want to install the packages. At least for now. It is possible to extract details from many uninstalled packages by only extracting some of the files (e.g.top_level.txt
), but for some packages (specifically source archives) I suspect there is no general solution that always avoids installation.
I think we should not always install the packages. There may be ones that have lots of other dependencies, which will end up slowing down the process. I think that we should focus on a funnel approach to solving dependencies. And in my opinion, a check on top_level.txt
has a decent place in this funnel. (We are more and more in need of gaining some knowledge on what is "typical", #212) ,
So for (dependencies, imports) pair of sets, we would like to find matches with the following approaches (in order), and if the one above fails, then the unmatched members of sets are "pushed" to the next one:
- Check existing virtual env (provided by the user, or reasonable default, as discussed in Using _multiple_ Python environments to resolve deps -> packages -> imports? #249 )
- Check
top_level.txt
of compressed packages (as done for example in pigar) - Install unmatched dependencies and check again for the match in venv
Somewhere in this, a user-defined mapping should come, but we should discuss the priority of resolution. (An illustration below)
@mknorps: I largely agree with pretty much everything you say. Here was what I had in mind for the 1st approximation of the "stack" of matchers: That said. I need to revise this to make room for the optimized handling of |
It occured to me (and I promise this is relevant for this PR) that we can |
I updated my drawing after incorporating more of @mknorps' ideas, and what we also discussed in the status meeting: |
50fcc6d
to
1443b05
Compare
abcc1b3
to
417dea5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on this! 😻
I'm especially happy with how much of the code we could reuse without much or any changes.1
Still, I have some reservations about two or three areas detailed in the comments below.
Footnotes
-
There is almost too much reuse in that the original
LocalPackageLookup
name does not really fit anymore when you throw PyPI into the mix... It's pretty much at the point where I'd like a follow-up PR that e.g. pulls out an abstract base class fromLocalPackagesLookup
, general enough to also serve as a base for the identity mapping, as well as for this new PyPI-aware method (which at heart really is aLocalPackageLookup
, I guess, so might want to inherit from that instead...), the point of that refactoring would be to massage the shape of the code into something that starts looking more like the diagrams we've been drawing over the last few days. ↩
tests/sample_projects/file__requirements_undeclared_unused/expected.toml
Outdated
Show resolved
Hide resolved
@jherland I still didn't go through your new comments, but I think this needs to be changed to account for the boolean |
6726893
to
13d4212
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, great work on this! ❤️
Really, I want to emphasize how well you are tackling a complex PR with lots of extra discussion and spin-off issues: From this PR we created #256, and a bunch of other issues (soon to become more PRs) that really help us divide and conquer this big and complex cloud of related issues into smaller manageable chunks that are so much easier to tackle.
I'm eager to learn from this, because I know that in the same situation I would probably spend many more days more or less on my own, to come up with a much bigger PR that would set out to solve so much more than what would be reasonable to put in a single PR, and it would be so much harder to review because of that.
Instead, you have turned this into multiple issues that we can all discuss and solve separately, which (a) makes the code better, and more importantly (b) spreads the knowledge about this part of our program to the whole team. 🙏🏻
- Not user facing yet - No cache mechanism yet - Will install latest available version of a package - Will not solve for a set of packages versions that respect constraints - No possibility to only install deps not available in the Python env - Uses the Python version currently ruuning FawltyDeps. Might want to consider inferring that from the project configuration. Use a context manager to obtain packages from a temp venv
13d4212
to
3042390
Compare
Some specifications and notes:
Changes I'm planning to add before this is ready to review:- Create the cache mechanism (and an option to invalidate it).(not relevant anymore as cache became its own beast in #256).Changes I'm keeping for future PRs:
- This is not user facing yet and I don't plan to make it so in this PR until we have a better integration with other mappings, a better hold on how to set up the venv (versions and constraints) and a better understanding of the implications of installing deps into the user's system in the first place. This will furthermore require interacting with the user for permission so is best kept for another iteration.Ended up adding the option to install deps to settings in this PR.