Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install declared deps into a temp venv to infer mappings #252

Merged
merged 3 commits into from
Mar 24, 2023

Conversation

Nour-Mws
Copy link
Collaborator

@Nour-Mws Nour-Mws commented Mar 20, 2023

Some specifications and notes:

  • Latest available versions of packages are installed.
  • This uses the Python version currently running FawltyDeps.
  • The "analysis" object test is basically the sample projects test and might need to be reconsidered/integrated into the refactor of Maria's hypothesis testing PR.

Changes I'm planning to add before this is ready to review:
- Create the cache mechanism (and an option to invalidate it). (not relevant anymore as cache became its own beast in #256).

Changes I'm keeping for future PRs:
- This is not user facing yet and I don't plan to make it so in this PR until we have a better integration with other mappings, a better hold on how to set up the venv (versions and constraints) and a better understanding of the implications of installing deps into the user's system in the first place. This will furthermore require interacting with the user for permission so is best kept for another iteration. Ended up adding the option to install deps to settings in this PR.

  • Related to the above: is it possible to get imports exposed by a package without actually installing said package?
  • This will install the latest available version of a package, and will not solve for a set of packages versions that respects constraints. This can be tackled partially even without extracting the versions of declared deps but we will need to ultimately extract those for a better setup.
  • After Johan's multiple pyenvs integration has landed, explore the possibility to only install deps not available in the Python env(s).
  • This uses the Python version currently running FawltyDeps. Might want to consider inferring that from the project configuration.

Copy link
Member

@jherland jherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love where this is going. As usual I have some opinions on how to solve various parts of this 😅 , feel free to skip or disregard things that are irrelevant to what you want to focus on.

Some specifications and notes:

  • Latest available versions of packages are installed.
  • Uses the Python version currently running FawltyDeps.

I think this is perfectly fine to start with.

Changes I'm planning to add before this is ready to review:

  • Create the cache mechanism (and an option to invalidate it).

I think this can also be done in a later PR if you prefer.

Changes I'm keeping for future PRs:

  • This is not user facing yet and I don't plan to make it so in this PR until we have a better integration with other mappings, a better hold on how to set up the venv (versions and constraints) and a better understanding of the implications of installing deps into the user's system in the first place. This will furthermore require interacting with the user for permission so is best kept for another iteration.

I think as long as we have a single --use-pypi option (default: off) that installs into a temp directory, then that is all the "permission" we should need from the user. The temp directory should not affect the user's system in unintended ways. It is only known to FD, and it goes away when FD is done with it.

If we cache things so that we avoid having to re-install everything on the next FD run, then I think we should not cache the venv itself, but rather cache the Lookup object that we extracted from the venv.

  • Related to the above: is it possible to get imports exposed by a package without actually installing said package?

IMHO, given that we're currently fairly invested into importlib_metadata, I think we do want to install the packages. At least for now. It is possible to extract details from many uninstalled packages by only extracting some of the files (e.g. top_level.txt), but for some packages (specifically source archives) I suspect there is no general solution that always avoids installation.

So any optimized method we were to develop here would have to copy some of the logic from importlb_metadata in any case, and would still need provide an ultimate fallback to full installation followed by lookup with importlb_metadata. I would put this in the category of "later optimization".

  • This will install the latest available version of a package, and will not solve for a set of packages versions that respects constraints. This can be tackled partially even without extracting the versions of declared deps but we will need to ultimately extract those for a better setup.

Yeah. I wonder this is a case where we can provide a 95% (or even 99%) solution very easily (use the latest version, and simply assume that import names do not vary between releases), but reaching a 100% Correct™️ solution can be very hard (e.g. an old project with sloppy requirements with constraints that might not even be solvable with today's packages versions).

  • After Johan's multiple pyenvs integration has landed, explore the possibility to only install deps not available in the Python env(s).

Yes. As long as the PyPI resolver provides the same interface as LocalPackageLookup this should not be a problem at all.

  • This uses the Python version currently running FawltyDeps. Might want to consider inferring that from the project configuration.

Maybe. Definitely a P3/research-needed kind of issue.

Copy link
Collaborator

@mknorps mknorps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how all those pieces of puzzle are coming together!
Kudos for an explicit explanation of what you do and will not do in this PR and how to structure further work.

Adding to the conversation sparkled from your description:
(from @jherland )

If we cache things so that we avoid having to re-install everything on the next FD run, then I think we should not cache the venv itself, but rather cache the Lookup object that we extracted from the venv.

Yet another idea is to use a simple format as we would have for user-defined mapping or even use json/toml representation of the Package object.

Changes I'm planning to add before this is ready to review:

  • Create the cache mechanism (and an option to invalidate it).

I think this can also be done in a later PR if you prefer.

👍

  • Related to the above: is it possible to get imports exposed by a package without actually installing said package?

IMHO, given that we're currently fairly invested into importlib_metadata, I think we do want to install the packages. At least for now. It is possible to extract details from many uninstalled packages by only extracting some of the files (e.g. top_level.txt), but for some packages (specifically source archives) I suspect there is no general solution that always avoids installation.

I think we should not always install the packages. There may be ones that have lots of other dependencies, which will end up slowing down the process. I think that we should focus on a funnel approach to solving dependencies. And in my opinion, a check on top_level.txt has a decent place in this funnel. (We are more and more in need of gaining some knowledge on what is "typical", #212) ,
So for (dependencies, imports) pair of sets, we would like to find matches with the following approaches (in order), and if the one above fails, then the unmatched members of sets are "pushed" to the next one:

  1. Check existing virtual env (provided by the user, or reasonable default, as discussed in Using _multiple_ Python environments to resolve deps -> packages -> imports? #249 )
  2. Check top_level.txt of compressed packages (as done for example in pigar)
  3. Install unmatched dependencies and check again for the match in venv

Somewhere in this, a user-defined mapping should come, but we should discuss the priority of resolution. (An illustration below)
Paper Tweag 5

@jherland
Copy link
Member

jherland commented Mar 21, 2023

@mknorps: I largely agree with pretty much everything you say. Here was what I had in mind for the 1st approximation of the "stack" of matchers:
fd_matching 1

That said. I need to revise this to make room for the optimized handling of top_level.txt as well as where to place the cached mapping stored from a previous run... 🤔

@jherland
Copy link
Member

I think we should not always install the packages. There may be ones that have lots of other dependencies, which will end up slowing down the process.

It occured to me (and I promise this is relevant for this PR) that we can pip install with --no-deps. This should avoid unnecessary slowdowns, and should not (I hope) affect FD negatively, as we are only interested in looking at the direct dependencies.

@jherland
Copy link
Member

I updated my drawing after incorporating more of @mknorps' ideas, and what we also discussed in the status meeting:

fd_matching 2

@mknorps
Copy link
Collaborator

mknorps commented Mar 21, 2023

@jherland I moved the discussion on the mapping resolution to the issue of beautiful round number #256

@Nour-Mws
Copy link
Collaborator Author

@jherland

It occured to me (and I promise this is relevant for this PR) that we can pip install with --no-deps. This should avoid unnecessary slowdowns, and should not (I hope) affect FD negatively, as we are only interested in looking at the direct dependencies.

I added this in b618b92.

@Nour-Mws Nour-Mws marked this pull request as ready for review March 22, 2023 19:08
@Nour-Mws
Copy link
Collaborator Author

@mknorps and @jherland I believe this is ready for review now. I think I addressed all conversations that are relevant to the current PR and not moved to #256.

@Nour-Mws Nour-Mws requested review from jherland and mknorps March 22, 2023 19:09
@Nour-Mws Nour-Mws force-pushed the nour/pull-deps branch 2 times, most recently from abcc1b3 to 417dea5 Compare March 22, 2023 20:00
@Nour-Mws Nour-Mws changed the title Pull declared deps into a local venv to infer mappings Install declared deps into a temp venv to infer mappings Mar 22, 2023
Copy link
Member

@jherland jherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this! 😻

I'm especially happy with how much of the code we could reuse without much or any changes.1

Still, I have some reservations about two or three areas detailed in the comments below.

Footnotes

  1. There is almost too much reuse in that the original LocalPackageLookup name does not really fit anymore when you throw PyPI into the mix... It's pretty much at the point where I'd like a follow-up PR that e.g. pulls out an abstract base class from LocalPackagesLookup, general enough to also serve as a base for the identity mapping, as well as for this new PyPI-aware method (which at heart really is a LocalPackageLookup, I guess, so might want to inherit from that instead...), the point of that refactoring would be to massage the shape of the code into something that starts looking more like the diagrams we've been drawing over the last few days.

@Nour-Mws
Copy link
Collaborator Author

@jherland I still didn't go through your new comments, but I think this needs to be changed to account for the boolean --install-deps when it's set in toml or env variables. I'll go through your comments then fix that.

Copy link
Member

@jherland jherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, great work on this! ❤️

Really, I want to emphasize how well you are tackling a complex PR with lots of extra discussion and spin-off issues: From this PR we created #256, and a bunch of other issues (soon to become more PRs) that really help us divide and conquer this big and complex cloud of related issues into smaller manageable chunks that are so much easier to tackle.

I'm eager to learn from this, because I know that in the same situation I would probably spend many more days more or less on my own, to come up with a much bigger PR that would set out to solve so much more than what would be reasonable to put in a single PR, and it would be so much harder to review because of that.

Instead, you have turned this into multiple issues that we can all discuss and solve separately, which (a) makes the code better, and more importantly (b) spreads the knowledge about this part of our program to the whole team. 🙏🏻

- Not user facing yet
- No cache mechanism yet
- Will install latest available version of a package
- Will not solve for a set of packages versions that respect constraints
- No possibility to only install deps not available in the Python env
- Uses the Python version currently ruuning FawltyDeps. Might want to
  consider inferring that from the project configuration.

Use a context manager to obtain packages from a temp venv
@Nour-Mws Nour-Mws merged commit 32b509c into main Mar 24, 2023
@Nour-Mws Nour-Mws deleted the nour/pull-deps branch March 24, 2023 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants