
Using _multiple_ Python environments to resolve deps -> packages -> imports? #249

Open
2 of 6 tasks
jherland opened this issue Mar 17, 2023 · 3 comments
Labels
P3 minor: not priorized research-needed

Comments

@jherland
Member

jherland commented Mar 17, 2023

The basic question here is: How can we maximize FawltyDeps' chances of finding a relevant Package that provides a useful dep name -> import names mapping, and minimize the cases where we fall back to the identity mapping?

Currently we take one --pyenv option (or default to packages found via sys.path, i.e. the environment where FD is installed).

Ideas/tasks:

  • 1. A natural extension is to accept multiple --pyenv values (like we recently did for --code and --deps), to allow multiple environments to be queried (see footnote 1). This is being implemented in Prepare for supporting multiple --pyenv options #313 and Support multiple --pyenv options #321.
  • 2. Asking the user to pass in multiple --pyenvs is not very friendly. We could instead walk the directory structure of the project (limited by basepaths and/or --pyenv) and automatically discover all Python environments within, and add them to our analysis (see footnote 2). This is being implemented in Teach FawltyDeps to automatically discover Python environments inside the project #326.
  • 3. A comment on Pointing FawltyDeps to a Python environment created by poetry2nix does not work (was: Different name for pypi package and exposed package) #236 suggests also looking at $PYTHONPATH. Experiments reveal that $PYTHONPATH is automatically reflected in sys.path (its entries are inserted before the system search paths, but after the current directory, which appears at the start of sys.path). As of now, the plan is for --pyenv (and for the auto-discovered pyenvs in the previous point) to override the use of sys.path; in other words, sys.path is only used as an ultimate fallback when no other Python environments are found, either via --pyenv or under basepath. The question then becomes whether we want to use sys.path in addition to Python environments from --pyenv/basepath, and, if so, whether this should be automatic or controlled by a user-visible option.
  • 4. There are other places where Python packages are often installed. All of these might be good places to discover packages that are undeclared because they were part of a user's "ambient" environment:
    • pip install --user places packages in a user-specific directory (e.g. under $HOME/.local/)
    • Poetry places its virtualenvs under $HOME/.cache/pypoetry/virtualenvs/
    • pipx places virtualenvs under $HOME/.local/pipx.
    • Other tools that place virtualenvs or similar inside the user's environment/home?
  • 5. Same goes for system-wide installed packages, whether they are installed by pip, or by the distro.
    • /usr/lib/python?.*/site-packages
    • /usr/local/...
    • /opt/???/
  • 6. Elsewhere?
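The $PYTHONPATH behaviour described in point 3 is easy to verify with a small experiment (a sketch, not FawltyDeps code; the dummy path and exact index are illustrative, and the precise system entries vary per installation):

```python
import os
import subprocess
import sys

# Run a child Python with PYTHONPATH pointing at a dummy directory, and
# capture the child's resulting sys.path.
env = dict(os.environ, PYTHONPATH="/tmp/fake-site-packages")
out = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
).stdout.splitlines()

# The PYTHONPATH entry shows up near the front of sys.path: after the
# current-directory entry (sys.path[0]), but before the system search paths.
assert "/tmp/fake-site-packages" in out
print(out.index("/tmp/fake-site-packages"))  # typically 1
```

This confirms that any sys.path-based fallback would automatically pick up $PYTHONPATH entries, without FawltyDeps having to parse the variable itself.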

Where do we stop? How much is too much?

Rather, what are the disadvantages of adding more places to look for packages?

  • What are the chances of encountering an environment that is willfully misleading?
  • What are the consequences of using a misleading environment?

Footnotes

  1. We can fairly straightforwardly extend our resolve_dependencies() to have a stack of multiple LocalPackageLookup objects, one for each environment, and then query them in order until one of them returns a matching Package object. The identity mapping can then be represented as an IdentityPackageLookup object automatically placed at the bottom of this stack. The way LocalPackageLookup is currently written, we delay enumerating packages (i.e. walking the filesystem) until we actually need to look something up. Therefore, as long as we put the most relevant virtualenvs near the top of the stack, we can probably avoid enumerating packages for the lower levels of the stack, and we can then have a fairly deep stack without necessarily paying the cost (in terms of enumerating less relevant virtualenvs).

  2. If we auto-discover virtualenvs while walking the project directories, we should consider excluding them from the code and deps searches, as they should probably not be considered part of the project itself, and thus shouldn't be scanned for imports/deps by FD.
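The stack-of-lookups idea from footnote 1 could be sketched roughly like this (hypothetical class and function names loosely modelled on the ones mentioned above; the real resolve_dependencies()/LocalPackageLookup APIs in FawltyDeps differ):

```python
from typing import Dict, Iterable, List, Optional, Set


class PackageLookup:
    """Interface: map a dependency name to the import names it provides."""

    def lookup(self, dep_name: str) -> Optional[Set[str]]:
        raise NotImplementedError


class LazyEnvLookup(PackageLookup):
    """Enumerate packages in one environment, but only on first use."""

    def __init__(self, env_path: str):
        self.env_path = env_path
        self._packages: Optional[Dict[str, Set[str]]] = None

    def _enumerate(self) -> Dict[str, Set[str]]:
        # Placeholder: a real implementation would walk
        # <env_path>/lib/python*/site-packages and read package metadata.
        return {}

    def lookup(self, dep_name: str) -> Optional[Set[str]]:
        if self._packages is None:  # delay the filesystem walk until needed
            self._packages = self._enumerate()
        return self._packages.get(dep_name)


class IdentityLookup(PackageLookup):
    """Bottom of the stack: assume import name == dependency name."""

    def lookup(self, dep_name: str) -> Optional[Set[str]]:
        return {dep_name}


def resolve(dep_names: Iterable[str], stack: List[PackageLookup]) -> Dict[str, Set[str]]:
    """Query each lookup in order; the first hit wins."""
    result = {}
    for dep in dep_names:
        for lookup in stack:
            imports = lookup.lookup(dep)
            if imports is not None:
                result[dep] = imports
                break
    return result


stack = [LazyEnvLookup("/path/to/venv"), IdentityLookup()]
print(resolve(["requests"], stack))  # falls through to the identity mapping
```

Because LazyEnvLookup only walks the filesystem when queried, a deep stack of rarely-hit environments costs little, which is exactly the property footnote 1 relies on.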

@janydoe
Contributor

janydoe commented Mar 17, 2023

> suggests also looking at $PYTHONPATH. Some research is needed to figure out how this interacts with sys.path.

I've just tried and it works. 👍

```nix
shellHook = ''
    export PYTHONPATH=${siteEnv}/lib/python3.8/site-packages
'';
```

@mknorps
Collaborator

mknorps commented Mar 17, 2023

--pyenv usage comments

I think we have established a general pattern for how FawltyDeps works: it has reasonable defaults, and if those defaults are not sufficient, we have knobs to adjust them. One example is the positional basepath together with the --code and --deps options.

Following this sane-defaults pattern, we could start by implementing point 2 with a reasonable default, which I think is: virtualenvs found under basepath, plus $PYTHONPATH (point 3, shown to work by @janydoe), plus user-level installations (point 4).

Point 1 from your list is a natural follow-up to --code and --deps taking multiple arguments, but may be considered and implemented in parallel with the default behaviour.

Just keep in mind that this will only help one of FawltyDeps' use cases: checking your own program, which works locally but may not have properly declared dependencies. For the case where you dig through some old code and try to figure out what is missing before running expensive computations, this will not be as impactful.

Idea to discuss

On the other hand, it would be interesting to think about the following workflow:

  1. Take all package mappings from all your virtual environments.
  2. Export them as a file that FawltyDeps can use as a mapping (add an option to take a user-provided mapping from a file).
  3. Run FawltyDeps on the unknown project with all the information on package mappings that you have on your system.

This would be yet another piece of functionality - extracting mappings, parallel to extracting imports and dependencies - so we would have 4 main modules instead of 3. The mapping is a crucial part of matching imports to dependencies, so it may be worth considering anyway.

Misleading environments comments

Regarding misleading environments: I think we may encounter this behaviour in cases like the Google API clients, where many of them export the same google library. If envA and envB contain different libraries exposing the same google import, we will end up with an unmatched dependency. How often this case happens is hard to guess. We may have some insight after conducting experiment #212.

@jherland
Member Author

The remainder of this issue remains open, but will not be tackled in time for the current Mapping strategy milestone.

@Nour-Mws Nour-Mws added P3 minor: not priorized and removed P2 major: an upcoming release labels Aug 22, 2023