Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

package: Introduce a list of package resolvers #268

Merged
merged 6 commits into from
Mar 28, 2023

Conversation

jherland
Copy link
Member

This does not change current behavior, but does a preliminary
refactoring on the way towards having multiple package-to-import
resolvers as discussed in #256 and associated issues.

Introduce a BasePackageResolver abstract base class to define the
interface for package resolvers, and provide, for now, two subclasses:

  • LocalPackageResolver (renamed from LocalPackageLookup)
  • IdentityMapping (refactored from resolve_dependencies())

Also refactor resolved_dependencies() to operate on a list of
BasePackageResolver objects: For each dependency name we go through the
list of resolvers until a resolver returns a Package object for this
name. We then add this Package to the returned mapping.

For now this list of resolvers is hardcoded to exactly two entries, the
LocalPackageResolver we were already using, and the IdentityMapping that
we were effectively falling back on for packages not found in the former.

Ideas for future work that build on this:

  • Introduce more/different BasePackageResolver subclasses to do
    different kinds of package resolving. E.g. user-defined mapping from a
    config file. Resolving by looking up packages remotely on PyPI, etc.
  • Make the list of resolvers user-configurable instead of hardcoded.
  • Properly handle the case where none of the resolver find a package
    mapping, i.e. unmapped depenedencies.

Copy link
Collaborator

@mknorps mknorps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this refactor 😍
The structure of how to add the resolvers and stack them is clean and visible now.

What we should still consider is a special place of identity mapping inside Package object. I think we should not include add_identity_import and identity_mapping in the Package object, but rather move it to IdentityMapping. So conceptually, Package is independent of any resolver.

@jherland
Copy link
Member Author

What we should still consider is a special place of identity mapping inside Package object. I think we should not include add_identity_import and identity_mapping in the Package object, but rather move it to IdentityMapping. So conceptually, Package is independent of any resolver.

Fully agreed. I'll add a commit on top to do just that.

@jherland
Copy link
Member Author

What we should still consider is a special place of identity mapping inside Package object. I think we should not include add_identity_import and identity_mapping in the Package object, but rather move it to IdentityMapping. So conceptually, Package is independent of any resolver.

Fully agreed. I'll add a commit on top to do just that.

Added in e16c2d0

@jherland jherland requested a review from mknorps March 24, 2023 11:37
Copy link
Collaborator

@mknorps mknorps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Copy link
Collaborator

@Nour-Mws Nour-Mws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only read enough of this PR to know that this works on a per-dependency basis rather than on a collection of dependencies.
My question is: would we now need to create a new venv for each and every dependency in the "install_deps" mapping if these don't get matched by previous mappings?

@jherland
Copy link
Member Author

I only read enough of this PR to know that this works on a per-dependency basis rather than on a collection of dependencies.
My question is: would we now need to create a new venv for each and every dependency in the "install_deps" mapping if these don't get matched by previous mappings?

Good catch! Yes, we will need to change the API here to have each resolver handle multiple deps at the same time. I'll attempt to do this when rebasing this PR on top of #252.

One complication I'm seeing: when handling multiple deps (A, B, C) and one of them fails (e.g. C does not exist on PyPI), the resolver should to try to handle this failure somehow (to let A and B resolve successfully).

This helps debugging as we can now see exactly what paths are present in
the Settings object for the current sample_project being tested.
This does not change current behavior, but does a preliminary
refactoring on the way towards having multiple package-to-import
resolvers as discussed in #256 and associated issues.

Introduce a BasePackageResolver abstract base class to define the
interface for package resolvers, and provide, for now, two subclasses:

- LocalPackageResolver (renamed from LocalPackageLookup)
- IdentityMapping (refactored from resolve_dependencies())

Also refactor resolved_dependencies() to operate on a list of
BasePackageResolver objects: For each dependency name we go through the
list of resolvers until a resolver returns a Package object for this
name. We then add this Package to the returned mapping.

For now this list of resolvers is hardcoded to exactly two entries, the
LocalPackageResolver we were already using, and the IdentityMapping that
we were effectively falling back on for packages not found in the former.

Ideas for future work that build on this:
- Introduce more/different BasePackageResolver subclasses to do
  different kinds of package resolving. E.g. user-defined mapping from a
  config file. Resolving by looking up packages remotely on PyPI, etc.
- Make the list of resolvers user-configurable instead of hardcoded.
- Properly handle the case where none of the resolver find a package
  mapping, i.e. _unmapped_ depenedencies.
@jherland jherland force-pushed the jherland/introduce-package-resolver-stack branch 2 times, most recently from 868cac2 to 32b6e23 Compare March 24, 2023 23:34
@jherland
Copy link
Member Author

jherland commented Mar 24, 2023

I've now rebased on top of #252, which has brought in a bunch of new changes. Hence I'm tagging this for re-review. The good news is that this brings in more of the changes we've discussed in #256 and associated issues. Here are the commits with some short comments to guide your review:

  1. ab33919 test_sample_projects: Print experiment settings
    • Minor quality-of-life thing that I ended up adding while debugging test failures
  2. c911378 packages: Introduce a list of package resolvers
  3. bab45cd packages: Move id mapping logic into IdentityMapping class
    • The refactoring that @mknorps proposed in her review. Not much has changed here, except for quoting the import name in the INFO log message that is printed when we use the identity mapping.
  4. 1aff0ae packages: Change BasePackageResolver API: resolve all names in one call
    • The API change that I proposed in response to @Nour-Mws's comment. I'm fairly happy with how this turned out.
  5. c955096 packages: Rewrite temp_installed_requirements() as a "proper" resolver
  6. 32b6e23 packages: Use both local + TemporaryPipInstallResolver at the same time

Copy link
Collaborator

@Nour-Mws Nour-Mws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and it was a great read :)
Feel free to resolve conversations.

@mknorps
Copy link
Collaborator

mknorps commented Mar 28, 2023

I really like this refactor and I am eagerly waiting for it to be merged 💯

One thing I noticed in rereading the changes is that there are no tests for TemporaryPipInstallResolver. You may consider adding it in the follow-up PR, or just create some dummy TODO test in test_packages.py, so it does not skip our attention. This can also be a good first issue :)

Makes little sense to keep the id mapping logic inside the generic
Package class when we now have the IdentityMapping class as well.

Suggested-by: Maria Knorps <[email protected]>
With the introduction of a resolver to temporarily install packages
remotely (from PyPI), we want to resolve all packages in one go instead
of resolving one package at a time (as this allows for a single `pip
install` command instead of one command per package).

Change the BasePackageResolver.lookup_package() method into a
BasePackageResolver.lookup_packages() method which converts a set of
package names into a dictionary mapping a subset of those names (i.e.
the names that could be successfully resolved by this resolver) into
their corresponding Package objects.

The resolve_dependencies() function is changed accordingly: Instead of
resolving one package at a time (iterating through the available
resolvers in order), we now instead loop over the available resolvers
_first_, passing all package names to the first resolver in the list,
and keeping track of which names was _not_ resolved by that resolver.
We then pass the unresolved names on to the next resolver in the list,
and so on, until either all names have been resolved (at which point we
don't need to consult any more resolvers), or we have reached the end of
the resolvers list.

Currently, the last resolver in the hardcoded list of resolvers is the
IdentityMapping, which guarantess that all the given package names will
get resolved. (This will change in the future.)
Refactor temp_installed_requirements() into a new class:
TemporaryPipInstallResolver, a BasePackageResolver subclass that can be
used directly from resolve_dependencies() and thus simplifies the logic
in resolve_dependencies(): Instead of having to run the entire package
resolving in the (maybe) context of temp_installed_requirements() we can
now instead simply choose between TemporaryPipInstallResolver or
LocalPackageResolver based on the value of the install_deps argument.
Until now, the `install_deps` argument to resolve_dependencies() has
been used to choose between LocalPackageResolver (i.e. resolving via a
given --pyenv, or sys.path) and a TemporaryPipInstallResolver.

With this, we will now always use a LocalPackageResolver _first_, and
then - if install_deps is given - use TemporaryPipInstallResolver.
eventually falling back on IdentityMapping as a last resort.
@jherland jherland force-pushed the jherland/introduce-package-resolver-stack branch from 79e31a2 to 57e482f Compare March 28, 2023 09:33
@jherland jherland merged commit e83e3d1 into main Mar 28, 2023
@jherland jherland deleted the jherland/introduce-package-resolver-stack branch March 28, 2023 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resolve packages from the Python environment before installing dependencies
3 participants