Prepare for supporting multiple --pyenv options #313
Conversation
(force-pushed from 70042cd to f337dd8)
There is a lot of good stuff there! I see you had to do a lot of refactoring to arrive at a point where it is possible to implement the multiple --pyenv option.
I would probably be happier to see it split into two or three separate PRs; what keeps me focused in the current setup is the logically organized and well-described individual commits ❤️.
So I will just start with the first few commits that are in the preparation phase of the multiple --pyenv option implementation.
The first three are straightforward and I have no comments.
11ef514 and 94ca571 introduce a better separation of the environment that may be used in tests. Here I wonder if we should not use this feature also for the sample and real-world project integration tests.
With 8ace2c1 I do not agree.
The only purpose of the DependenciesMapping enum is to differentiate the
various types of mappings stored inside a Package object,
Yes. And this is an important purpose. If you want to know what different mappings we use, you check the DependenciesMapping object. If you want to add yet another mapping, you would do it there. In this PR you exchange a strict definition provided by a type for a string-based description. This is not a good idea in my opinion: for debugging you need to parse the string back into the correct mapping type, and for refactoring you do not have support from a type checker.
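For reference, a minimal sketch of what such an enum might look like (the exact member set is an assumption for illustration; only LOCAL_ENV appears elsewhere in this thread):

from enum import Enum

class DependenciesMapping(str, Enum):
    """Types of package-name-to-import-names mappings stored inside a
    Package object (sketch; members other than LOCAL_ENV are assumed)."""
    IDENTITY = "identity"
    LOCAL_ENV = "local_env"
    USER_DEFINED = "user_defined"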
I will post some ideas on how to tackle the need to record the location of a mapping in the next comment :)
To have a way to include the location in the dependency mapping, we could use an entry object with the location included, for example:

from dataclasses import dataclass
from typing import FrozenSet

@dataclass(eq=True, frozen=True, order=True)
class DependencyMappingEntry:
    mapping_type: DependenciesMapping
    source: Location
    items: FrozenSet[str]  # frozen, so that entries are hashable and can live in a set

and then for a package:

Package(
    "foo",
    {
        DependencyMappingEntry(DependenciesMapping.LOCAL_ENV, Location("path1"), frozenset({"bar", "baz"})),
        DependencyMappingEntry(DependenciesMapping.LOCAL_ENV, Location("path2"), frozenset({"blob"})),
    },
)
Second idea. Given that:

However, the way we are currently structuring our resolver stack, we
will never mix mappings from different resolvers inside a single
Package object.

we could add a resolved_with field to the Package object:
class Package:
    """Encapsulate an installable Python package.

    This encapsulates the mapping between a package name (i.e. something you can
    pass to `pip install`) and the import names that it provides once it is
    installed.
    """

    package_name: str
    mappings: Dict[Location, Set[str]] = field(default_factory=dict)  # here, instead of Location, we can have a string representing a path
    import_names: Set[str] = field(default_factory=set)
    resolved_with: Optional[DependenciesMapping] = None
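Under this sketch, a package resolved from a local environment might then look like this (the paths and import names reuse the illustrative ones from the first idea above):

Package(
    "foo",
    mappings={Location("path1"): {"bar", "baz"}, Location("path2"): {"blob"}},
    import_names={"bar", "baz", "blob"},
    resolved_with=DependenciesMapping.LOCAL_ENV,
)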
It looks like I should probably split the PR right before ec726e6 (LocalPackageResolver: Accept multiple pyenvs), so that we can properly discuss and land the preparatory patches before introducing the actual feature (multiple --pyenv options).
That's a good idea. We already do isolate these sample/real project tests from the current environment by using the
Ok, we can certainly revisit the design here.
I'm not sure. I also wanted to keep the option open for future resolvers to do things differently (although I could easily be persuaded to leave that as a problem for future-us, and rather solve the problem that is in front of us right now).
I think I like this idea better. But I'd still want to get rid of the DependenciesMapping enum. Also, I'm growing slightly weary of the duplication between mappings and import_names. In other words:

class Package:
    package_name: str
    resolved_with: Type[BasePackageResolver]  # The serialization of this in JSON output would be the class' .__name__
    import_names: Union[Set[str], Dict[str, Something]]  # Set-like collection of import names with resolver-specific metadata. (The only limitation of the …)

Resolvers such as IdentityMapping could then do:

class IdentityMapping(BasePackageResolver):
    def lookup_package(self, package_name: str) -> Package:
        return Package(
            package_name,
            resolved_with=IdentityMapping,
            import_names={Package.normalize_name(package_name)},
        )

But the more complex resolvers could themselves choose how/what metadata to attach to each import name...
I like this idea! Theoretically a mapping is a more general concept, and I can imagine that a single mapping may be resolved with several resolvers, but this is a very unlikely abstraction. I think changing from the DependenciesMapping enum to resolved_with is a good idea. Consider everything written below (along with your suggestion to get rid of the duplication between mappings and import_names) as accepted.

In fact, the condition that we may use only one resolver per Package object makes me think that we could go one step further and not mutate Package objects at all.

As for the deduplication: import_names: Set[str] would probably be the best option. But if we need to take into account a source for each import name:

class Package:
    package_name: str
    resolved_with: Type[BasePackageResolver]  # The serialization of this in JSON output would be the class' .__name__
    mappings: Set[Tuple[str, str]]  # first element of a tuple is the import name, and second is the description of its source

    @property
    def import_names(self):
        return {name for name, _ in self.mappings}
I think we're converging on the same (or very similar) designs here 😄. As of right now, this is the design I have:

ExtraImportInfo = Union[None, Location, str]

class Package:
    package_name: str
    resolved_with: Type["BasePackageResolver"]
    import_names: Dict[str, ExtraImportInfo] = field(default_factory=dict)

IMHO, an ExtraImportInfo of None simply means that the resolver has no extra information to attach (as for IdentityMapping below).

Examples of Package objects from the various resolvers:

Package(
    package_name="foo",
    resolved_with=UserDefinedMapping,
    import_names={
        "foo": Location(Path("mappings.toml")),
        "bar": Location(Path("pyproject.toml")),
        "baz": Location(Path("pyproject.toml")),
    },
)

Package(
    package_name="foo",
    resolved_with=LocalPackageResolver,
    import_names={
        "foo": Location(Path(".venv/lib/python3.7/site-packages")),
        "bar": Location(Path(".venv/lib/python3.7/site-packages")),
        "baz": Location(Path("__pypackages__/3.10/lib")),
    },
)

Package(
    package_name="foo",
    resolved_with=IdentityMapping,
    import_names={
        "foo": None,
    },
)

I'm only 95% happy with this design, though, as there is often duplication between the values in the import_names dict (see the UserDefinedMapping and LocalPackageResolver examples above).

Maybe separating the meta-information into a separate member is cleaner? This also allows each resolver more control over how to organize it (for better or worse?):

PackageDebugInfo = Union[None, ...]

class Package:
    package_name: str
    resolved_with: Type["BasePackageResolver"]
    import_names: Set[str] = field(default_factory=set)
    debug_info: PackageDebugInfo = None

# Examples:

Package(
    package_name="foo",
    resolved_with=UserDefinedMapping,
    import_names={"foo", "bar", "baz"},
    debug_info={
        Location(Path("mappings.toml")): {"foo"},
        "from configured settings": {"bar", "baz"},
    },
)

Package(
    package_name="foo",
    resolved_with=LocalPackageResolver,
    import_names={"foo", "bar", "baz"},
    debug_info={
        Location(Path(".venv/lib/python3.7/site-packages")): {"foo", "bar"},
        Location(Path("__pypackages__/3.10/lib")): {"baz"},
    },
)

Package(
    package_name="foo",
    resolved_with=IdentityMapping,
    import_names={"foo"},
    debug_info=None,
)

What do you think? Otherwise, I agree with your point on making Package objects immutable.
(force-pushed from 5e5904a to d03225a)
Finished refactoring for now, and split up the PR as discussed above. Here is a summary of the changes:

Initial commits, unchanged: …
Refactored/changed/new commits: …
Dropped commits (these will be moved to a new PR): …
(force-pushed from d03225a to 1ade7d9)
Thank you for splitting the PR :)
This one is ready to 🚀. I really like the place it converged to 💟
I left some small comments, which are non-blocking.
Maybe separating the meta-information into a separate member is cleaner? This also allows each resolver more control over how to organize it (for better or worse?): […] What do you think?
I really like this design. Keeping the core functionality and the debug info separate makes the code cleaner; for the purpose of comparing imports and dependencies, package_name and import_names are the most important fields.
The fake_venv() fixture is about to be put into service to attempt further isolation of tests that use a default LocalPackageResolver (i.e. one that looks at sys.path) from the virtualenv in which the test happens to be running. This will be done by prepending a fake_venv to sys.path, and for that we need to know the site-packages directory created by the fake_venv.
LocalPackageResolver without a given pyenv_path defaults to looking up packages in sys.path. This is problematic in tests, as we do not fully control the Python environment in which the tests are run (e.g. consider a developer running pytest directly in their dev environment, where they happen to have installed some packages that we don't expect). Until now, we've skirted around the problem by adding a couple of TODO comments near the relevant tests, but it's now time to solve this in a better way.

However, short of running the tests in a subprocess, we cannot _fully_ control sys.path. What we can do is use a combination of our fake_venv fixture and pytest's monkeypatch.syspath_prepend() function to insert an environment that we _do_ control at the start of sys.path. This allows us to place entries into that fake_venv that will then be resolved exactly as we expect in tests.

We introduce a new isolate_default_resolver fixture to apply this partial isolation for a given dict of package-to-imports mappings. At the same time we add a default_sys_path_env_for_tests dict that encodes what we previously considered the minimal/default package mappings that we expected (but could not guarantee) to be present in the test environment. With this, we can now call isolate_default_resolver(default_sys_path_env_for_tests) from tests that previously relied on the surrounding test environment, and their expectation is now guaranteed by this new fake_venv inserted at the start of sys.path.

Note that for tests that rely on packages NOT being installed into the test environment (e.g. testing the fallback to the identity mapping), we must still take care to use package names that we can ~guarantee never to be present in the environment in which the tests are run. Thus, we are only _partially_ isolated from the surrounding test environment.
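A rough sketch of how such a fixture could be wired together, assuming fake_venv is a factory fixture that takes a package-to-imports dict and returns a (venv_dir, site_dir) pair, as suggested by the conftest commit above (not the actual FawltyDeps code):

from pathlib import Path
from typing import Dict, Set

import pytest

@pytest.fixture
def isolate_default_resolver(fake_venv, monkeypatch):
    """Prepend a fake venv's site-packages dir to sys.path, so that the
    default (sys.path-based) LocalPackageResolver resolves the given
    packages from an environment that we control. This is only _partial_
    isolation: the rest of sys.path is still the surrounding test env."""

    def _isolate(package_mappings: Dict[str, Set[str]]) -> Path:
        _venv_dir, site_dir = fake_venv(package_mappings)
        monkeypatch.syspath_prepend(site_dir)
        return site_dir

    return _isolate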
When LocalPackageResolver is used to look up packages in the _current_ Python environment, it should take care to use the same package that Python itself would end up using when importing a package of that name. Until now, with the way we have used importlib_metadata (specifically the way we iterate over MetadataPathFinder().find_distributions()), if a package occurs multiple times in sys.path, we would end up using the _last_ instance, whereas Python would use the _first_. Fix our iteration to be in line with Python.
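In spirit, the fix boils down to keeping the _first_ distribution seen for each package name instead of letting later sys.path entries overwrite earlier ones. A hedged sketch of the idea (not the actual FawltyDeps code; the normalization is a crude stand-in):

from importlib_metadata import DistributionFinder, MetadataPathFinder

def iter_distributions_first_match(paths):
    """Yield each distribution at most once, keeping the _first_ match
    along the given sys.path-like entries (as Python's import machinery
    would), instead of the _last_."""
    seen = set()
    context = DistributionFinder.Context(path=paths)
    for dist in MetadataPathFinder().find_distributions(context):
        name = dist.name.lower().replace("-", "_")  # crude normalization
        if name not in seen:
            seen.add(name)
            yield dist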
Until now we have designed the Package object to be able to accept package-name-to-import-names mappings from _multiple_ resolvers (signified with the DependenciesMapping enum). However, our current design relies on each Package object being built by a _single_ resolver, so the multiple mappings are never used in practice. What would be nice, however, is to be able to attach some more debug information to each Package, specifically pointing to where we found the relevant package-name-to-import-names mappings represented in this object. (This will soon become more important when we add support for multiple --pyenv options.)

In this refactoring we reshape the Package object as follows:

- We remove the DependenciesMapping enum. It has served as a simple proxy for the resolver class/type used to create the Package object.
- Instead, we record the responsible class itself directly into the Package object, by adding a .resolved_with member of type Type[BasePackageResolver]. When serializing Package objects to JSON, this member is serialized to a simple string: the class name.
- We remove the .mappings member (which only ever has a single member in practice).
- Instead, we add a .debug_info member which each resolver is free to populate with any information (but which defaults to None). The only real requirement is that the information be JSON-serializable.

For now, we use these kinds of debug information:

- UserDefinedMapping adds nothing (None) for now, but this will soon change to indicate from where the relevant user-defined mapping was read.
- LocalPackageResolver stores a dictionary mapping the site-packages path where a Package was found to the import names found in that install location. This will become even more useful when we add support for multiple --pyenv arguments, which will potentially merge Package definitions from multiple site-packages directories.
- TemporaryPipInstallResolver overrides the debug_info from the underlying LocalPackageResolver with a simple string indicating that packages were temporarily installed with `pip install`.
- IdentityMapping uses None, as it has no extra information to give.

Most of the diff is occupied by associated test updates to deal with the reshaped Package objects. Except for a very few unit tests, we don't bother verifying the exact contents of the .debug_info member; instead we liberally use an ignore_package_debug_info() helper to remove .debug_info details before comparing Package objects.
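The helper could be as small as this (a guess at its shape, assuming Package is a dataclass so that dataclasses.replace() applies):

from dataclasses import replace
from typing import Iterable, List

def ignore_package_debug_info(packages: Iterable["Package"]) -> List["Package"]:
    """Return copies of the given Package objects with .debug_info reset
    to None, so tests can compare packages without depending on
    resolver-specific debug details."""
    return [replace(package, debug_info=None) for package in packages]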
All Package objects are now constructed in their fully developed form and can be treated as frozen/immutable.
When accumulating a user-defined Package object from multiple files (e.g. a [tool.fawltydeps.custom_mapping] section in pyproject.toml together with a --custom-mapping-file=mapping.toml), include the ultimate source of the mapping in the .debug_info member of the Package object. This makes it easier to debug the source of these user-defined mappings.

In other words, given a mapping.toml with:

apache-airflow = ["airflow"]

and a pyproject.toml with:

[tool.fawltydeps.custom_mapping]
apache-airflow = ["foo", "bar"]

we used to generate this Package object:

Package(
    package_name="apache_airflow",
    import_names={"airflow", "foo", "bar"},
    resolved_with=UserDefinedMapping,
    debug_info=None,
)

but with this commit we instead get:

Package(
    package_name="apache_airflow",
    import_names={"airflow", "foo", "bar"},
    resolved_with=UserDefinedMapping,
    debug_info={
        "mapping.toml": {"airflow"},
        "from settings": {"foo", "bar"},
    },
)

Also refactor the accumulate_mappings() helper out of the UserDefinedMapping class, as we will soon start to use it from LocalPackageResolver as well.
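A sketch of what such a helper might look like once hoisted out of UserDefinedMapping (the signature and input shape are assumptions based on this commit message; note that it builds new Package objects rather than mutating existing ones, in line with the immutability commit above):

from typing import Dict, Iterable, Set, Tuple, Type

def accumulate_mappings(
    resolved_with: Type["BasePackageResolver"],
    custom_mappings: Iterable[Tuple[str, Dict[str, Set[str]]]],
) -> Dict[str, "Package"]:
    """Merge (source, package-to-imports mapping) pairs into one Package
    per package name, recording each source under .debug_info."""
    result: Dict[str, Package] = {}
    for source, mapping in custom_mappings:
        for package_name, import_names in mapping.items():
            key = Package.normalize_name(package_name)
            prev = result.get(key)
            result[key] = Package(
                package_name=key,
                resolved_with=resolved_with,
                import_names=(prev.import_names if prev else set()) | set(import_names),
                debug_info={**(prev.debug_info if prev else {}), source: set(import_names)},
            )
    return result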
(force-pushed from 517775d to a10e43a)
This was too long in the making, but I kept bumping into corners that delayed progress. (For example: I have bumped the integration between --pyenv and traverse_project to a later PR.)

Best reviewed one commit at a time. The first three commits are pure preparation for later changes, the next four commits are related fixes that I found had to be made before tackling the main topic. The following two commits have the actual changes, and the last commit fixes another (unrelated) issue that was found while adding new tests for multiple pyenvs.

Although I haven't looked at this yet, I suspect there will be conflicts with Nour's ongoing PRs (#296, #297, #298). I'll be happy to rebase on top of those, and resolve the conflicts, but I wanted to get this PR up for review without further delay.

Edit: Rebased now on top of origin/main after #296 was merged. Conflicts were fortunately smaller than I had feared. 🎉

Commits:
- noxfile: Silence unhelpful pylint warning in tests
- packages.LocalPackageResolver.packages: Use @calculated_once decorator
- conftest: Return site_dir in addition to venv_dir from fake_venv()
- conftest: Partially isolate the default LocalPackageResolver
- packages: Replace DependenciesMapping with string descriptions
- LocalPackageResolver: Resolve packages using first match in sys.path
- package.UserDefinedMapping: Include source of mapping in Package objects
- LocalPackageResolver: Accept multiple pyenvs
- Support multiple --pyenv options
- settings.print_toml_config(): Represent sets as sorted lists

Still TODO in future PRs:

- Integrate --pyenv with traverse_project, so that we can auto-discover pyenvs underneath basepaths.
- --pyenv=.venv could result in multiple actual environments (.venv/lib/python3.7/site-packages AND .venv/lib/python3.8/site-packages, etc.); so far we only grab the first one. (Fixed in LocalPackageResolver: Handle multiple package dirs inside one Python env #318; see the sketch below.)
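For that last TODO, the essence is to enumerate _every_ package dir inside one Python environment rather than grabbing the first. A minimal sketch of the idea, assuming a POSIX-style venv layout (not the actual #318 code):

from pathlib import Path
from typing import Iterator

def iter_package_dirs(venv_dir: Path) -> Iterator[Path]:
    """Yield all site-packages dirs inside one Python environment, e.g.
    both .venv/lib/python3.7/site-packages and
    .venv/lib/python3.8/site-packages."""
    # A complete version would also handle Windows (Lib/site-packages)
    # and __pypackages__-style layouts.
    yield from sorted(venv_dir.glob("lib/python?.*/site-packages"))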