-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mapping resolution strategy: discussion and implementation. #256
Comments
What we also discussed is the cache should have a common format and |
CachesIf this is not complex enough, I'd like to also note that we can have two different types of caches: per-project and system-wide. The latter would include cached mappings from various previous runs of fawltydeps (not necessarily related to the project currently checked). Another consideration is that this central cache can be populated not only from PyPI installations but also from FawtyDeps looking at various local Python envs in previous runs. That may create privacy concerns so we might want to have this cache be user-based rather than system-wide? Order of mappingsMy understanding of the diagram above is that it dictates the default order the resolver will take through mappings. We have already established that some of those can be turned off (i.e. skipped):
Furthermore, I would expect a user that passed the user-provided mapping or a While this makes for a more powerful default approach, I would still like to:
The point is that, if I provided a Python environment via |
Format of cacheCurrently our "resolved_deps": {
"importlib_metadata": {
"package_name": "importlib-metadata",
"mappings": {"local_env": ["importlib_metadata"]}
},
"isort": {
"package_name": "isort",
"mappings": {"local_env": ["isort"]}
},
"pip-requirements-parser": {
"package_name": "pip-requirements-parser",
"mappings": {"local_env": ["packaging_legacy_version", "pip_requirements_parser"]}
},
... This format is sufficient to decode back into Format of user-defined mappingThe above JSON format is too verbose for a user to provide a custom mapping. We could go with a simpler, more dict-like JSON representation: {
"importlib-metadata": ["importlib_metadata"],
"isort": ["isort"],
"pip-requirements-parser": ["packaging_legacy_version", "pip_requirements_parser"]
}, Still, I don't think JSON is the best format for a user-defined mapping. Instead, I think it makes most sense to put the user-defined mapping into [tool.fawltydeps.custom_mapping]
importlib-metadata = ["importlib_metadata"]
isort = ["isort"]
pip-requirements-parser = ["packaging_legacy_version", "pip_requirements_parser"] Caches
Yes! A "per-user" cache stored under e.g. As for "per-user" vs "per-project" I'm undecided, but I think it's better to start small and grow its use once we (a) see that it works well without major downsides, and (b) we identify more areas where caching bring a demonstrable benefit.
My opinion here is the same as above: Let's take one step at a time. Although the cache might end up encompassing "everything" in the end, I'd like to start with caching the PyPI parts where we already know that building a mapping can be very expensive. As for local envs, I'd like to first establish how slow the local mapping really is. A small test on my laptop in my FD dev env with 43 packages installed shows that Order and customization of mappings
Yes, default order, and the ones that are disabled (either by default, by omission, or by explicit option) are simply skipped/removed from the diagram.
I'm not so sure: I don't this it's very user-friendly to tell the user: "If you want to provide your own mapping, you better remember to include everything in it...". For
Correct. That is my current POV when it comes to our default mapping.
Agreed. As this mapping can become quite complex, we do need to "tag" our results with which mappings were involved. AFAICS,
I agree that this should be possible, but I'm not sure it makes sense as the default behavior when you pass Here is an idea that has presented itself while thinking about this:
Footnotes
|
This does not change current behavior, but does a preliminary refactoring on the way towards having multiple package-to-import resolvers as discussed in #256 and associated issues. Introduce a BasePackageResolver abstract base class to define the interface for package resolvers, and provide, for now, two subclasses: - LocalPackageResolver (renamed from LocalPackageLookup) - IdentityMapping (refactored from resolve_dependencies()) Also refactor resolved_dependencies() to operate on a list of BasePackageResolver objects: For each dependency name we go through the list of resolvers until a resolver returns a Package object for this name. We then add this Package to the returned mapping. For now this list of resolvers is hardcoded to exactly two entries, the LocalPackageResolver we were already using, and the IdentityMapping that we were effectively falling back on for packages not found in the former. Ideas for future work that build on this: - Introduce more/different BasePackageResolver subclasses to do different kinds of package resolving. E.g. user-defined mapping from a config file. Resolving by looking up packages remotely on PyPI, etc. - Make the list of resolvers user-configurable instead of hardcoded. - Properly handle the case where none of the resolver find a package mapping, i.e. _unmapped_ depenedencies.
This does not change current behavior, but does a preliminary refactoring on the way towards having multiple package-to-import resolvers as discussed in #256 and associated issues. Introduce a BasePackageResolver abstract base class to define the interface for package resolvers, and provide, for now, two subclasses: - LocalPackageResolver (renamed from LocalPackageLookup) - IdentityMapping (refactored from resolve_dependencies()) Also refactor resolved_dependencies() to operate on a list of BasePackageResolver objects: For each dependency name we go through the list of resolvers until a resolver returns a Package object for this name. We then add this Package to the returned mapping. For now this list of resolvers is hardcoded to exactly two entries, the LocalPackageResolver we were already using, and the IdentityMapping that we were effectively falling back on for packages not found in the former. Ideas for future work that build on this: - Introduce more/different BasePackageResolver subclasses to do different kinds of package resolving. E.g. user-defined mapping from a config file. Resolving by looking up packages remotely on PyPI, etc. - Make the list of resolvers user-configurable instead of hardcoded. - Properly handle the case where none of the resolver find a package mapping, i.e. _unmapped_ depenedencies.
Idea: Only provide options to enable/disable resolvers, not reorder them.
|
This does not change current behavior, but does a preliminary refactoring on the way towards having multiple package-to-import resolvers as discussed in #256 and associated issues. Introduce a BasePackageResolver abstract base class to define the interface for package resolvers, and provide, for now, two subclasses: - LocalPackageResolver (renamed from LocalPackageLookup) - IdentityMapping (refactored from resolve_dependencies()) Also refactor resolved_dependencies() to operate on a list of BasePackageResolver objects: For each dependency name we go through the list of resolvers until a resolver returns a Package object for this name. We then add this Package to the returned mapping. For now this list of resolvers is hardcoded to exactly two entries, the LocalPackageResolver we were already using, and the IdentityMapping that we were effectively falling back on for packages not found in the former. Ideas for future work that build on this: - Introduce more/different BasePackageResolver subclasses to do different kinds of package resolving. E.g. user-defined mapping from a config file. Resolving by looking up packages remotely on PyPI, etc. - Make the list of resolvers user-configurable instead of hardcoded. - Properly handle the case where none of the resolver find a package mapping, i.e. _unmapped_ depenedencies.
Visualization to the team meeting discussion with @Nour-Mws and @jherland.
About 1) Each of the four listed above should have a way to exclusively switch on and off. This option should be considered a power-user option and thus not easily available, for example binary flags:
As for what we called "presets" these are combinations of the resolvers above.
I think this is quite elastic setup, and will not require more options for now. Unless you think that some of the combinations I marked as X is interesting. |
Thank you for creating this overview @mknorps. It really helps organize the thought process around these options. I agree with mostly everything you're writing, and I only comment on the things that I feel need more nuance or clarification:
I agree that these options should be considered for power users only, and they are meant for when the presets are not sufficient to configure the desired resolver stack. As you say, they are binary/boolean flags, but I think it's worth pointing out that they also have an unspecified state (the default state), where we defer to the underlying presets (or default preset) to determine whether they are enabled/disabled. Do you think these option should take a value, like On to the presets and how they map onto enabling/disabling the 4 resolvers: About the user-provided mappingThere is one issue I want to raise with your DEFAULT preset above: When a user does not provide any options, but they do have a Therefore I would consider that the user-provided mapping should actually be ON by default. (When there are no actual values in the mapping, it does behave the same as if it was disabled). When it's on by default, the In other words, I'd like to relabel the DEFAULT column above to be an R column, and move the DEFAULT label to the third-last column ( (We might want to issue a warning if Otherwise, I think the user-provided mapping should be independent of the other resolvers. We discussed briefly in the meeting whether a user-provided mapping should disable the identity mapping (e.g. when a user wants full control via their own mapping), but I think we can consider this a power user scenario for now (solved it with About the install-deps mappingI think we all agree that the this preset ( SummaryIf I interpret this correctly, it seems we really only have one preset option for now: Forgive me for inverting your rows/columns:
|
Thanks a lot @mknorps for taking the time to do this useful analysis. I disagree in general on the need to provide a boolean I must admit I don't see the benefit of specifying an additional boolean switch on top of this. The only case I can imagine where it might be potentially useful is when you have defined some user-defined mapping(s) but want to suppress them. And in that case, I would argue it's cleaner to remove/unset these settings rather than turning them off with a switch. |
For the list of boolean switches:
If we remove
Local packages mappingFor the
|
Whether or not we try to minimize the amount of nonsense combination of command-line options, we should at least try to warn the user when these nonsense combinations occur in practice.
Although I expect this to be true most of the time, we do need to consider the case where a dependency fails to be installed with |
Summary of discussion in today's status meeting: At this point in time, we need:
|
Closing this now, as we're ready to release v0.13 with the new mapping strategy implemented. |
Execution plan:
--install-deps
) #266 (do after 273, 249 and 260 are done)install_deps
oridentity_mapping
#299Extra (related to the mapping strategy, but we decided to exclude them from the milestone):
The question is how to combine different ways of mapping dependencies to imports:
basepath
,sys.path
)--pyenv
optiontoplevel.txt
or other (to be investigated)This sparked from various discussions and conversation in a #252.
3 rounds of visualization of the flow of this translation funnel and we have following diagram, by @jherland

We should also decide what customization options should be given to a user:
sys.path
,basepath
The text was updated successfully, but these errors were encountered: