-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record installed files in a deterministic order #4667
Record installed files in a deterministic order #4667
Conversation
While I don't know if this functionality is something we want to add to pip (I'm not the right person to comment on that), it does seem to me that adding a news fragment would be needed. |
@pradyunsg Thanks, I've added a news fragment then. I picked the |
I'd say so but I won't push on that -- I'd wait for a core committer to comment on this. :) |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
This would also need a test to make sure this does not get dropped at the next refactor. |
I'd be happy to add a test for this, but I'm not sure where would be the most appropriate. |
Yep! It'd be nice if you add a doc-string to it. :) |
Test added in the latest push to this branch. Let me know if there's anything else I can do to get this merged. 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. :)
news/4667.bugfix
Outdated
@@ -0,0 +1,2 @@ | |||
Made the order of recorded installed files deterministic, to make installations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd phrase it as:
pip now records installed files in a deterministic manner improving reproducibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the latest push to this branch.
I'll wait for another review from @pypa/pip-committers before merging. :) |
src/pip/_internal/req/req_install.py
Outdated
@@ -802,6 +802,7 @@ def prepend_root(path): | |||
new_lines.append( | |||
os.path.relpath(prepend_root(filename), egg_info_dir) | |||
) | |||
new_lines = sorted(new_lines) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to make this an actual list (new_lines.sort()
) rather than a generator? I guess the generator works at the moment, but it may cause confusion for future maintenance that new_lines
isn't an actual list after this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorted
returns a list https://docs.python.org/3.6/library/functions.html#sorted but indeed calling new_lines.sort()
should be enough since it is already a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch. Sorry, I was getting confused between sorted
and reversed
... Thanks for the correction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, doesn't make a difference to me... Latest version of this branch uses new_lines.sort()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming the different installed_files
variables in the test would improve its readability :o
tests/functional/test_install.py
Outdated
result = script.pip('install', to_install, expect_error=False) | ||
fspkg_folder = script.site_packages / 'fspkg' | ||
egg_info = 'FSPkg-0.1.dev0-py%s.egg-info' % pyversion | ||
installed_files = script.site_packages / egg_info / 'installed-files.txt' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: installed_files_path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/functional/test_install.py
Outdated
assert installed_files in result.files_created, str(result) | ||
|
||
installed_files = result.files_created[installed_files].full | ||
installed_files = [p for p in installed_files.read_text().split('\n') if p] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: installed_files_lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Retriggered CI and status check. |
Appveyor does not approve. |
That looks like a genuine issue. How can Travis be OK with it, but Appveyor not? It doesn't seem to be a Windows-specific issue. Is that test run on both Windows and Unix? Weird... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like installed_files_path
is a pathlib.Path object in Unix but a simple str path in Windows ?
I'm guessing the problem comes from if sys.platform == 'win32':
@property
def stdout(self):
return self._impl.stdout.replace('\r\n', '\n')
@property
def stderr(self):
return self._impl.stderr.replace('\r\n', '\n') If |
In case it's not clear, this is a bug in the pip test harness on Windows, rather than a bug in this PR. Either way, it'll need fixing, though. |
I don't have access to a Windows machine, which will make it a bit hard to fix this blindly. So I guess I'll have to wait for someone else to fix it, before we can merge this? |
Yeah... I'll look into this as well over the weekend. |
This is the problem: >>> import posixpath
>>> import ntpath
>>> from tests.lib.path import Path
>>> base_path = Path(r'T:\pytest-0\test_installed_files_recorded_0\workspace')
>>> path = r'venv\Lib\site-packages\FSPkg-0.1.dev0-py3.6.egg-info\installed-files.txt'
>>> posixpath.join(Path(base_path.replace('\\', '/')), path.replace('\\', '/'))
Path('T:/pytest-0/test_installed_files_recorded_0/workspace/venv/Lib/site-packages/FSPkg-0.1.dev0-py3.6.egg-info/installed-files.txt')
>>> ntpath.join(base_path, path)
'T:\\pytest-0\\test_installed_files_recorded_0\\workspace\\venv\\Lib\\site-packages\\FSPkg-0.1.dev0-py3.6.egg-info\\installed-files.txt' So the result of @bochecha: can you change the test code to: installed_files_path = result.files_created[installed_files_path].full
installed_files_lines = [
p for p in Path(installed_files_path).read_text().split('\n') if p
] ? |
@benoit-pierre Good catch. I'm actually more surprised that Your suggested fix looks correct to me. |
Installed files are recorded by Pip in the order the underlying tool (Distutils, Setuptools, ...) recorded them. Unfortunately, at least Setuptools doesn't record them in a deterministic order in the case of a directory being installed, as it uses os.walk to find the list of files. We could fix all those underlying tools to record their files in a deterministic order in all situations. But fixing it once here in Pip for all tools is certainly simpler and more future-proof. This makes the installation more reproducible, and therefore more verifiable.
Latest push should fix the tests on Windows, as suggested by @benoit-pierre. Thanks! |
Thanks for looking into this @benoit-pierre! ^>^ |
@bochecha thanks for the PR :) |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Installed files are recorded by Pip in the order the underlying tool
(Distutils, Setuptools, ...) recorded them.
Unfortunately, at least Setuptools doesn't record them in a
deterministic order in the case of a directory being installed, as it
uses os.walk to find the list of files.
We could fix all those underlying tools to record their files in a
deterministic order in all situations. But fixing it once here in Pip
for all tools is certainly simpler and more future-proof.
This makes the installation more reproducible, and therefore more
verifiable.
Note: I did not add a NEWS file fragment because I'm not sure this deserves being mentioned. If you think it does, I'll be happy to add one. 😃