Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record installed files in a deterministic order #4667

Merged
merged 1 commit into from
Oct 26, 2017
Merged

Record installed files in a deterministic order #4667

merged 1 commit into from
Oct 26, 2017

Conversation

bochecha
Copy link
Contributor

Installed files are recorded by Pip in the order the underlying tool
(Distutils, Setuptools, ...) recorded them.

Unfortunately, at least Setuptools doesn't record them in a
deterministic order in the case of a directory being installed, as it
uses os.walk to find the list of files.

We could fix all those underlying tools to record their files in a
deterministic order in all situations. But fixing it once here in Pip
for all tools is certainly simpler and more future-proof.

This makes the installation more reproducible, and therefore more
verifiable.


Note: I did not add a NEWS file fragment because I'm not sure this deserves being mentioned. If you think it does, I'll be happy to add one. 😃

@pradyunsg
Copy link
Member

I'm not sure this deserves being mentioned.

While I don't know if this functionality is something we want to add to pip (I'm not the right person to comment on that), it does seem to me that adding a news fragment would be needed.

@pradyunsg pradyunsg added type: enhancement Improvements to functionality state: needs discussion This needs some more discussion labels Aug 12, 2017
@bochecha
Copy link
Contributor Author

@pradyunsg Thanks, I've added a news fragment then.

I picked the bugfix type because unreproducible builds/installs feel like a bug to me. I'm happy to change it to whatever people here think would be more appropriate. (maybe feature or trivial?)

@pradyunsg
Copy link
Member

maybe feature

I'd say so but I won't push on that -- I'd wait for a core committer to comment on this. :)

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Sep 1, 2017
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Sep 2, 2017
@xavfernandez
Copy link
Member

This would also need a test to make sure this does not get dropped at the next refactor.

@bochecha
Copy link
Contributor Author

This would also need a test to make sure this does not get dropped at the next refactor.

I'd be happy to add a test for this, but I'm not sure where would be the most appropriate. tests/functional/test_install.py maybe?

@pradyunsg
Copy link
Member

Yep! It'd be nice if you add a doc-string to it. :)

@bochecha
Copy link
Contributor Author

Test added in the latest push to this branch. Let me know if there's anything else I can do to get this merged. 🙂

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. :)

news/4667.bugfix Outdated
@@ -0,0 +1,2 @@
Made the order of recorded installed files deterministic, to make installations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd phrase it as:

pip now records installed files in a deterministic manner improving reproducibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the latest push to this branch.

@pradyunsg
Copy link
Member

I'll wait for another review from @pypa/pip-committers before merging. :)

@@ -802,6 +802,7 @@ def prepend_root(path):
new_lines.append(
os.path.relpath(prepend_root(filename), egg_info_dir)
)
new_lines = sorted(new_lines)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to make this an actual list (new_lines.sort()) rather than a generator? I guess the generator works at the moment, but it may cause confusion for future maintenance that new_lines isn't an actual list after this point.

Copy link
Member

@xavfernandez xavfernandez Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorted returns a list https://docs.python.org/3.6/library/functions.html#sorted but indeed calling new_lines.sort() should be enough since it is already a list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch. Sorry, I was getting confused between sorted and reversed... Thanks for the correction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, doesn't make a difference to me... Latest version of this branch uses new_lines.sort().

Copy link
Member

@xavfernandez xavfernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming the different installed_files variables in the test would improve its readability :o

result = script.pip('install', to_install, expect_error=False)
fspkg_folder = script.site_packages / 'fspkg'
egg_info = 'FSPkg-0.1.dev0-py%s.egg-info' % pyversion
installed_files = script.site_packages / egg_info / 'installed-files.txt'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: installed_files_path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

assert installed_files in result.files_created, str(result)

installed_files = result.files_created[installed_files].full
installed_files = [p for p in installed_files.read_text().split('\n') if p]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: installed_files_lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pradyunsg pradyunsg closed this Oct 24, 2017
@pradyunsg pradyunsg reopened this Oct 24, 2017
@pradyunsg
Copy link
Member

Retriggered CI and status check.

@pradyunsg
Copy link
Member

Appveyor does not approve.

@pfmoore
Copy link
Member

pfmoore commented Oct 24, 2017

That looks like a genuine issue. How can Travis be OK with it, but Appveyor not? It doesn't seem to be a Windows-specific issue. Is that test run on both Windows and Unix?

Weird...

Copy link
Member

@xavfernandez xavfernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like installed_files_path is a pathlib.Path object in Unix but a simple str path in Windows ?

@pfmoore
Copy link
Member

pfmoore commented Oct 24, 2017

I'm guessing the problem comes from tests/lib/__init__.py in TestPipResult:

    if sys.platform == 'win32':

        @property
        def stdout(self):
            return self._impl.stdout.replace('\r\n', '\n')

        @property
        def stderr(self):
            return self._impl.stderr.replace('\r\n', '\n')

If _impl.std{out,err} is an object with a __str__ method, this will give a string on Windows, while leaving the rich object exposed on Unix :-( As far as I can tell, _impl is a scripttest.TestFileEnvironment object, so that's likely what's gone on here. I haven't debugged further into scripttest.

@pfmoore
Copy link
Member

pfmoore commented Oct 24, 2017

In case it's not clear, this is a bug in the pip test harness on Windows, rather than a bug in this PR. Either way, it'll need fixing, though.

@bochecha
Copy link
Contributor Author

I don't have access to a Windows machine, which will make it a bit hard to fix this blindly.

So I guess I'll have to wait for someone else to fix it, before we can merge this?

@pradyunsg
Copy link
Member

So I guess I'll have to wait for someone else to fix it, before we can merge this?

Yeah... I'll look into this as well over the weekend.

@benoit-pierre
Copy link
Member

This is the problem:

>>> import posixpath
>>> import ntpath
>>> from tests.lib.path import Path
>>> base_path = Path(r'T:\pytest-0\test_installed_files_recorded_0\workspace')
>>> path = r'venv\Lib\site-packages\FSPkg-0.1.dev0-py3.6.egg-info\installed-files.txt'
>>> posixpath.join(Path(base_path.replace('\\', '/')), path.replace('\\', '/'))
Path('T:/pytest-0/test_installed_files_recorded_0/workspace/venv/Lib/site-packages/FSPkg-0.1.dev0-py3.6.egg-info/installed-files.txt')
>>> ntpath.join(base_path, path)
'T:\\pytest-0\\test_installed_files_recorded_0\\workspace\\venv\\Lib\\site-packages\\FSPkg-0.1.dev0-py3.6.egg-info\\installed-files.txt'

So the result of os.path.join(Path(...), str(...)) is not a Path object on Windows...

@bochecha: can you change the test code to:

    installed_files_path = result.files_created[installed_files_path].full
    installed_files_lines = [
        p for p in Path(installed_files_path).read_text().split('\n') if p
    ] 

?

@pfmoore
Copy link
Member

pfmoore commented Oct 26, 2017

@benoit-pierre Good catch. I'm actually more surprised that os.path.join does return a Path object on POSIX. I'd assume that's an accident - AIUI, Path objects are subclasses of str so I guess if posixpath.join uses the passed in path unchanged, but ntpath copies it (likely to do something like backslash replacement) this is the result.

Your suggested fix looks correct to me.

Installed files are recorded by Pip in the order the underlying tool
(Distutils, Setuptools, ...) recorded them.

Unfortunately, at least Setuptools doesn't record them in a
deterministic order in the case of a directory being installed, as it
uses os.walk to find the list of files.

We could fix all those underlying tools to record their files in a
deterministic order in all situations. But fixing it once here in Pip
for all tools is certainly simpler and more future-proof.

This makes the installation more reproducible, and therefore more
verifiable.
@bochecha
Copy link
Contributor Author

Latest push should fix the tests on Windows, as suggested by @benoit-pierre. Thanks!

@pradyunsg
Copy link
Member

Thanks for looking into this @benoit-pierre! ^>^

@xavfernandez xavfernandez merged commit 62875be into pypa:master Oct 26, 2017
@xavfernandez
Copy link
Member

@bochecha thanks for the PR :)

@lock
Copy link

lock bot commented Jun 2, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 2, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation state: needs discussion This needs some more discussion type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants