[prerelease] Performance regression? #9478
There is. I can rebuild all the succeeded packages with pytest 6 and then we can parse the time information for pytest from the logs. We also have the overall build time, but that is unfortunately influenced by unstable external factors. |
If you could do that, that'd be amazing! Would be great to see how big the impact is, and also perhaps find a project to reproduce the issue with. Let me know if there is something I can help with. |
For pytest 7:
...running... |
In the meantime, the pytest 6 builds run in https://copr.fedorainfracloud.org/coprs/churchyard/pytest-6/builds/ If the times will indeed be off, I can resubmit the pytest-7 builds once more to eliminate differences from e.g. upgraded hardware or whatever; the pytest-7 builds are quite old now. |
pytest 7
The entire log: pytest-7.txt. In case you want to parse it, note that some packages have no timing info (pytest was not actually invoked during the build, or the output is unusual) and some have multiple entries (pytest was invoked multiple times during one build). |
The Fedora tests are with Python 3.10.1. |
Great, thank you! Parsed with:

```python
import pathlib
import re

from rich import print  # optional

txt = pathlib.Path("pytest-7.txt").read_text()
data = [e.strip().splitlines() for e in txt.strip().split("\n\n")]

total_times = {}
for name, *rest in data:
    if not rest:
        continue
    # print(name)
    times = []
    for line in rest:
        assert line.startswith("=") and line.endswith("=")
        m = re.match(r".* in ([\d.]*)s ", line)
        times.append(float(m.group(1)))
    total_times[name] = sum(times)

print(total_times)
```

so we should be able to compare things once you have the same logs for pytest 6 ready. Thanks again! FWIW I'd hope that the Python version doesn't make a difference, given that the reporter had the slowdown with two different pytest versions but the same Python version (I assume). |
I've compared the two profile files - the information in them seems to be a little tricky, but here is what I learned:
Unfortunately I can't seem to get the callees of that from the profile... but here is one offender I found by clicking around. pytest 7: [profiler screenshot]. Similarly, during collection, with pytest 6: [screenshot]; pytest 7: [screenshot]. A different look with another tool; pytest 7: [screenshot]. Take it with a grain of salt, as interpreting Python profiles seems to differ a bit depending on the tools, but that gives us a rough idea where to look, I guess. @bluetech do you have any idea how this could happen? I'm also wondering if there is simply something different about their environment, or they perhaps have a virtualenv or somesuch which isn't in |
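For anyone who wants to dig into those dumps themselves, a minimal stdlib sketch (assuming the files are standard cProfile output; the filenames here are made up):

```python
# Minimal sketch for inspecting the .prof dumps with the stdlib.
# Assumes standard cProfile output; "pytest6.prof"/"pytest7.prof" are hypothetical names.
import pstats

for name in ("pytest6.prof", "pytest7.prof"):
    print(f"--- {name} ---")
    stats = pstats.Stats(name)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
    # stats.print_callees("suspect_function")       # drill into one suspect
```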
the way the code repeatedly checks for dirs/files and subdirs seems confusingly expensive; it may be sensible to put the conftests/anchors into the collection tree (and link the conftests to packages/directories) |
Note that the pytest 6 file may not contain the exact same set of packages, so a naïve total sum isn't directly comparable: |
Some more detailed analysis based on those two logs:

```python
import pathlib
import dataclasses
import operator
import re

import rich
import rich.table


@dataclasses.dataclass
class Entry:
    name: str
    time6: float
    time7: float
    dabs: float = dataclasses.field(init=False)
    drel: float = dataclasses.field(init=False)

    def __post_init__(self):
        self.dabs = self.time7 - self.time6
        self.drel = self.dabs / self.time6 * 100


def parse_log(path: pathlib.Path):
    txt = path.read_text()
    data = [e.strip().splitlines() for e in txt.strip().split("\n\n")]
    total_times = {}
    for name, *rest in data:
        if not rest:
            continue
        # print(name)
        times = []
        for line in rest:
            assert line.startswith("=") and line.endswith("=")
            m = re.match(r".* in ([\d.]*)s ", line)
            times.append(float(m.group(1)))
        total_times[name] = sum(times)
    return total_times


def gen_entries(times6, times7):
    for name in times6.keys() | times7.keys():
        if times6.get(name) == 0:
            continue
        try:
            yield Entry(name=name, time6=times6[name], time7=times7[name])
        except KeyError:
            pass


def avg(elems):
    lst = list(elems)
    return sum(lst) / len(lst)


def main():
    times6 = parse_log(pathlib.Path("pytest-6.txt"))
    times7 = parse_log(pathlib.Path("pytest-7.txt"))
    entries = sorted(gen_entries(times6, times7), key=operator.attrgetter("drel"))

    table = rich.table.Table()
    delta = "\N{Greek Capital Letter Delta}"
    table.add_column("Name")
    table.add_column("pytest 6", justify="right")
    table.add_column("pytest 7", justify="right")
    table.add_column(f"{delta}abs", justify="right")
    table.add_column(f"{delta}rel", justify="right")

    for e in entries:
        table.add_row(e.name, f"{e.time6:.02f}", f"{e.time7:.02f}", f"{e.dabs:.02f}", f"{e.drel:.02f}%")

    sum6 = sum(e.time6 for e in entries)
    sum7 = sum(e.time7 for e in entries)
    sumabs = sum(e.dabs for e in entries)
    sumrel = sumabs / sum6 * 100
    table.add_row("sum", f"{sum6:.02f}", f"{sum7:.02f}", f"{sumabs:.02f}", f"{sumrel:.02f}%", style="blue")

    avg6 = avg(e.time6 for e in entries)
    avg7 = avg(e.time7 for e in entries)
    avgabs = avg(e.dabs for e in entries)
    avgrel = avg(e.drel for e in entries)
    table.add_row("avg", f"{avg6:.02f}", f"{avg7:.02f}", f"{avgabs:.02f}", f"{avgrel:.02f}%", style="blue")

    rich.print(table)


if __name__ == "__main__":
    main()
```

results in:
So it does look like the build in total (only considering packages in both sets) took some 2h 45min longer, or around 30%... Of course there are a lot of factors involved, but it does seem like things got slower overall, perhaps enough that we should take a closer look? Full output:
|
Should I restart the 7 builds to make it more up to date? |
If you don't mind the additional load this puts on Fedora infra, that would be great - I guess it would certainly make the values more accurate. |
Updated pytest 7 times: pytest-7.txt |
Well, this is looking very different indeed:
(note that "100%" still means "double the original time" though...) Anyone volunteering to look into some of those projects in more detail to figure out if the same tests did run, and if so, if there really is some kind of slowdown in pytest with them? Full output:
|
I'll definitely look at some of them. I'm going to bet that the slowness comes from the extra stats due to the pathlib migration, possibly with the addition of a slow filesystem setup on the Fedora/user's system (maybe running NFS or such?). But I'm not making assumptions :) |
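As a rough way to sanity-check the stat-cost part of that hypothesis (a sketch only, not what anyone here actually measured; numbers vary wildly by machine and filesystem):

```python
# Micro-benchmark sketch: each pathlib is_dir()/exists() call issues a
# stat() syscall, so repeated checks add up on slow filesystems.
import pathlib
import timeit

p = pathlib.Path(".")
n = 100_000
for call in (p.is_dir, p.exists):
    per_call_us = timeit.timeit(call, number=n) / n * 1e6
    print(f"{call.__name__}: {per_call_us:.2f} µs/call")
```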
No NFS, the filesystem is local. (I'll double check.) |
Great, thanks @bluetech! I already asked the reporter and they said:
So nothing too special, at least from what it sounds like there... |
I can reproduce the slowdown pretty easily using [embedded code permalink: Line 539 in 71baf24]
I've attached a profile generated using these tools. This is using ~stock Python 3.8.10 on Ubuntu 20.04. |
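For anyone reproducing this, one possible way to generate such a profile (an assumption, not necessarily the exact tools linked above) is the stdlib profiler:

```python
# Sketch: produce a cProfile dump of a pytest run programmatically.
# "nodesemver" matches the reproducer used in the bisect script below.
import cProfile

import pytest

cProfile.run("pytest.main(['nodesemver'])", "pytest7.prof")
```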
this patch "fixes" the performance issue, though I haven't studied the code close enough to see whether the optimization is correct: $ diff -u old.py ../venv-pytest7/lib/python3.8/site-packages/_pytest/config/__init__.py
--- old.py 2022-01-06 19:47:05.974081512 -0500
+++ ../venv-pytest7/lib/python3.8/site-packages/_pytest/config/__init__.py 2022-01-06 19:45:09.238082625 -0500
@@ -524,6 +525,7 @@
if x.is_dir():
self._getconftestmodules(x, importmode, rootpath)
+ @lru_cache(maxsize=1)
def _getconftestmodules(
self, path: Path, importmode: Union[str, ImportMode], rootpath: Path
) -> List[types.ModuleType]: |
heh, bisecting gives a commit that definitely makes sense, 614f539:

```
614f5394b50ac88b7aa4e97b43acb08aca383d95 is the first bad commit
commit 614f5394b50ac88b7aa4e97b43acb08aca383d95
Author: Ran Benita <[email protected]>
Date:   Fri Oct 1 14:28:16 2021 +0300

    Avoid `@lru_cache` on methods

    The problem with `@lru_cache` on methods is that it also captures `self`
    and leaks it until the entry is evicted (if ever).

 src/_pytest/assertion/rewrite.py | 7 ++-----
 src/_pytest/config/__init__.py   | 1 -
 2 files changed, 2 insertions(+), 6 deletions(-)
bisect run success
```

I bisected using this script:

```bash
#!/usr/bin/env bash
set -euxo pipefail
pip uninstall -yqq pytest >& /dev/null
pip install -qq . >& /dev/null
cd ../python-node-semver
git clean -fxfdq
timeout 2.5 pytest nodesemver || exit 1
```

and this technique
|
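To illustrate the leak that commit message describes, here is a tiny self-contained sketch (the `Widget` class is hypothetical, not pytest code):

```python
# @lru_cache on a method keys the (class-level) cache on `self`, so the
# cache keeps the instance alive until the entry is evicted, if ever.
import functools
import weakref

class Widget:
    @functools.lru_cache(maxsize=None)
    def compute(self, x):
        return x * 2

w = Widget()
w.compute(1)
ref = weakref.ref(w)
del w
print(ref())  # not None: the cache still holds `w` through its key
```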
the story doesn't quite end there though -- the second commit in that patch is supposed to patch up that performance hit, but makes a tiny edge-case mistake in the caching code: it ignores the case when the cached value is falsey. Fixing that gets us most of the way back to the pytest 6.x performance, but not quite, because of the beginning of

```diff
diff --git a/src/_pytest/config/__init__.py b/src/_pytest/config/__init__.py
index dee3d4974..174d80a01 100644
--- a/src/_pytest/config/__init__.py
+++ b/src/_pytest/config/__init__.py
@@ -538,7 +538,7 @@ class PytestPluginManager(PluginManager):
         # Optimization: avoid repeated searches in the same directory.
         # Assumes always called with same importmode and rootpath.
         existing_clist = self._dirpath2confmods.get(directory)
-        if existing_clist:
+        if existing_clist is not None:
             return existing_clist
 
         # XXX these days we may rather want to use config.rootpath
```
|
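For illustration, the pitfall in miniature (a sketch; `expensive_scan` is a hypothetical stand-in for the conftest directory scan):

```python
# `if cached:` treats a cached empty list as a cache miss, so directories
# with no conftest files get re-scanned on every single lookup.
cache = {}

def expensive_scan(directory):
    print(f"scanning {directory}")
    return []  # e.g. a directory with no conftest.py

def get_modules(directory):
    cached = cache.get(directory)
    if cached is not None:  # the buggy version checked `if cached:`
        return cached
    result = expensive_scan(directory)
    cache[directory] = result
    return result

get_modules("pkg")  # scans
get_modules("pkg")  # cache hit; the buggy check would scan again
```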
an idea to get back to the original caching but without a memory leak would be to assign a per-instance lru_cache:

```diff
diff --git a/src/_pytest/config/__init__.py b/src/_pytest/config/__init__.py
index dee3d4974..15a9c0ef1 100644
--- a/src/_pytest/config/__init__.py
+++ b/src/_pytest/config/__init__.py
@@ -354,6 +354,7 @@ class PytestPluginManager(PluginManager):
         self._confcutdir: Optional[Path] = None
         self._noconftest = False
         self._duplicatepaths: Set[Path] = set()
+        self._getconftestmodules = lru_cache(maxsize=128)(self._getconftestmodules_uncached)
 
         # plugins that were explicitly skipped with pytest.skip
         # list of (module name, skip reason)
@@ -524,7 +525,7 @@ class PytestPluginManager(PluginManager):
         if x.is_dir():
             self._getconftestmodules(x, importmode, rootpath)
 
-    def _getconftestmodules(
+    def _getconftestmodules_uncached(
         self, path: Path, importmode: Union[str, ImportMode], rootpath: Path
     ) -> List[types.ModuleType]:
         if self._noconftest:
```

I used the same technique in my text editor though admittedly for a different reason |
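Standalone, that trick looks roughly like the sketch below (`ConftestLoader` is hypothetical). Note that it creates an instance → cache → bound method → instance reference loop, which is what the next comment asks about; unlike the class-level cache, that is a true cycle the garbage collector can reclaim:

```python
import functools

class ConftestLoader:
    def __init__(self):
        # The cache lives on the instance, so its lifetime is tied to the
        # instance rather than being pinned globally on the class.
        self._get = functools.lru_cache(maxsize=128)(self._get_uncached)

    def _get_uncached(self, path):
        print(f"scanning {path}")  # stands in for the expensive conftest walk
        return []

loader = ConftestLoader()
loader._get("a")  # scans
loader._get("a")  # cache hit, no scan
```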
This is interesting, but doesn't that still create a circular reference, because Perhaps it is simple enough to implement an infinite cache using a dict? Caching all calls seems feasible in this case. |
Great stuff @asottile, especially the diagram. I am able to reproduce your findings. First, I am going to submit your
Regarding adding back the lru_cache to In the meantime, IMO the |
I agree that should be sufficient. I feel like that gets things into the same ballpark again, so that we can have further performance fixes in 7.0.1 or 7.1.0. |
This made the cache not work as intended, causing a major slowdown. See pytest-dev#9478 for discussion and context. Authored-by: Anthony Sottile <[email protected]>
Oh, also, how do folks feel about a quick 7.0.0rc2 now? We still need to take care of #9002 and #9471 (plus potentially "unknown unknowns" from #9415?), but that would mean it's easy for people (and probably Fedora?) to test the fixes which we have so far (and it makes it easier to ask the original reporter to test this fix too). |
I might be able to test that in Fedora with the git HEAD as well, even without an rc. |
the memory issue isn't due to a cycle but due to |
FWIW the reporter said:
(edited to add the correction from the follow-up tweet) so it looks like, all in all, 7.0.0 was a great performance improvement - thanks to everyone involved! |
According to nikolaizujev on Twitter:
Unfortunately no reproducer, but I got .prof dumps from them: [Dropbox link]. Opening this to remind myself to have a look at those before the final release - but ideally we'd have a proper reproducer so we can e.g. bisect this. Does anyone have any guess what could be the culprit? I only see a very small slowdown for my own project.
@hroncok Is there perhaps a way to find out which packages in Fedora took significantly longer to run tests with pytest 7 compared to 6? This would mainly be for runs which still pass, because if there are failures, all bets regarding the runtime are off anyways.