-
-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode incorrectly escaped on PyPy 7.3.1 #176
Comments
Going to need some help on this one, interactions of C and PyPy are beyond me. @mattip is there a PyPy person who might be able to spot something? |
I thought the C-extension is not used on PyPy? I will take a look. |
Wheels can now be built against PyPy, I actually tested it a while ago and only just pushed them since I was doing the other new builds. I guess I'll have to delete the 1.1.1 wheels, I can't replace them even if we fix the issue, I'd have to make a 1.1.2 release. I probably won't do that, I'll just leave PyPy wheels disabled for 2.0 until we figure it out. |
That said, @ShadowJonathan can you test with a build from master? 2.0 will remove Python 2 support, which removed about half the C code, so it should be much easier to reason about. Just want to make sure it's not already fixed somehow. |
I recommend just disabling building extensions on PyPy, instead of disabling wheels, as installing wheels is much faster than building from source (especially if that source tries to build extensions)
Currently trying that, but keep in mind that
@mattip there's a |
Inserting |
weird, when I do |
if os.environ.get("CIBUILDWHEEL", "0") == "1":
run_setup(True) which gives cibuildwheel a carta blanca to build any kind of wheel it wants, so the released wheels contain |
You have to set |
@davidism i just did that, while installing from master, but the issue persists. I think disabling building extensions on pypy at all would help a lot. |
Yeah, I'll just adjust it for now so the extension isn't built directly or with cibuildwheel on PyPy. Unfortunately I'll just have to delete the 1.1.1 wheels and people will still have to install from the sdist in that case, but that was already the case until yesterday anyway. |
Of course, the wheel building the |
Interesting, if you think PyPy wouldn't benefit from the C extension anyway then that's another reason to disable it for now. I'm also interested in, but don't know much about, the limited C API and HPy, so I'm happy to revisit it later if someone figures those out. |
HPy still hasn't released an alpha version, so it might be a while, but it is good to know there is interest. Thanks for trying to support PyPy, ping me if you need more help. I am trying to reproduce the failure from the top of this issue, will get back when I have a fix. |
OK, I deleted the 1.1.1 pp36 and pp37 wheels, PyPy users will have to install from the sdist until 2.0 (which was the case anyway). I'll disable building the extension during cibuildwheel later, leaving this issue open for that. I don't think I'll be able to mark just the PyPy wheels "pure Python", so future versions will still have platform tags even though they don't have the C extension and are equivalent. |
It turns out the synapse issue mentioned above was very likely due to using an older version of pypy3.6 (released in April 2020), the 7.3.3 version released in November 2020 and available from the downloads, conda-forge, github actions, cibuildwheel, and multibuild correctly parses the input in the issue. Could someone change the title to add "... on PyPy 7.3.1" ? |
As far as I know I just let cibuildwheel setup PyPy, and you say cibuildwheel uses the correct version, so I'm not sure what's going on. cc @joerick again :-) |
Although that might be better as an issue elsewhere, seems like for MarkupSafe the answer is to omit the C extension regardless since you said it's not actually faster on PyPy. |
Sorry, I wasn't clear. The issue here is that the reporter tried to build and test synapse locally with an older version of PyPy. |
Ah, I see, but now I'm not clear if there was ever an issue with MarkupSafe. Should I still avoid building the C extension for PyPy wheels? |
Both types of wheels should work properly for current versions of PyPy. As for the c-extension: I don't know if there are real-world benchmarks for MarkupSafe. That would be the definitive test whether (additional overhead of calling the C-API + savings by using C) beats the ability of the PyPy JIT to speed up the pure-python code. My intuition, based on the many templating benchmarks at speed.pypy.org, would say that the pure-python code should be faster, but intuition and benchmarks often clash. |
I ran a benchmark of native Python vs compiled speedups on PyPy. Seems we should stick with the native version for now, speedups were about twice as slow. Note that these numbers are for ridiculously large strings, most use cases will probably be reasonably fast no matter what Python implementation or speedups are used. Here's the results on PyPy 3.7 7.3.3:
PyPy was still much slower than CPython. Maybe this just isn't a task PyPy is suited to speeding up, although native escape is pretty much just 5
I used the following as a benchmark. Generated 10,000 lines of random UTF-8 characters, each between 1000 to 10,000 characters, wrote to a file to keep the same input across runs. import random
printable = tuple(c for c in (chr(i) for i in range(32, 0x110000)) if c.isprintable() and not c.isspace())
with open("lines.txt", "w", encoding="utf8") as f:
for _ in range(10000):
f.write("".join(random.choices(printable, k=random.randint(1000, 10000))))
f.write("\n") Used pyperf to call import pyperf
from markupsafe._native import escape as native_escape
from markupsafe._speedups import escape as speedups_escape
with open("lines.txt", encoding="utf8") as f:
lines = f.readlines()
runner = pyperf.Runner()
runner.timeit(
name="MarkupSafe native",
globals={"escape": native_escape, "lines": lines},
stmt="for line in lines: escape(line)",
)
runner.timeit(
name="MarkupSafe speedups",
globals={"escape": speedups_escape, "lines": lines},
stmt="for line in lines: escape(line)",
) |
@mattip just want to make sure you don't spot something I missed in the benchmark before I leave this closed. |
Thanks, I converted that to a self-contained gist and we are trying to reason about how to speed up the chain-of-replace on PyPy. It seems the JIT cannot do anything with the code, and so you see the pypy performance as if it has no JIT which is about 2x from CPython. The c code does something very different: it does all the replacing at once instead of chaining across the string 4 times. |
Yeah, I was very surprised at how slow the C code was though, I'm assuming that's the overhead of calling the C API? Anyway, thanks for continuing to look into this. If you find anything, maybe starting a new issue would be better, this one sort of got off topic. :-) |
It's the overhead of the C API, there's a lot that goes into pypy emulating CPython's C API, which slows down the language a lot |
On PyPy (specifically, the wheel with
_speedups.pypy36-pp73-x86_64-linux-gnu.so
), when usingmarkupsafe.escape()
, this unicode below is incorrectly escaped;Input:
'https://x?<ab c>&q"+%3D%2B"="fö%26=o"'
_native
:'https://x?<ab c>&q"+%3D%2B"="fö%26=o"'
_speedups
(on pypy):'https://x?<ab c>&q"+%3D%2B"="fö%26=o'
This was caught while testing PyPy support for synapse, in matrix-org/synapse#9123, in which one of the tests hooked behind here.
I am no expert in C, but I have a feeling one of the Unicode APIs used in
_speedups.c
is either being used incorrectly, or has an implementation mismatch compared to cPythonThe text was updated successfully, but these errors were encountered: