Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(ws): optimize fastwebsockets in release profile #24277

Merged
merged 4 commits into from
Jun 20, 2024

Conversation

lucab
Copy link
Contributor

@lucab lucab commented Jun 19, 2024

This sorts and re-aligns the bench and release profiles, notably enabling optimizations for fastwebsockets in the released CLI binary.
The goal is to unlock auto-vectorization for ws hotpaths, as noted in https://github.com/denoland/fastwebsockets/blob/0.7.0/src/mask.rs#L22 (contrarily to that comment, disassembling the amd64 deno 1.44 binary shows no SIMD instructions in ws-unmasking logic).

@littledivy
Copy link
Member

(contrarily to that comment, disassembling the amd64 deno 1.44 binary shows no SIMD instructions in ws-unmasking logic)

oh interesting 🤔 we should probably re-enable explicit SIMD then

Copy link
Member

@littledivy littledivy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@littledivy littledivy merged commit 2cfaee0 into denoland:main Jun 20, 2024
17 checks passed
@lucab
Copy link
Contributor Author

lucab commented Jun 20, 2024

we should probably re-enable explicit SIMD then

I think you don't have to.
On my local 1.79 rustc, auto-vectorization on amd64 seems to kick in at opt-level >= 2.
And after this PR, on the latest canary from CI, I can indeed see that the ws hotpath is now using SIMD instructions for unmasking:

00000000056ecc10 <_ZN14fastwebsockets5frame5Frame6unmask17h1730290b8ecf1d1cE>:
 56ecc10:       55                      push   %rbp
 56ecc11:       41 57                   push   %r15
[...]
 56ecd78:       49 ff c1                inc    %r9
 56ecd7b:       4d 21 cd                and    %r9,%r13
 56ecd7e:       4a 8d 0c ae             lea    (%rsi,%r13,4),%rcx
 56ecd82:       66 41 0f 6e c0          movd   %r8d,%xmm0
 56ecd87:       66 0f 70 c0 00          pshufd $0x0,%xmm0,%xmm0
 56ecd8c:       45 31 d2                xor    %r10d,%r10d
 56ecd8f:       90                      nop
 56ecd90:       f3 42 0f 6f 0c 96       movdqu (%rsi,%r10,4),%xmm1
 56ecd96:       f3 42 0f 6f 54 96 10    movdqu 0x10(%rsi,%r10,4),%xmm2
 56ecd9d:       66 0f ef c8             pxor   %xmm0,%xmm1
 56ecda1:       66 0f ef d0             pxor   %xmm0,%xmm2
 56ecda5:       f3 42 0f 7f 0c 96       movdqu %xmm1,(%rsi,%r10,4)
 56ecdab:       f3 42 0f 7f 54 96 10    movdqu %xmm2,0x10(%rsi,%r10,4)
 56ecdb2:       49 83 c2 08             add    $0x8,%r10
 56ecdb6:       4d 39 d5                cmp    %r10,%r13
 56ecdb9:       75 d5                   jne    56ecd90 <_ZN14fastwebsockets5frame5Frame6unmask17h1730290b8ecf1d1cE+0x180>
[...]

@lucab lucab deleted the ups/cargo-profiles branch June 20, 2024 07:07
sbmsr pushed a commit to sbmsr/deno-1 that referenced this pull request Jul 2, 2024
zebreus pushed a commit to zebreus/deno that referenced this pull request Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants