Use WebAssembly bulk memory opcodes #263
Conversation
Ready for a first pass of review. I haven't made microbenchmarks to gather statistics yet, but IIUIC the only scenario where a wasm loop can be faster than the native implementation is frequent calls operating on very small chunks; some engines may have per-call overhead for native methods, who knows.
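For context (this is not from the PR), a microbenchmark for that scenario could look roughly like the C sketch below; the iteration count, buffer size, and the volatile-length trick to defeat fixed-size inlining are all illustrative choices:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void) {
  static char src[64], dst[64];
  /* volatile so the compiler cannot see a constant length and inline
     the copy; we want to measure the real memcpy call path. */
  volatile size_t len = 8;
  clock_t start = clock();
  for (long i = 0; i < 50000000L; i++)
    memcpy(dst, src, len);
  double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
  printf("50M 8-byte copies: %.3f s\n", secs);
  return 0;
}
```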
I also eyeballed the emitted code for `wmemcpy`:

```wat
(module
  (type (;0;) (func (param i32 i32 i32) (result i32)))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
  (func $wmemcpy (type 0) (param i32 i32 i32) (result i32)
    (local i32)
    local.get 3))
```

And LLVM 13 emits:

```wat
(module
  (type (;0;) (func (param i32 i32 i32) (result i32)))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (func $wmemcpy (type 0) (param i32 i32 i32) (result i32)
    loop (result i32)  ;; label = @1
      br 0 (;@1;)
    end))
```

So the builtin intrinsics don't work for the wide-char versions of the functions. Might be an upstream bug.
Might be worth measuring the actual performance in shipping runtimes, as @kripken suggests in the bug... otherwise LGTM.
Benchmarks would be nice to do, though I'm inclined to merge this either way. LLVM optimizes small fixed-size memcpy/etc. into inline loads and stores already, and beyond that, this is one of the main use cases that bulk-memory was added for, so in theory it shouldn't be slow anywhere. If it does turn out to be slow somewhere, and it isn't just a missing optimization in a particular engine, we can of course revisit this.
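To illustrate the first point (example mine, not from the thread): with a length known at compile time, LLVM expands the call inline, so those cases never reach libc's memcpy at all:

```c
#include <string.h>

/* At -O1 and above, LLVM lowers this fixed-size memcpy to a single
   i64 load/store pair instead of a call into libc. */
void copy8(void *dst, const void *src) {
  memcpy(dst, src, 8);
}
```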
Just as a warning, the benchmarks I saw were quite slow, so you might be making wasi-libc 2x slower or so on commonly-called functions. There was also debate back then as to whether this was an intended use of bulk memory or not (i.e. should the VM emit fast code for both small and large operations, both aligned and unaligned, etc.). All that was a while ago though, so hopefully it's not a problem any more!
Link to some recent discussion and data on this (with no clear conclusions): |
I've also now done some benchmarking and instrumentation, and these small unaligned memcpys are more frequent than I had guessed they'd be. For example, they come up when doing formatted I/O, to copy all the individual strings into the I/O buffer. @TerrorJack Would you be interested in extending this PR to have fast paths for small lengths? |
@sunfishcode Hi, of course, but I'd first like to know how the small-length threshold should be chosen.
@TerrorJack Going by this data, I think the optimal number will be somewhere between 8 and 100. Perhaps we could try starting with 32? That's long enough to easily cover most formatted-I/O use cases, and perhaps short enough to keep the code simple.
@sunfishcode Done. |
Thanks! We can experiment with the threshold and see how it works out in practice. |
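For concreteness, here is a sketch of the shape being discussed: a byte loop below the threshold, and the bulk-memory path above it. The threshold value and the use of `__builtin_memcpy` (which lowers to a `memory.copy` instruction when bulk memory is enabled) are illustrative, not necessarily the PR's exact code:

```c
#include <stddef.h>

#define SMALL_COPY_THRESHOLD 32 /* starting point suggested above; to be tuned */

void *memcpy(void *restrict dst, const void *restrict src, size_t n) {
  /* Fast path: for small copies, a plain byte loop avoids whatever
     per-call overhead memory.copy has in a given engine. */
  if (n < SMALL_COPY_THRESHOLD) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
      *d++ = *s++;
    return dst;
  }
  /* Large copies: with bulk memory enabled, this lowers to a single
     memory.copy instruction rather than a libcall (so no recursion). */
  return __builtin_memcpy(dst, src, n);
}
```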
Btw, it might be good to document this somewhere (not sure where though?). I saw someone run into a problem because they used wasi-libc and the code couldn't run in a VM that didn't support bulk memory. The error message "invalid section 12" (section 12 is the DataCount section that bulk memory introduces) doesn't really help point people toward rebuilding wasi-libc, unfortunately...
If you have any ideas about where we could document this such that a user would find it when they need it, I'm open to doing so.
Closes #262.