Implement some functions in AssemblyScript/WebAssembly #26
Conversation
You could speed this up a little by using the `-O3` flag instead of `--optimize` (which is actually `-O2`). But I guess the main reason the wasm version is slower is interop overhead, which can be significant if your wasm function is called quite frequently.
You could try this approach:
```ts
const threshold = 100; // adjust this

function utf8ToUtf16String(bytes: Uint8Array) {
  if (bytes.length > threshold) {
    return wasmUtf8ToUtf16(...);
  } else {
    return jsUtf8ToUtf16(...);
  }
}
```
@MaxGraey Thanks for your advice!
As shown in the benchmark script attached to the description, I am using a large target string like `"Hello, π\n".repeat(1000)`. Even in that situation with `-O3`, the AssemblyScript version is much slower than JS:
```
$ npm run asbuild && USE_WASM=true npx ts-node benchmark/string.ts

> @msgpack/msgpack@1.0.1 asbuild /Users/gfx/.ghq/github.com/msgpack/msgpack-javascript
> npm run asbuild:untouched && npm run asbuild:optimized

> @msgpack/msgpack@1.0.1 asbuild:untouched /Users/gfx/.ghq/github.com/msgpack/msgpack-javascript
> asc assembly/index.ts -b build/wasm/untouched.wasm -t build/wasm/untouched.wat --sourceMap --validate --debug

> @msgpack/msgpack@1.0.1 asbuild:optimized /Users/gfx/.ghq/github.com/msgpack/msgpack-javascript
> asc assembly/index.ts -b build/wasm/optimized.wasm -t build/wasm/optimized.wat --sourceMap --validate -O3

encode: 1666.293ms
decode: 4684.249ms

$ npx ts-node benchmark/string.ts
encode: 1622.107ms
decode: 1234.643ms
```
(Only `decode()` uses the AssemblyScript version.)

My environment: Node.js v12.2.0 on macOS 10.14.4 (Mojave)

Do you have any ideas?
Try decreasing the iteration count from 1000 to 10-20 and increasing the input string size (the `repeat` argument) from 1000 to 5000, and see whether it changes anything.
Tried:
```diff
diff --git a/benchmark/string.ts b/benchmark/string.ts
index ebf0314..b47613d 100644
--- a/benchmark/string.ts
+++ b/benchmark/string.ts
@@ -1,6 +1,6 @@
 import { encode, decode } from "../src";
 
-const data = "Hello, π\n".repeat(1000);
+const data = "Hello, π\n".repeat(10_000);
 
 // warm up
 const encoded = encode(data);
@@ -9,13 +9,13 @@ decode(encoded);
 
 // run
 console.time("encode");
-for (let i = 0; i < 10000; i++) {
+for (let i = 0; i < 1000; i++) {
   encode(data);
 }
 console.timeEnd("encode");
 
 console.time("decode");
-for (let i = 0; i < 10000; i++) {
+for (let i = 0; i < 1000; i++) {
   decode(encoded);
 }
 console.timeEnd("decode");
```
But it changed little:
```
$ USE_WASM=true npx ts-node benchmark/string.ts
encode: 1557.384ms
decode: 6345.717ms

$ npx ts-node benchmark/string.ts
encode: 1565.238ms
decode: 2784.192ms
```
Note: `repeat(100_000)` causes a `Maximum call stack size exceeded` error.
MaxGraey commented on May 15, 2019:
> Note: `repeat(100_000)` causes a `Maximum call stack size exceeded` error.
Try to avoid using `String.fromCharCode(...utf16array)`; it is slow and causes a stack overflow when the array size is >= 64k.
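For illustration (an assumed snippet, not taken from the PR), the failure mode is that spreading a large typed array passes every element as a separate call argument:

```ts
// Each element becomes one argument on the call stack, so large inputs
// overflow it at an engine-dependent limit (around 64k arguments in V8).
const big = new Uint16Array(100_000).fill(0x41);
const s = String.fromCharCode(...big); // RangeError: Maximum call stack size exceeded
```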
I think if you decrease 1000 -> 10, AS should be the same speed as JS or faster. As I mentioned before, you have JS <-> WASM interop overhead, which is a well-known performance issue for WebAssembly, especially on the V8 VM.
Review thread on `assembly/index.ts` (outdated):
MaxGraey commented on May 15, 2019:
Also it seems this should be:
```ts
let byte2: u16 = load<u8>(inputOffset++) & 0x3f;
let byte3: u16 = load<u8>(inputOffset++) & 0x3f;
```
Right: eea2f04
MaxGraey commented on May 15, 2019:
Another hint that could potentially improve speed: use `u32` instead of `u16` for every temporary variable.
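As a sketch applied to the snippet above (assumed, not the PR's final code): wasm has no 16-bit locals, so `u16` values live in i32 locals that the compiler must keep re-masking to 16 bits, while `u32` maps onto i32 directly.

```ts
// u32 temporaries avoid the extra wrap-to-16-bit masking instructions
let byte2: u32 = load<u8>(inputOffset++) & 0x3f;
let byte3: u32 = load<u8>(inputOffset++) & 0x3f;
```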
Hmm? As far as I know, `String.fromCharCode(...utf16array)` is the only way to create strings from an ArrayBuffer or an array of code points. I'd like to know of a faster way if one exists.
Right, but it's better to use `String.fromCharCode.apply` with a fixed chunk size, like in here
Interesting! I thought there was no difference between `String.fromCharCode(...arrayLike)` and `String.fromCharCode.apply(String, arrayLike)`, and that is true if the arrayLike is an actual `Array` and the JS engine supports ES2015, but `#apply` is much faster if the arrayLike is a typed array.
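A minimal sketch of the chunked technique (the chunk size and function name here are assumptions, not the PR's actual code):

```ts
// Build the string in fixed-size chunks so .apply() never pushes more
// than CHUNK_SIZE arguments onto the call stack at once.
const CHUNK_SIZE = 0x1000; // 4096, well below typical engine argument limits

function utf16UnitsToString(units: Uint16Array): string {
  const parts: string[] = [];
  for (let i = 0; i < units.length; i += CHUNK_SIZE) {
    const chunk = units.subarray(i, i + CHUNK_SIZE);
    // apply() accepts an array-like of code units, including typed arrays
    parts.push(String.fromCharCode.apply(String, chunk as unknown as number[]));
  }
  return parts.join("");
}
```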
Now the WASM version is faster than the JS version for a large string! dfe06c3
```
$ USE_WASM=true npx ts-node benchmark/string.ts
encode: 1547.967ms
decode: 1625.489ms

$ USE_WASM=false npx ts-node benchmark/string.ts
encode: 1526.350ms
decode: 2743.125ms
```
Now the WASM version is a little bit faster than the JS version.
Now wasm is faster by ~60%. It's a pretty good result. In comparison with JS, a 1.5-3x speedup is usually achieved.
Yep! Not "a little bit", but 1.5x faster. That sounds good enough to proceed with merging this PR.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master      #26      +/-   ##
==========================================
+ Coverage   95.15%   96.37%   +1.21%
==========================================
  Files          13       14       +1
  Lines         681      744      +63
  Branches      151      156       +5
==========================================
+ Hits          648      717      +69
+ Misses         13       12       -1
+ Partials       20       15       -5
```
Continue to review full report at Codecov.
because it increases the size of the bundle by +5 KiB (plus the base64-js dep)
I've implemented `utf8EncodeWasm()` in AssemblyScript, but its performance is almost the same as the pure JS version, while the decode-in-wasm function is about 1.5x faster than the JS version (Node.js v12.3.1 on macOS):
```
$ WASM=never npx ts-node benchmark/string.ts && echo && WASM=force npx ts-node benchmark/string.ts
WASM_AVAILABLE=false
encode / decode ascii data.length=40000 encoded.byteLength=40003
encode ascii: 542.870ms
decode ascii: 868.292ms
encode / decode emoji data.length=40000 encoded.byteLength=80005
encode emoji: 653.479ms
decode emoji: 892.014ms

WASM_AVAILABLE=true
encode / decode ascii data.length=40000 encoded.byteLength=40003
encode ascii: 528.937ms
decode ascii: 520.596ms
encode / decode emoji data.length=40000 encoded.byteLength=80005
encode emoji: 555.948ms
decode emoji: 595.192ms
```
I think the overhead of the preparation step that converts the string to an array of UTF-16 units (i.e. `setMemoryStr()` in `wasmFunctions.ts`) is too heavy, but I have no idea how to improve it.

Do you have any ideas about it?
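For reference, a minimal sketch of what such a preparation step looks like; the names follow this thread, but the body is an assumption, not the PR's actual code:

```ts
// Copy a JS string's UTF-16 code units into wasm linear memory so the
// wasm side can read them. This per-call copy is the overhead in question.
function setMemoryStr(
  wm: { memory: WebAssembly.Memory; malloc(size: number): number },
  str: string,
): number {
  const ptr = wm.malloc(str.length * 2); // 2 bytes per UTF-16 code unit
  const view = new Uint16Array(wm.memory.buffer, ptr, str.length);
  for (let i = 0; i < str.length; i++) {
    view[i] = str.charCodeAt(i);
  }
  return ptr; // the caller must free this after the wasm call
}
```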
I guess it's better to allocate this inside `utf8EncodeUint16Array` on the AS side. That reduces the overhead by one js <-> wasm call. But you would have to retrieve the pointer somehow anyway, and that requires another call like `getOutputPointer`. Hmm.
I tried to reduce a `malloc()` call there, but the performance score didn't get better.
```diff
diff --git a/assembly/utf8EncodeUint16Array.ts b/assembly/utf8EncodeUint16Array.ts
index 195991f..536311c 100644
--- a/assembly/utf8EncodeUint16Array.ts
+++ b/assembly/utf8EncodeUint16Array.ts
@@ -29,9 +29,15 @@ function storeStringHeader(outputPtr: usize, utf8ByteLength: usize): usize {
 // outputPtr: u8*
 // inputPtr: u16*
 // It adds MessagePack str head bytes to the output
-export function utf8EncodeUint16Array(outputPtr: usize, inputPtr: usize, inputLength: usize): usize {
+export function utf8EncodeUint16Array(inputPtr: usize, inputLength: usize): usize {
+  const maxOutputHeaderSize = sizeof<u8>() + sizeof<u32>();
+
+  // outputPtr: [u32 outputBufferSize][outputBuffer]
+  let outputPtr = memory.allocate(maxOutputHeaderSize + inputLength * 4);
+  let outputPtrStart = outputPtr + sizeof<u32>();
+
   let utf8ByteLength = utf8CountUint16Array(inputPtr, inputLength);
-  let strHeaderOffset = storeStringHeader(outputPtr, utf8ByteLength);
+  let strHeaderOffset = storeStringHeader(outputPtrStart, utf8ByteLength);
 
   const u16s = sizeof<u16>();
   let inputOffset = inputPtr;
@@ -76,5 +82,7 @@ export function utf8EncodeUint16Array(outputPtr: usize, inputPtr: usize, inputLe
     store<u8>(outputOffset++, (value & 0x3f) | 0x80);
   }
 
-  return outputOffset - outputPtr;
+  let outputByteLength = outputOffset - outputPtrStart;
+  storeUint32BE(outputPtr, outputByteLength);
+  return outputPtr;
 }
diff --git a/src/wasmFunctions.ts b/src/wasmFunctions.ts
index 5a3c44a..3a6ab4b 100644
--- a/src/wasmFunctions.ts
+++ b/src/wasmFunctions.ts
@@ -54,9 +54,11 @@ export function utf8EncodeWasm(str: string, output: Uint8Array): number {
   const maxOutputHeaderSize = 1 + 4; // headByte + u32
   const outputPtr: pointer = wm.malloc(maxOutputHeaderSize + strLength * 4);
   try {
-    const outputLength = wm.utf8EncodeUint16Array(outputPtr, inputU16BePtr, strLength);
-    output.set(new Uint8Array(wm.memory.buffer, outputPtr, outputLength));
-    return outputLength;
+    const outputPtr = wm.utf8EncodeUint16Array(inputU16BePtr, strLength);
+    // the first 4 bytes is the outputByteLength in big-endian
+    const outputByteLength = new DataView(wm.memory.buffer, outputPtr, 4).getUint32(0);
+    output.set(new Uint8Array(wm.memory.buffer, outputPtr + 4, outputByteLength));
+    return outputByteLength;
   } finally {
     wm.free(inputU16BePtr);
     wm.free(outputPtr);
```
I guess calling `malloc()` is not that heavy.
MaxGraey commented on May 24, 2019:
Hmm, in this case I recommend profiling this more carefully. Try measuring manually (for example via `console.time`/`console.timeEnd`) the whole JS `utf8EncodeUint16Array` method, and then the `wm.utf8EncodeUint16Array` call exclusively. The difference between the times gives you a picture of the price you pay for interop.
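A sketch of that measurement; the declared names follow this thread, and `N`, `str`, and the pre-marshalled `inputPtr` are assumptions for illustration:

```ts
declare const wm: { utf8EncodeUint16Array(inputPtr: number, inputLength: number): number };
declare function utf8EncodeWasm(str: string, output: Uint8Array): number;
declare const inputPtr: number; // input already copied into wasm memory

const N = 10_000;
const str = "Hello, wasm".repeat(1000);
const output = new Uint8Array(str.length * 4);

// Whole JS wrapper: copy-in + wasm call + copy-out.
console.time("js wrapper");
for (let i = 0; i < N; i++) utf8EncodeWasm(str, output);
console.timeEnd("js wrapper");

// Raw wasm export only, with the input pre-marshalled.
console.time("wasm call only");
for (let i = 0; i < N; i++) wm.utf8EncodeUint16Array(inputPtr, str.length);
console.timeEnd("wasm call only");
// The gap between the two timings approximates the interop cost.
```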
I tried merging multiple mallocs into a single one, but there was no effect. I will try reference types (https://github.com/WebAssembly/reference-types/blob/master/proposals/reference-types/Overview.md) in the future.
You definitely have interop overhead. It's hard to tell how much this will improve things. Try to eliminate calls between JS and wasm. For example: use one shared buffer (with the max size of both) for input and output, and process in place if possible.
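A sketch of that idea with assumed names (not this PR's actual API): one wasm-side buffer, allocated once and reused, split into an input region and an output region, so each encode needs exactly one wasm call and no per-call malloc/free.

```ts
declare const wm: {
  memory: WebAssembly.Memory;
  malloc(size: number): number;
  // assumed export: reads UTF-16 units at inPtr, writes UTF-8 at outPtr,
  // and returns the number of bytes written
  utf8Encode(inPtr: number, inputLength: number, outPtr: number): number;
};

const MAX_INPUT_UNITS = 1 << 16;
// One allocation holds both regions; a UTF-16 unit expands to at most
// 3 UTF-8 bytes, so 3 bytes per unit is a safe output capacity.
const inPtr = wm.malloc(MAX_INPUT_UNITS * 2 + MAX_INPUT_UNITS * 3);
const outPtr = inPtr + MAX_INPUT_UNITS * 2; // output region follows input region

function encodeShared(str: string, output: Uint8Array): number {
  const input = new Uint16Array(wm.memory.buffer, inPtr, str.length);
  for (let i = 0; i < str.length; i++) input[i] = str.charCodeAt(i); // single copy-in
  const n = wm.utf8Encode(inPtr, str.length, outPtr); // single wasm call
  output.set(new Uint8Array(wm.memory.buffer, outPtr, n)); // single copy-out
  return n;
}
```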
Hmm! I'll give reducing some js<->wasm calls a try. Thank you for your advice, anyway! I'll merge this PR in a few days and will send feedback to AssemblyScript.
MaxGraey commented on May 24, 2019:
@gfx Sure! That would be great.
~~Currently, it is much slower than the JS version.~~ Thanks to @MaxGraey, it's now 1.5x faster than the JS version for large string chunks. Good enough to proceed to merge this PR.
The benchmark script I am using (`benchmark/string.ts`):