Implement some functions in AssemblyScript/WebAssembly #26
Conversation
You could speed this up a little by using the `-O3` flag instead of `--optimize` (which is actually `-O2`). But I guess the main reason the wasm version is slower is interop overhead, which can be significant if your wasm function is called quite frequently.
You could try this approach:
```ts
const threshold = 100; // adjust this

function utf8ToUtf16String(bytes: Uint8Array) {
  if (bytes.length > threshold) {
    return wasmUtf8ToUtf16(...);
  } else {
    return jsUtf8ToUtf16(...);
  }
}
```
@MaxGraey Thanks for your advice!
As shown in the benchmark script attached to the description, I am using a large target string like `"Hello, π\n".repeat(1000)`. Even in that situation with `-O3`, the AssemblyScript version is much slower than JS:
```
$ npm run asbuild && USE_WASM=true npx ts-node benchmark/string.ts

> @msgpack/msgpack@1.0.1 asbuild /Users/gfx/.ghq/github.com/msgpack/msgpack-javascript
> npm run asbuild:untouched && npm run asbuild:optimized

> @msgpack/msgpack@1.0.1 asbuild:untouched /Users/gfx/.ghq/github.com/msgpack/msgpack-javascript
> asc assembly/index.ts -b build/wasm/untouched.wasm -t build/wasm/untouched.wat --sourceMap --validate --debug

> @msgpack/msgpack@1.0.1 asbuild:optimized /Users/gfx/.ghq/github.com/msgpack/msgpack-javascript
> asc assembly/index.ts -b build/wasm/optimized.wasm -t build/wasm/optimized.wat --sourceMap --validate -O3

encode: 1666.293ms
decode: 4684.249ms

$ npx ts-node benchmark/string.ts
encode: 1622.107ms
decode: 1234.643ms
```
(Only `decode()` uses the AssemblyScript version.)

My environment: Node.js v12.2.0 on macOS 10.14.4 (Mojave)

Do you have any ideas?
Try decreasing the iteration count from 1000 to 10-20 and increasing the input string size (the `repeat` argument) from 1000 to 5000, and see whether it changes anything.
Tried:
```diff
diff --git a/benchmark/string.ts b/benchmark/string.ts
index ebf0314..b47613d 100644
--- a/benchmark/string.ts
+++ b/benchmark/string.ts
@@ -1,6 +1,6 @@
 import { encode, decode } from "../src";
 
-const data = "Hello, π\n".repeat(1000);
+const data = "Hello, π\n".repeat(10_000);
 
 // warm up
 const encoded = encode(data);
@@ -9,13 +9,13 @@ decode(encoded);
 
 // run
 console.time("encode");
-for (let i = 0; i < 10000; i++) {
+for (let i = 0; i < 1000; i++) {
   encode(data);
 }
 console.timeEnd("encode");
 
 console.time("decode");
-for (let i = 0; i < 10000; i++) {
+for (let i = 0; i < 1000; i++) {
   decode(encoded);
 }
 console.timeEnd("decode");
```
But it changed little:
```
$ USE_WASM=true npx ts-node benchmark/string.ts
encode: 1557.384ms
decode: 6345.717ms

$ npx ts-node benchmark/string.ts
encode: 1565.238ms
decode: 2784.192ms
```
Note: `repeat(100_000)` causes a `Maximum call stack size exceeded` error.
MaxGraey commented on May 15, 2019:
> Note: `repeat(100_000)` causes a `Maximum call stack size exceeded` error.
Try to avoid using `String.fromCharCode(...utf16array)`; it is slow and causes a stack overflow when the array size is >= 64k.
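For illustration (an assumed snippet, not taken from the PR), the failure mode is that spreading a large typed array passes every element as a separate call argument:

```ts
// Each element becomes one argument on the call stack, so large inputs
// overflow it at an engine-dependent limit (around 64k arguments in V8).
const big = new Uint16Array(100_000).fill(0x41);
const s = String.fromCharCode(...big); // RangeError: Maximum call stack size exceeded
```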
I think if you decrease 1000 -> 10, AS should be the same speed as JS or faster. As I mentioned before, you have JS <-> WASM interop overhead, which is a well-known performance issue for WebAssembly, especially on the V8 VM.
Review thread on `assembly/index.ts` (outdated):
MaxGraey commented on May 15, 2019:
Also it seems this should be:
```ts
let byte2: u16 = load<u8>(inputOffset++) & 0x3f;
let byte3: u16 = load<u8>(inputOffset++) & 0x3f;
```
Right: eea2f04
MaxGraey commented on May 15, 2019:
Another hint that could potentially improve speed: use `u32` instead of `u16` for every temporary variable.
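As a sketch applied to the snippet above (assumed, not the PR's final code): wasm has no 16-bit locals, so `u16` values live in i32 locals that the compiler must keep re-masking to 16 bits, while `u32` maps onto i32 directly.

```ts
// u32 temporaries avoid the extra wrap-to-16-bit masking instructions
let byte2: u32 = load<u8>(inputOffset++) & 0x3f;
let byte3: u32 = load<u8>(inputOffset++) & 0x3f;
```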
Hmm? As far as I know, `String.fromCharCode(...utf16array)` is the only way to create strings from an ArrayBuffer or an array of code points. I'd like to know of a faster way if one exists.
Right, but it's better to use `String.fromCharCode.apply` with a fixed chunk size, like in here
Interesting! I thought there was no difference between `String.fromCharCode(...arrayLike)` and `String.fromCharCode.apply(String, arrayLike)`, and that is true if the arrayLike is an actual `Array` and the JS engine supports ES2015, but `#apply` is much faster if the arrayLike is a typed array.
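A minimal sketch of the chunked technique (the chunk size and function name here are assumptions, not the PR's actual code):

```ts
// Build the string in fixed-size chunks so .apply() never pushes more
// than CHUNK_SIZE arguments onto the call stack at once.
const CHUNK_SIZE = 0x1000; // 4096, well below typical engine argument limits

function utf16UnitsToString(units: Uint16Array): string {
  const parts: string[] = [];
  for (let i = 0; i < units.length; i += CHUNK_SIZE) {
    const chunk = units.subarray(i, i + CHUNK_SIZE);
    // apply() accepts an array-like of code units, including typed arrays
    parts.push(String.fromCharCode.apply(String, chunk as unknown as number[]));
  }
  return parts.join("");
}
```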
Now the WASM version is faster than the JS version for a large string! dfe06c3
```
$ USE_WASM=true npx ts-node benchmark/string.ts
encode: 1547.967ms
decode: 1625.489ms

$ USE_WASM=false npx ts-node benchmark/string.ts
encode: 1526.350ms
decode: 2743.125ms
```
Now the WASM version is a little bit faster than the JS version.
Now wasm is faster by ~60%. It's a pretty good result. In comparison with JS, a 1.5-3x speedup is usually achieved.
Yep! Not "a little bit", but 1.5x faster. That sounds good enough to proceed with merging this PR.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master      #26      +/-   ##
==========================================
+ Coverage   95.15%   96.37%   +1.21%
==========================================
  Files          13       14       +1
  Lines         681      744      +63
  Branches      151      156       +5
==========================================
+ Hits          648      717      +69
+ Misses         13       12       -1
+ Partials       20       15       -5
```
Continue to review full report at Codecov.
because it increases the size of the bundle by +5 KiB (plus the base64-js dep)
I've implemented `utf8EncodeWasm()` in AssemblyScript, but its performance is almost the same as the pure JS version, while the decode-in-wasm function is about 1.5x faster than the JS version (Node.js v12.3.1 on macOS):
```
$ WASM=never npx ts-node benchmark/string.ts && echo && WASM=force npx ts-node benchmark/string.ts
WASM_AVAILABLE=false
encode / decode ascii data.length=40000 encoded.byteLength=40003
encode ascii: 542.870ms
decode ascii: 868.292ms
encode / decode emoji data.length=40000 encoded.byteLength=80005
encode emoji: 653.479ms
decode emoji: 892.014ms

WASM_AVAILABLE=true
encode / decode ascii data.length=40000 encoded.byteLength=40003
encode ascii: 528.937ms
decode ascii: 520.596ms
encode / decode emoji data.length=40000 encoded.byteLength=80005
encode emoji: 555.948ms
decode emoji: 595.192ms
```
I think the overhead of the preparation step that converts the string to an array of UTF-16 units (i.e. `setMemoryStr()` in `wasmFunctions.ts`) is too heavy, but I have no idea how to improve it.

Do you have any ideas about it?
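For reference, a minimal sketch of what such a preparation step looks like; the names follow this thread, but the body is an assumption, not the PR's actual code:

```ts
// Copy a JS string's UTF-16 code units into wasm linear memory so the
// wasm side can read them. This per-call copy is the overhead in question.
function setMemoryStr(
  wm: { memory: WebAssembly.Memory; malloc(size: number): number },
  str: string,
): number {
  const ptr = wm.malloc(str.length * 2); // 2 bytes per UTF-16 code unit
  const view = new Uint16Array(wm.memory.buffer, ptr, str.length);
  for (let i = 0; i < str.length; i++) {
    view[i] = str.charCodeAt(i);
  }
  return ptr; // the caller must free this after the wasm call
}
```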
I guess it's better to allocate this inside `utf8EncodeUint16Array` on the AS side. That reduces the overhead by one js <-> wasm call. But you would have to retrieve the pointer somehow anyway, and that requires another call like `getOutputPointer`. Hmm.
I tried to reduce a `malloc()` call there, but the performance score didn't get better.
```diff
diff --git a/assembly/utf8EncodeUint16Array.ts b/assembly/utf8EncodeUint16Array.ts
index 195991f..536311c 100644
--- a/assembly/utf8EncodeUint16Array.ts
+++ b/assembly/utf8EncodeUint16Array.ts
@@ -29,9 +29,15 @@ function storeStringHeader(outputPtr: usize, utf8ByteLength: usize): usize {
 // outputPtr: u8*
 // inputPtr: u16*
 // It adds MessagePack str head bytes to the output
-export function utf8EncodeUint16Array(outputPtr: usize, inputPtr: usize, inputLength: usize): usize {
+export function utf8EncodeUint16Array(inputPtr: usize, inputLength: usize): usize {
+  const maxOutputHeaderSize = sizeof<u8>() + sizeof<u32>();
+
+  // outputPtr: [u32 outputBufferSize][outputBuffer]
+  let outputPtr = memory.allocate(maxOutputHeaderSize + inputLength * 4);
+  let outputPtrStart = outputPtr + sizeof<u32>();
+
   let utf8ByteLength = utf8CountUint16Array(inputPtr, inputLength);
-  let strHeaderOffset = storeStringHeader(outputPtr, utf8ByteLength);
+  let strHeaderOffset = storeStringHeader(outputPtrStart, utf8ByteLength);
 
   const u16s = sizeof<u16>();
   let inputOffset = inputPtr;
@@ -76,5 +82,7 @@ export function utf8EncodeUint16Array(outputPtr: usize, inputPtr: usize, inputLe
     store<u8>(outputOffset++, (value & 0x3f) | 0x80);
   }
 
-  return outputOffset - outputPtr;
+  let outputByteLength = outputOffset - outputPtrStart;
+  storeUint32BE(outputPtr, outputByteLength);
+  return outputPtr;
 }
diff --git a/src/wasmFunctions.ts b/src/wasmFunctions.ts
index 5a3c44a..3a6ab4b 100644
--- a/src/wasmFunctions.ts
+++ b/src/wasmFunctions.ts
@@ -54,9 +54,11 @@ export function utf8EncodeWasm(str: string, output: Uint8Array): number {
   const maxOutputHeaderSize = 1 + 4; // headByte + u32
   const outputPtr: pointer = wm.malloc(maxOutputHeaderSize + strLength * 4);
   try {
-    const outputLength = wm.utf8EncodeUint16Array(outputPtr, inputU16BePtr, strLength);
-    output.set(new Uint8Array(wm.memory.buffer, outputPtr, outputLength));
-    return outputLength;
+    const outputPtr = wm.utf8EncodeUint16Array(inputU16BePtr, strLength);
+    // the first 4 bytes is the outputByteLength in big-endian
+    const outputByteLength = new DataView(wm.memory.buffer, outputPtr, 4).getUint32(0);
+    output.set(new Uint8Array(wm.memory.buffer, outputPtr + 4, outputByteLength));
+    return outputByteLength;
   } finally {
     wm.free(inputU16BePtr);
     wm.free(outputPtr);
```
I guess calling `malloc()` is not that heavy.
MaxGraey commented on May 24, 2019:
Hmm, in this case I recommend profiling this more carefully. Try measuring manually (for example via `console.time`/`console.timeEnd`) the whole JS `utf8EncodeUint16Array` method, and then the `wm.utf8EncodeUint16Array` call exclusively. The difference between the times gives you a picture of the price you pay for interop.
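A sketch of that measurement; the declared names follow this thread, and `N`, `str`, and the pre-marshalled `inputPtr` are assumptions for illustration:

```ts
declare const wm: { utf8EncodeUint16Array(inputPtr: number, inputLength: number): number };
declare function utf8EncodeWasm(str: string, output: Uint8Array): number;
declare const inputPtr: number; // input already copied into wasm memory

const N = 10_000;
const str = "Hello, wasm".repeat(1000);
const output = new Uint8Array(str.length * 4);

// Whole JS wrapper: copy-in + wasm call + copy-out.
console.time("js wrapper");
for (let i = 0; i < N; i++) utf8EncodeWasm(str, output);
console.timeEnd("js wrapper");

// Raw wasm export only, with the input pre-marshalled.
console.time("wasm call only");
for (let i = 0; i < N; i++) wm.utf8EncodeUint16Array(inputPtr, str.length);
console.timeEnd("wasm call only");
// The gap between the two timings approximates the interop cost.
```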
I tried merging multiple mallocs into a single one, but there was no effect. I will try reference types (https://github.com/WebAssembly/reference-types/blob/master/proposals/reference-types/Overview.md) in the future.
You definitely have interop overhead. It's hard to tell how much this will improve things. Try to eliminate calls between JS and wasm. For example: use one shared buffer (with the max size of both) for input and output, and process in place if possible.
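A sketch of that idea with assumed names (not this PR's actual API): one wasm-side buffer, allocated once and reused, split into an input region and an output region, so each encode needs exactly one wasm call and no per-call malloc/free.

```ts
declare const wm: {
  memory: WebAssembly.Memory;
  malloc(size: number): number;
  // assumed export: reads UTF-16 units at inPtr, writes UTF-8 at outPtr,
  // and returns the number of bytes written
  utf8Encode(inPtr: number, inputLength: number, outPtr: number): number;
};

const MAX_INPUT_UNITS = 1 << 16;
// One allocation holds both regions; a UTF-16 unit expands to at most
// 3 UTF-8 bytes, so 3 bytes per unit is a safe output capacity.
const inPtr = wm.malloc(MAX_INPUT_UNITS * 2 + MAX_INPUT_UNITS * 3);
const outPtr = inPtr + MAX_INPUT_UNITS * 2; // output region follows input region

function encodeShared(str: string, output: Uint8Array): number {
  const input = new Uint16Array(wm.memory.buffer, inPtr, str.length);
  for (let i = 0; i < str.length; i++) input[i] = str.charCodeAt(i); // single copy-in
  const n = wm.utf8Encode(inPtr, str.length, outPtr); // single wasm call
  output.set(new Uint8Array(wm.memory.buffer, outPtr, n)); // single copy-out
  return n;
}
```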
Hmm! I'll give reducing some js<->wasm calls a try. Thank you for your advice, anyway! I'll merge this PR in a few days and will send feedback to AssemblyScript.
MaxGraey commented on May 24, 2019:
@gfx Sure! That would be great.
~~Currently, it is much slower than the JS version.~~ Thanks to @MaxGraey, it's now 1.5x faster than the JS version for large string chunks. Good enough to proceed to merge this PR.
The benchmark script I am using (`benchmark/string.ts`):