Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Cached string decoder for map keys (CachedKeyDecoder) #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
gfx merged 9 commits into master from cached_key_string_decoder
Aug 2, 2019

Conversation

Copy link
Collaborator

@sergeyzenchenko sergeyzenchenko commented Jun 3, 2019
edited
Loading

Check this out @gfx We are using it in our project.

Big part of all strings are map keys and most of the time total number of uniq keys is small.
For example in our payload in 260k objects 67% of all string are keys with 100 uniq values.
I've tried to cache this values and skip string decoding at all for keys.
It's using bytes representation of key string and stores actual key value. So instead of decoding key string we just need to find this key value in cache.

Results before:

Benchmark on NodeJS/v12.3.1
operation | op | ms | op/s 
----------------------------------------------------------------- | ------: | ----: | ------:
buf = Buffer.from(JSON.stringify(obj)); | 557200 | 5000 | 111440
buf = JSON.stringify(obj); | 1078100 | 5000 | 215620
obj = JSON.parse(buf); | 394300 | 5001 | 78844
buf = require("msgpack-lite").encode(obj); | 416400 | 5000 | 83280
obj = require("msgpack-lite").decode(buf); | 313600 | 5000 | 62720
buf = require("@msgpack/msgpack").encode(obj); | 646100 | 5000 | 129220
obj = require("@msgpack/msgpack").decode(buf); | 561800 | 5000 | 112360
✨ Done in 36.69s.

After:

Benchmark on NodeJS/v12.3.1
operation | op | ms | op/s 
----------------------------------------------------------------- | ------: | ----: | ------:
buf = Buffer.from(JSON.stringify(obj)); | 598900 | 5000 | 119780
buf = JSON.stringify(obj); | 1151000 | 5000 | 230200
obj = JSON.parse(buf); | 397600 | 5000 | 79520
buf = require("msgpack-lite").encode(obj); | 429000 | 5000 | 85800
obj = require("msgpack-lite").decode(buf); | 323800 | 5000 | 64760
buf = require("@msgpack/msgpack").encode(obj); | 666800 | 5000 | 133360
obj = require("@msgpack/msgpack").decode(buf); | 744000 | 5000 | 148800
✨ Done in 36.95s.

I am not sure that we should include it directly into this library, but maybe it can be added as optional speedup that user can add if it's required.

It works better if Decoder is reused, because cache is need to be created first.

Let me know what do you think about it.

Copy link
Member

gfx commented Jun 3, 2019
edited
Loading

Fascinating idea! 👏 👏 👏

I'll merge the string caching in decoders, but there are some improvement and changes needed.

The reuse of Decoder

If one would like to decode it in the fastest way, they should reuse the instance of Decoder, anyway. We should re-write the benchmark script to reuse Decoder.

And, I think decode() should create Decoder for each time in order to control configuration easily.

Default behavior

I'd like to use caching by default, so it should be merged to Decoder.

Configuration of cacheSize: number may be a good idea to control memory usage because uncontrolled caching is a kind of memory leak.

Copy link
Collaborator Author

Sure, this is just a prototype solution that we use, it's not intended to be merged directly. It requires changes and tests.

I think we can allow user to pass string decoder to options, so user can pass custom decoder if needed.

What do you think about it?

Copy link
Collaborator Author

I am not sure about how to properly implement cacheSize yet.
In worst case scenario cacheSize can be filled with low frequently used keys if they are frequently used in first documents.

Maybe I can add cache.compact() function that user can call on demand.
Or this can be called periodically inside Decoder.
This function will remove less frequently used keys.

Current version tracks number of hits for each key and orders them by this number to make most frequently used keys faster.

Copy link
Collaborator Author

Lol @gfx , I've just improved performance more :D Now it's 176660 ops/s

Copy link
Member

gfx commented Jun 4, 2019

Lol @gfx , I've just improved performance more :D Now it's 176660 ops/s

Amazing! 😆


cacheSize is not necessarily correct; it's just a hint.


I think we can allow user to pass string decoder to options, so user can pass custom decoder if needed

You mean decode(buffer, { stringDecoder: sharedCachingStringDecoder })? I prefer { stringCache: sharedStringCache } because its intent is clearer so users can focus on the cache engine, not the whole string decoder.


BTW I have another idea to share encoders and decoders, not directly related to this PR: #56

Copy link
Member

gfx commented Jul 26, 2019

I'd like to proceed this patch this week. FYI: Chrome 76 is going to be released at the end of this month, which includes JSON.parse performance improvement.

@gfx gfx force-pushed the cached_key_string_decoder branch from 838b277 to 8be4057 Compare August 2, 2019 02:26
@gfx gfx changed the title (削除) Experimental cache string decoder for map keys (削除ここまで) (追記) Cached string decoder for map keys (CachedKeyDecoder) (追記ここまで) Aug 2, 2019
if (records.length >= this.maxLengthPerKey) {
// `records` are full!
// Set `record` to a randomized position.
records[(Math.random() * records.length) | 0] = record;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have replaced the original hit-count based sorting algorithm by random-picking algorithm to reduce overhead in #get().

Copy link

codecov-io commented Aug 2, 2019
edited
Loading

Codecov Report

Merging #54 into master will increase coverage by 0.08%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@
## master #54 +/- ##
==========================================
+ Coverage 98.16% 98.25% +0.08% 
==========================================
 Files 15 16 +1 
 Lines 927 972 +45 
 Branches 189 197 +8 
==========================================
+ Hits 910 955 +45 
 Misses 17 17
Impacted Files Coverage Δ
src/CachedKeyDecoder.ts 100% <100%> (ø)
src/Decoder.ts 98.88% <100%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a0be621...11803d8. Read the comment docs.

Copy link
Member

gfx commented Aug 2, 2019
edited
Loading

Benchmark on NodeJS/v12.7.0

operation op ms op/s
buf = Buffer.from(JSON.stringify(obj)); 507700 5000 101540
buf = JSON.stringify(obj); 958100 5000 191620
obj = JSON.parse(buf); 346500 5000 69300
buf = require("msgpack-lite").encode(obj); 361800 5001 72345
obj = require("msgpack-lite").decode(buf); 267400 5000 53480
buf = require("@msgpack/msgpack").encode(obj); 510200 5000 102040
obj = require("@msgpack/msgpack").decode(buf); 825500 5000 165100

Copy link
Member

gfx commented Aug 2, 2019

Now, this PR is finished. cc: @sergeyzenchenko

@@ -59,6 +62,9 @@ export class Decoder {
headByte = HEAD_BYTE_REQUIRED;
readonly stack: Array<StackState> = [];

// TODO: parameterize this property.
readonly cachedKeyDecoder = sharedCachedKeyDecoder;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about the interface to customize a key decoder, so it's not yet public for now.

@gfx gfx merged commit 0a5966e into master Aug 2, 2019
@gfx gfx deleted the cached_key_string_decoder branch August 2, 2019 05:48
@gfx gfx mentioned this pull request Oct 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Reviewers

@gfx gfx gfx approved these changes

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /