fix(pg): fix binary format buffer handling #3496
Conversation
- change Parser.parseDataRowMessage() to produce a buffer slice instead of a string
- change Result.parseRow() and _parseRowAsArray() to create the string from the buffer slice if not in binary mode
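A rough sketch of the shape this summary describes (names are taken from the summary and discussion; the actual diff may differ):

```js
// Hypothetical sketch only – the real change lives in pg-protocol and pg/lib/result.js.

// Parser.parseDataRowMessage(): keep a raw Buffer slice instead of decoding eagerly.
// before: fields[i] = this.reader.string(len) // buffer.toString() – allocates a string
// after:  fields[i] = this.reader.bytes(len)  // buffer slice – a view, no byte copy

// Result.parseRow() / _parseRowAsArray(): decode only when the field is not binary.
function fieldToValue(rawValue, isBinary, parseFn) {
  if (rawValue === null) return null
  return isBinary ? parseFn(rawValue) : parseFn(rawValue.toString('utf8'))
}
```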
Force-pushed from c72c089 to e564bba
did binary parsing for int32 work prior to #3494? if so, i wonder if we special-case the array parsing to fix the regression. that would give us time to explore a larger, more correct change.
> change Parser.parseDataRowMessage() to produce a buffer slice instead of a string
i like the idea of passing around a buffer of bytes and only converting to string at the last possible moment.
No, I don't think it worked, but I never got to that point because the bind argument protocol message was wrong for binary mode.
I was just about to try to fix that after digging around with wireshark when the new version released 😅
Please feel free to make any necessary changes. I've never worked on the internals of node-postgres before, but I have written similar binary message parsers in JS before.
I will dive into this more later tonight and this weekend. I also plan on adding more data type parsing integration tests.
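For reference, such an integration test might look roughly like this (a hypothetical sketch; the queried types and assertions are illustrative, not taken from the PR):

```js
import assert from 'node:assert'
import { Client } from 'pg'

const client = new Client()
await client.connect()

// Run a parameterized query in binary mode and check a few types round-trip correctly.
const res = await client.query({
  text: 'SELECT $1::int4 AS i, 3.5::float8 AS f, $2::text AS t, true AS b',
  values: [42, 'hello'],
  binary: true,
})

assert.strictEqual(res.rows[0].i, 42)
assert.strictEqual(res.rows[0].f, 3.5)
assert.strictEqual(res.rows[0].t, 'hello')
assert.strictEqual(res.rows[0].b, true)

await client.end()
```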
I am not finding the performance regression.
Before:
- `parseDataRowMessage` converts the buffer into a string using `this.reader.string(len)`, which uses `buffer.toString`. This allocates.
- `parseRow()` and `_parseRowAsArray()` pass the string to the parser. This does not allocate.

After:
- `parseDataRowMessage` creates a buffer slice using `this.reader.bytes(len)`, which uses `buffer.slice`. This does NOT allocate (source).
- `parseRow()` and `_parseRowAsArray()` convert the buffer into a string. This allocates.

In both cases, we now have a single allocation. (Actually, my recent fix introduced another allocation when adding the additional `Buffer.from(rawValue)`, but you already fixed that.)
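To illustrate the difference between the two reader calls with plain Node Buffer APIs (a standalone example, not node-postgres code):

```js
const buf = Buffer.from('hello world, pretend this is a DataRow payload')

// buffer.toString() decodes the bytes into a new JS string – this allocates.
const asString = buf.toString('utf8', 0, 5)

// buffer.subarray()/slice() returns a view over the same memory – no byte copy,
// only a small Buffer wrapper object is created.
const asSlice = buf.subarray(0, 5)

console.log(asString)           // 'hello'
console.log(asSlice.toString()) // 'hello' – the string is only allocated here
```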
I did add tests for every type I could think of.
I think the allocation they're referring to is the additional Buffer view object itself on the string path.

> I think the allocation they're referring to is the additional Buffer view object itself on the string path.
Yes, I was referring to the creation of the additional Buffer view when parsing, but because there is no allocation/copy involved it's probably not relevant.
I was also just wondering about performance in general - when comparing text vs. binary mode in my app I cannot see any improvements when fetching large amounts of rows
Ah, the new object from `buffer.slice` that is now created in the string path.
> but because there is no allocation/copy involved it's probably not relevant.
I would think this as well! I wrote a quick benchmark to be sure
```js
import { run, bench } from 'mitata'
import { Client } from 'pg'

const client = new Client()
await client.connect()

bench('text format', async () => {
  const rowCount = 50000
  return client.query({
    text: `SELECT generate_series(1, $1) as id,
      (random() * 10000)::int as int_val,
      'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.' as text_val,
      ARRAY[random(), random(), random()]::real[] as array_val`,
    values: [rowCount],
    binary: false,
  })
})

await run({
  format: 'mitata',
  colors: true,
  throw: true,
})

await client.end()
```
This was run on my local laptop, so not scientific. However, I am seeing this difference consistently:
Before:
[before benchmark screenshot]

After:
[after benchmark screenshot]

If this is true, then maybe we should look at hinting the format to `parseDataRowMessage` and making `DataRowMessage` generic over String and Buffer. I think that may be a fair amount of work to accomplish though.
I will keep looking as I get time.
> I was also just wondering about performance in general - when comparing text vs. binary mode in my app I cannot see any improvements when fetching large amounts of rows
I am seeing some noticeable difference over large results. Same benchmark as above, but text vs binary
Text:
[text benchmark screenshot]

Binary:
[binary benchmark screenshot]

Are you not seeing any difference? Can you share your benchmark?
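The binary run is presumably the same script with `binary: true`; a sketch of that variant, assuming the same mitata setup and query as above:

```js
bench('binary format', async () => {
  const rowCount = 50000
  return client.query({
    text: `SELECT generate_series(1, $1) as id,
      (random() * 10000)::int as int_val,
      'Lorem ipsum ...' as text_val, -- same long text literal as the text-format bench
      ARRAY[random(), random(), random()]::real[] as array_val`,
    values: [rowCount],
    binary: true, // the only change from the text-format bench
  })
})
```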
Regarding your benchmark: in binary mode, the query result for the float array is just string garbage; it seems type 1021 (element type 700) is not supported by pg-types in binary mode, so that might skew the benchmark results.
Also, the noParser always creates a string from the Buffer in binary mode, which I think is not a good default.
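If pg-types has no binary parser for a given OID, one can be registered from application code. A hedged sketch for the scalar float4 case (OID 700), assuming pg-types' three-argument `setTypeParser(oid, format, fn)` form:

```js
import pg from 'pg'

// float4 (OID 700) arrives as a 4-byte big-endian IEEE 754 float in binary mode.
pg.types.setTypeParser(700, 'binary', (buf) => buf.readFloatBE(0))

// float4[] (OID 1021) would additionally need a parser for PostgreSQL's binary
// array wire format, which is more involved and not sketched here.
```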
> Are you not seeing any difference? Can you share your benchmark?
In the case of my app (which I cannot share), I wasn't able to measure any meaningful difference.
The part I was measuring basically just dumps large-ish tables to an ND-JSON stream, and looking at the profiler data, pg row parsing is still the largest chunk, but under 30%, so it's hard to say what's going on.
Force-pushed from 9a98412 to 5b56555
DataRowMessage.fields may now be either a string or a buffer slice depending on the field format. this relies on an assumption:

- fieldDescriptions is accurate when parseDataRowMessage is called. we make a similar assumption in pg/lib/result.js, so this seems fine.
ce5d8fb avoids the additional Buffer view object on the string path. this avoids the slowdown i was seeing on my local benchmarks.
if this change looks good, i think we can land this PR and then continue a discussion about why binary format is not showing meaningful differences in a dedicated issue
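A rough sketch of what the per-field loop might look like with this change (names and structure are assumed from the discussion, not copied from the commit):

```js
// Hypothetical sketch – not the actual diff.
for (let i = 0; i < fieldCount; i++) {
  const len = this.reader.int32()
  if (len === -1) {
    fields[i] = null
    continue
  }
  // Relies on fieldDescriptions being accurate at this point (see the note above):
  // binary-format columns keep the raw Buffer slice, text-format columns decode to a string.
  const isBinary = this.fieldDescriptions[i] && this.fieldDescriptions[i].format === 'binary'
  fields[i] = isBinary ? this.reader.bytes(len) : this.reader.string(len)
}
```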
i don't see any other code in pg-protocol doing this, but i would prefer to `assert(this.fieldDescriptions[i], 'Missing field description')` instead of defaulting to text format if a field description is not found.
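In context, the suggestion would look roughly like this (a sketch under the same assumptions as the loop above):

```js
import assert from 'node:assert'

// Fail fast instead of silently falling back to text format.
const desc = this.fieldDescriptions[i]
assert(desc, 'Missing field description')
fields[i] = desc.format === 'binary' ? this.reader.bytes(len) : this.reader.string(len)
```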
This tries to fix #3495.

Functionally, this seems to fix the issue and the tests pass (edit: the stream tests are currently not working), but performance is not ideal: in binary mode a buffer slice is now created, and then a string is created from that slice whenever a string is needed, whereas previously the string was created directly from the buffer. Maybe it would be better to hint the field type to the parser before parsing?