fix(pg): fix binary format buffer handling #3496
Conversation
- change Parser.parseDataRowMessage() to produce a buffer slice instead of a string
- change Result.parseRow() and _parseRowAsArray() to create the string from the buffer slice if not in binary mode
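A rough sketch of the shape this summary describes (names are taken from the summary and discussion; the actual diff may differ):

```js
// Hypothetical sketch only – the real change lives in pg-protocol and pg/lib/result.js.

// Parser.parseDataRowMessage(): keep a raw Buffer slice instead of decoding eagerly.
// before: fields[i] = this.reader.string(len) // buffer.toString() – allocates a string
// after:  fields[i] = this.reader.bytes(len)  // buffer slice – a view, no byte copy

// Result.parseRow() / _parseRowAsArray(): decode only when the field is not binary.
function fieldToValue(rawValue, isBinary, parseFn) {
  if (rawValue === null) return null
  return isBinary ? parseFn(rawValue) : parseFn(rawValue.toString('utf8'))
}
```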
Force-pushed from c72c089 to e564bba
did binary parsing for int32 work prior to #3494? if so, i wonder if we special-case the array parsing to fix the regression. that would give us time to explore a larger, more correct change.
> change Parser.parseDataRowMessage() to produce a buffer slice instead of a string
i like the idea of passing around a buffer of bytes and only converting to string at the last possible moment.
No, I don't think it worked, but I never got to that point because the bind argument protocol message was wrong for binary mode.
I was just about to try to fix that after digging around with wireshark when the new version released 😅
Please feel free to make any necessary changes. I've never worked on the internals of node-postgres before, but I have written similar binary message parsers in JS before.
I will dive into this more later tonight and this weekend. I also plan on adding more data type parsing integration tests.
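For reference, such an integration test might look roughly like this (a hypothetical sketch; the queried types and assertions are illustrative, not taken from the PR):

```js
import assert from 'node:assert'
import { Client } from 'pg'

const client = new Client()
await client.connect()

// Run a parameterized query in binary mode and check a few types round-trip correctly.
const res = await client.query({
  text: 'SELECT $1::int4 AS i, 3.5::float8 AS f, $2::text AS t, true AS b',
  values: [42, 'hello'],
  binary: true,
})

assert.strictEqual(res.rows[0].i, 42)
assert.strictEqual(res.rows[0].f, 3.5)
assert.strictEqual(res.rows[0].t, 'hello')
assert.strictEqual(res.rows[0].b, true)

await client.end()
```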
I am not finding the performance regression.
Before:
- `parseDataRowMessage` converts the buffer into a string using `this.reader.string(len)`, which uses `buffer.toString`. This allocates.
- `parseRow()` and `_parseRowAsArray()` pass the string to the parser. This does not allocate.

After:
- `parseDataRowMessage` creates a buffer slice using `this.reader.bytes(len)`, which uses `buffer.slice`. This does NOT allocate (source).
- `parseRow()` and `_parseRowAsArray()` convert the buffer into a string. This allocates.

In both cases, we now have a single allocation. (Actually, my recent fix introduced another allocation when adding the additional `Buffer.from(rawValue)`, but you already fixed that.)
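To illustrate the difference between the two reader calls with plain Node Buffer APIs (a standalone example, not node-postgres code):

```js
const buf = Buffer.from('hello world, pretend this is a DataRow payload')

// buffer.toString() decodes the bytes into a new JS string – this allocates.
const asString = buf.toString('utf8', 0, 5)

// buffer.subarray()/slice() returns a view over the same memory – no byte copy,
// only a small Buffer wrapper object is created.
const asSlice = buf.subarray(0, 5)

console.log(asString)           // 'hello'
console.log(asSlice.toString()) // 'hello' – the string is only allocated here
```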
I did add tests for every type I could think of.
I think the allocation they're referring to is the additional Buffer view object itself on the string path.

> I think the allocation they're referring to is the additional Buffer view object itself on the string path.
Yes, I was referring to the creation of the additional Buffer view when parsing, but because there is no allocation/copy involved it's probably not relevant.
I was also just wondering about performance in general - when comparing text vs. binary mode in my app I cannot see any improvements when fetching large amounts of rows
Ah, the new object from `buffer.slice` that is now created in the string path.
> but because there is no allocation/copy involved it's probably not relevant.
I would think this as well! I wrote a quick benchmark to be sure
```js
import { run, bench } from 'mitata'
import { Client } from 'pg'

const client = new Client()
await client.connect()

bench('text format', async () => {
  const rowCount = 50000
  return client.query({
    text: `SELECT generate_series(1, $1) as id,
      (random() * 10000)::int as int_val,
      'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.' as text_val,
      ARRAY[random(), random(), random()]::real[] as array_val`,
    values: [rowCount],
    binary: false,
  })
})

await run({
  format: 'mitata',
  colors: true,
  throw: true,
})

await client.end()
```
This was run on my local laptop, so not scientific. However, I am seeing this difference consistently:
Before:
[before benchmark screenshot]

After:
[after benchmark screenshot]

If this is true, then maybe we should look at hinting the format to `parseDataRowMessage` and making `DataRowMessage` generic over String and Buffer. I think that may be a fair amount of work to accomplish though.
I will keep looking as I get time.
> I was also just wondering about performance in general - when comparing text vs. binary mode in my app I cannot see any improvements when fetching large amounts of rows
I am seeing some noticeable difference over large results. Same benchmark as above, but text vs binary
Text:
[text benchmark screenshot]

Binary:
[binary benchmark screenshot]

Are you not seeing any difference? Can you share your benchmark?
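The binary run is presumably the same script with `binary: true`; a sketch of that variant, assuming the same mitata setup and query as above:

```js
bench('binary format', async () => {
  const rowCount = 50000
  return client.query({
    text: `SELECT generate_series(1, $1) as id,
      (random() * 10000)::int as int_val,
      'Lorem ipsum ...' as text_val, -- same long text literal as the text-format bench
      ARRAY[random(), random(), random()]::real[] as array_val`,
    values: [rowCount],
    binary: true, // the only change from the text-format bench
  })
})
```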
Regarding your benchmark: in binary mode, the query result for the float array is just string garbage; it seems type 1021 (element type 700) is not supported by pg-types in binary mode, so that might skew the benchmark results.
Also, the noParser always creates a string from the Buffer in binary mode, which I think is not a good default.
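If pg-types has no binary parser for a given OID, one can be registered from application code. A hedged sketch for the scalar float4 case (OID 700), assuming pg-types' three-argument `setTypeParser(oid, format, fn)` form:

```js
import pg from 'pg'

// float4 (OID 700) arrives as a 4-byte big-endian IEEE 754 float in binary mode.
pg.types.setTypeParser(700, 'binary', (buf) => buf.readFloatBE(0))

// float4[] (OID 1021) would additionally need a parser for PostgreSQL's binary
// array wire format, which is more involved and not sketched here.
```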
> Are you not seeing any difference? Can you share your benchmark?
In the case of my app (which I cannot share), I wasn't able to measure any meaningful difference.
The part I was measuring basically just dumps large-ish tables to an ND-JSON stream, and looking at the profiler data, pg row parsing is still the largest chunk, but under 30%, so it's hard to say what's going on.
Force-pushed from 9a98412 to 5b56555
DataRowMessage.fields may now be either a string or a buffer slice depending on the field format. this relies on an assumption:

- fieldDescriptions is accurate when parseDataRowMessage is called. we make a similar assumption in pg/lib/result.js, so this seems fine.
ce5d8fb avoids the additional Buffer view object on the string path. this avoids the slowdown i was seeing on my local benchmarks.
if this change looks good, i think we can land this PR and then continue a discussion about why binary format is not showing meaningful differences in a dedicated issue
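A rough sketch of what the per-field loop might look like with this change (names and structure are assumed from the discussion, not copied from the commit):

```js
// Hypothetical sketch – not the actual diff.
for (let i = 0; i < fieldCount; i++) {
  const len = this.reader.int32()
  if (len === -1) {
    fields[i] = null
    continue
  }
  // Relies on fieldDescriptions being accurate at this point (see the note above):
  // binary-format columns keep the raw Buffer slice, text-format columns decode to a string.
  const isBinary = this.fieldDescriptions[i] && this.fieldDescriptions[i].format === 'binary'
  fields[i] = isBinary ? this.reader.bytes(len) : this.reader.string(len)
}
```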
i don't see any other code in pg-protocol doing this, but i would prefer to `assert(this.fieldDescriptions[i], 'Missing field description')` instead of defaulting to text format if a field description is not found.
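In context, the suggestion would look roughly like this (a sketch under the same assumptions as the loop above):

```js
import assert from 'node:assert'

// Fail fast instead of silently falling back to text format.
const desc = this.fieldDescriptions[i]
assert(desc, 'Missing field description')
fields[i] = desc.format === 'binary' ? this.reader.bytes(len) : this.reader.string(len)
```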
This tries to fix #3495.

Functionally, this seems to fix the issue and the tests pass (edit: the stream tests are currently not working), but performance is not ideal: in binary mode a buffer slice is now created, and then a string is created from that slice whenever a string is needed, whereas previously the string was created directly from the buffer. Maybe it would be better to hint the field type to the parser before parsing?