
I have been trying to figure out the network bandwidth consumed by a query's output. Since PostgreSQL only tracks the I/O side of a query, I ended up looking at octet_length() and pg_column_size(). I found the query below in another thread:

select sum(octet_length(temp.*::text)) from (Select * from test) as temp;

Does this query return what I am expecting? Is there any other way to find the size of a query's output as transferred over the network, neglecting protocol overhead?

asked Mar 4, 2024 at 7:20
  • You don't account for the protocol overhead, which, depending on the row contents, can be comparable with the row width. Commented Mar 4, 2024 at 13:29
  • As of Postgres 16, the size of result sets (with or without the protocol overhead) is not among the metrics exposed in the pg_stat_* views (as far as I know). For accurate results, count the bytes at the network level with a protocol analyzer, or look for server-side extensions. Commented Mar 4, 2024 at 23:25
  • @mustaccio Thanks for the comment. I'm only interested in the bytes retrieved by the query, not the TCP overhead. If I have the query 'select * from test', how do I find its row width? Commented Mar 5, 2024 at 1:29
  • Sorry, this doesn't make a lot of sense. The network bandwidth, which seems to be the topic of your question, is consumed by the TCP packets, and not accounting for everything they contain is not going to give you an accurate measure of the bandwidth consumption. Commented Mar 5, 2024 at 3:03

1 Answer


This is easy enough to test - at least for the data! As @mustaccio points out, there will be further overhead for the network protocol that is used to transmit the data from the server to your client machine. The contents of this article could perhaps be adapted to see how many bytes in total are received per query.

For the data: first, we check the database encoding (aka character set - all the code below is available on the fiddle here):

SELECT 
 d.datname AS "Name",
 pg_catalog.pg_get_userbyid(d.datdba) AS "Owner",
 pg_catalog.pg_encoding_to_char(d.encoding) AS "Encoding",
 d.datcollate AS "Collate",
 d.datctype AS "Ctype",
 pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1;

Result:

Name       Owner     Encoding  Collate  Ctype    Access privileges
postgres   postgres  UTF8      C.UTF-8  C.UTF-8  null
template0  postgres  UTF8      C.UTF-8  C.UTF-8  =c/postgres postgres=CTc/postgres
template1  postgres  UTF8      C.UTF-8  C.UTF-8  =c/postgres postgres=CTc/postgres

So, all databases created on the fiddle will, by default, have UTF8 encoding - including the one we create below.
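As a quick cross-check (this bit is my addition, not from the original fiddle), you can also ask the current database for its encoding directly:

SHOW server_encoding;

-- or, equivalently, restricted to the current database:
SELECT pg_catalog.pg_encoding_to_char(d.encoding) AS "Encoding"
FROM pg_catalog.pg_database d
WHERE d.datname = current_database();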

So, now we create a table and populate it:

CREATE TABLE test
(
 a INT NOT NULL,
 b TEXT NOT NULL
);

Records:

INSERT INTO test VALUES
(1, 'afasdfasdf'),
(2, 'afasdfasdafasdfasf'),
(3, 'afasdfasdfasdfasdfsdf'),
(4, 'af'),
(5, 'afasdfa');

and then we run the following query (note that all of the characters are ASCII, so each takes up one byte):

SELECT
 SUM(PG_COLUMN_SIZE(t.a)) AS int_sz,
 SUM(CHARACTER_LENGTH(t.b)) AS char_len,
 SUM(OCTET_LENGTH(t.b)) AS oct_len,
 SUM(OCTET_LENGTH(t.*::TEXT)) AS oct_len_total
FROM
 test t; -- you don't even need the alias, since the column
 -- names are unambiguous - there's only one table!
 -- However, with no alias, you do need to specify test.*!

Result:

int_sz  char_len  oct_len  oct_len_total
    20        58       58             78

So, we see that with ASCII data, oct_len_total is equal to the SUM() of int_sz and either char_len or oct_len (which are identical here) - i.e. 20 + 58 = 78.
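To see where those bytes come from, here's a per-row breakdown (my own sketch, not part of the original fiddle). Each row's text form, e.g. (1,afasdfasdf), carries two parentheses, a comma and a single-digit integer - 4 bytes of "decoration" per row, which happens to equal PG_COLUMN_SIZE() of an INT, so the two totals line up for this particular data:

SELECT
 t.a,
 t::TEXT AS row_as_text,              -- the row's text form, e.g. (1,afasdfasdf)
 OCTET_LENGTH(t::TEXT) AS row_text_bytes,
 PG_COLUMN_SIZE(t.a) AS int_bytes,
 OCTET_LENGTH(t.b) AS text_bytes
FROM
 test t
ORDER BY t.a;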

Now, we insert non-ASCII characters:

INSERT INTO test VALUES
(6, '我爱你');

and rerun our size query. Result:

int_sz  char_len  oct_len  oct_len_total
    24        61       67             91

Now, we see that the oct_len_total is not equal to the SUM() of int_sz and char_len (= 85), but instead it's equal to the SUM() of int_sz and oct_len (24 + 67 = 91).

The 3 new characters add 3 to char_len but 9 to oct_len - i.e. 3 bytes per character, 2 more than an ASCII character, which is what we'd expect for Chinese characters in UTF8.

So, the answer to your question is: yes, your approach of using OCTET_LENGTH() is the correct one - at least for the data itself, with protocol overhead excluded!
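As a final sketch (my addition - it just reuses your pattern, plus the GENERATE_SERIES trick from the comments below), the same wrapper can be put around any query whose data size you want to measure:

SELECT
 SUM(OCTET_LENGTH(q.*::TEXT)) AS data_bytes
FROM
(
 SELECT test.* FROM GENERATE_SERIES(1, 10), test -- any query you like goes here
) AS q;

This only counts the bytes of the textual representation of the result - the wire protocol framing still has to be measured at the network level, as discussed in the comments.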

answered Mar 4, 2024 at 13:11
  • Hi Verace, your explanation seems sensible to me, but I don't know why the -ve vote. We will wait for some expertise :) Commented Mar 5, 2024 at 1:32
  • Thanks - I don't know why I got a -ve vote either. It may be because I didn't cover the network protocol overhead - but then I specifically said at least for the data... I have a post here which will hopefully correct that, but all I've got for my trouble so far is a downvote and no explanation! :-( I think I'll wander over to the Wireshark site and ask the question there - I might find people who actually know something! Commented Mar 5, 2024 at 3:05
  • @BasilTitus - I believe that I've answered the question as asked, but as you can see from the other post, there's quite a bit of overhead with TCP/IP. I'm fairly sure that this will come down (proportionately) if you SELECT, say, 10,000 rows - the overhead mightn't be that much bigger, or indeed bigger at all. Running SELECT test.* FROM GENERATE_SERIES(1, 10), test; appears to suggest that this is the case: 10 times as much data, but only 1766/298 (5.9) times as many bytes in Frame 2 (see the post on superuser in the previous comment). The other frames were more or less the same size! Commented Mar 5, 2024 at 3:18
