I have been trying to figure out the network bandwidth consumed by a query's output. Since PostgreSQL only tracks the I/O bandwidth of a query, I ended up looking at octet_length() and pg_column_size(). I found the query below in another thread:
select sum(octet_length(temp.*::text)) from (Select * from test) as temp;
Does this query return what I am expecting? Is there any other way to find the size of a query's output as transferred over the network, neglecting protocol overhead?
-
You don't account for the protocol overhead, which, depending on the row contents, can be comparable with the row width. – mustaccio, Mar 4, 2024 at 13:29
-
As of postgres 16, the size of result sets (with or without the protocol overhead) is not among the metrics exposed in the pg_stat_* views (as far as I know). For accurate results, count the bytes at the network level with a protocol analyzer, or look for server-side extensions. – Daniel Vérité, Mar 4, 2024 at 23:25
-
@mustaccio Thanks for the comment. I'm only interested in the bytes retrieved by the query, not the TCP overhead. If I have the query 'select * from test', how do I find its row width? – goodfella, Mar 5, 2024 at 1:29
-
Sorry, this doesn't make a lot of sense. The network bandwidth, which seems to be the topic of your question, is consumed by the TCP packets, and not accounting for everything they contain is not going to give you an accurate measure of the bandwidth consumption. – mustaccio, Mar 5, 2024 at 3:03
1 Answer
This is easy enough to test - at least for the data! As @mustaccio points out, there will be further overhead for the network protocol that is used to transmit the data from the server to your client machine. The contents of this article could perhaps be adapted to see how many bytes in total are received per query.
For the data, first, we check the database encoding (aka character sets - all the code below is available on the fiddle here):
SELECT
d.datname AS "Name",
pg_catalog.pg_get_userbyid(d.datdba) AS "Owner",
pg_catalog.pg_encoding_to_char(d.encoding) AS "Encoding",
d.datcollate AS "Collate",
d.datctype AS "Ctype",
pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1;
Result:
Name        Owner     Encoding  Collate  Ctype    Access privileges
postgres    postgres  UTF8      C.UTF-8  C.UTF-8  null
template0   postgres  UTF8      C.UTF-8  C.UTF-8  =c/postgres postgres=CTc/postgres
template1   postgres  UTF8      C.UTF-8  C.UTF-8  =c/postgres postgres=CTc/postgres
So, any database created on the fiddle will, by default, have a UTF8 encoding, including the one we create for this fiddle.
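If you only care about the database you are connected to, a shorter check (just a convenience, not part of the original answer) is to filter pg_database on current_database():
SELECT
    pg_catalog.pg_encoding_to_char(d.encoding) AS "Encoding"
FROM pg_catalog.pg_database d
WHERE d.datname = current_database();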
So, now we create a table and populate it:
CREATE TABLE test
(
a INT NOT NULL,
b TEXT NOT NULL
);
Records:
INSERT INTO test VALUES
(1, 'afasdfasdf'),
(2, 'afasdfasdafasdfasf'),
(3, 'afasdfasdfasdfasdfsdf'),
(4, 'af'),
(5, 'afasdfa');
and then we run the following command (note that all of the characters are ASCII, so each takes up one byte):
SELECT
SUM(PG_COLUMN_SIZE(t.a)) AS int_sz,
SUM(CHARACTER_LENGTH(t.b)) AS char_len,
SUM(OCTET_LENGTH(t.b)) AS oct_len,
SUM(OCTET_LENGTH(t.*::TEXT)) AS oct_len_total
FROM
test t; -- you don't even need the alias, since the column
-- names are unambiguous - there's only one table!
-- However, with no alias, you do need to specify test.*!
Result:
int_sz char_len oct_len oct_len_total
20 58 58 78
So, we see that with ASCII, oct_len_total is equal to the SUM() of int_sz and either char_len or oct_len - i.e. 20 + 58 = 78.
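To see where oct_len_total comes from row by row, it helps to look at the text form of each row: the cast to TEXT produces the composite literal, parentheses and comma included, so every row carries a few bytes of punctuation on top of the column data. A small check along those lines (an illustrative aside, not part of the original answer; t here is the whole-row reference) might be:
SELECT
    t::TEXT               AS row_as_text,  -- e.g. (1,afasdfasdf)
    OCTET_LENGTH(t::TEXT) AS row_bytes     -- 14 for the first row
FROM
    test t;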
Now, we insert non-ASCII characters:
INSERT INTO test VALUES
(6, '我爱你');
and rerun our size query - result of rerun:
int_sz char_len oct_len oct_len_total
24 61 67 91
Now, we see that oct_len_total is not equal to the SUM() of int_sz and char_len (= 85), but instead it's equal to the SUM() of int_sz and oct_len (24 + 67 = 91).
We have 3 extra characters but 9 extra bytes - 6 more bytes than characters, i.e. 2 extra bytes per character (3 bytes/character), which is to be expected for Chinese characters in UTF-8.
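As a quick sanity check on those three characters themselves (an aside, not from the original answer), you can compare character and byte lengths directly:
SELECT
    CHARACTER_LENGTH('我爱你') AS chars,  -- 3
    OCTET_LENGTH('我爱你')     AS bytes;  -- 9 under UTF8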
So, the answer to your question is, yes, your approach of using OCTET_LENGTH() is the correct one!
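Tying this back to the query in the question, you can wrap whichever statement you want to measure in a derived table and sum the octet lengths of the rows' text form. The sketch below (with a hypothetical alias q; it counts only the data, not the protocol overhead) also reports the row count and average row size:
SELECT
    COUNT(*)                               AS row_count,
    SUM(OCTET_LENGTH(q.*::TEXT))           AS data_bytes,
    ROUND(AVG(OCTET_LENGTH(q.*::TEXT)), 2) AS avg_row_bytes
FROM
    (SELECT * FROM test) AS q;  -- substitute the query you want to measure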
-
Hi Verace, your explanation seems sensible to me, but I don't know why there's a -ve vote. We will wait for some expertise :) – goodfella, Mar 5, 2024 at 1:32
-
Thanks - I don't know why I got a -ve vote either - I think it may be something to do with the fact that I didn't cover the network protocol overhead, but then I specifically said "at least for the data". I have a post here which was hopefully going to correct that, but all I've got for my trouble so far is a downvote and no explanation! :-( I think I'll wander over to the wireshark site and ask the question there - might find people who actually know something! – Vérace, Mar 5, 2024 at 3:05
-
@BasilTitus - I believe that I've answered the question as asked, but as you can see from the other post, there's quite a bit of overhead with TCP/IP. I'm fairly sure that this will come down (proportionately) if you SELECTed, say, 10,000 rows - the overhead mightn't be that much bigger, or indeed bigger at all. Running SELECT test.* FROM GENERATE_SERIES(1, 10), test; appears to suggest that this is the case: 10 times as much data, but only 1766/298 (5.9) times as much transmitted in Frame 2 (see the post on superuser in the previous comment). The other frames were more or less the same size! – Vérace, Mar 5, 2024 at 3:18