
I have a table that is over 80 GB and has 200 million rows. I'm trying to speed it up, and I noticed that it has a large number of varchar columns. In the schema, their declared lengths range from 15 to 250 characters, with most set at 50. The total declared length of all the varchar columns is 850 bytes. In actual usage, a lot of the fields are NULL or the strings are really short.

I know that Postgres uses an 8 kB page size. If I have to do a full table scan, then assuming the worst case, 8192 / 850 ≈ 9.6 records per page. Going through the full table is going to take (and does take) a long time. In actuality, however, since most of those fields are empty, how much space will be allocated on disk for those varchars? Will there be more records per page, or does Postgres add a little padding just to make later updates easier?

The reason I'm asking is that I'm exploring the idea of improving performance by kicking as many of the infrequently accessed varchar columns as possible out of this table and into another one that we would access via a join.
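
To make that concrete, here is a sketch of the kind of split I mean (all table and column names are made up for illustration):

    -- move rarely accessed varchars into a side table keyed by the same id
    CREATE TABLE big_table_extras (
        big_table_id  bigint PRIMARY KEY REFERENCES big_table (id),
        rare_note     varchar(250),
        rare_comment  varchar(250)
    );

    -- frequent queries scan only the slimmed-down main table;
    -- the wide columns are pulled in via a join only when needed
    SELECT b.id, b.status, e.rare_note
    FROM big_table b
    LEFT JOIN big_table_extras e ON e.big_table_id = b.id
    WHERE b.id = 42;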

Any confirmation or denial of my logic is appreciated.

mj

asked Feb 4, 2020 at 19:32
  • I think you'll find the answer in Index performance for CHAR vs VARCHAR (Postgres). However, I believe you have an X-Y problem. What kind of use case asks for regular full table scans on a 200M-row table? Commented Feb 4, 2020 at 19:45

1 Answer


Your considerations are sound, but all of this is already taken care of automatically by PostgreSQL:

  • A NULL value takes up no space at all in a PostgreSQL table row.

  • A varchar will only occupy as much space as the value actually has bytes. The length limit (type modifier) does not waste any space (see the quick check after this list).

  • For rows that are really wide (2000 bytes and above), the varchars are first compressed and then, if the result is still too large, stored externally in a TOAST table. If you SELECT from such a row, the TOASTed values are not read unless the corresponding columns are actually requested.

    But if your rows can be at most 850 bytes, they are too small to ever reach that range.
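
A quick way to convince yourself of the first two points is pg_column_size. A sketch (the exact byte counts depend on headers and version, but the relationships hold):

    -- the declared length limit makes no difference to the stored size;
    -- only the actual string content counts
    SELECT pg_column_size('abc'::varchar(15))             AS short_v15,
           pg_column_size('abc'::varchar(250))            AS short_v250,
           pg_column_size(repeat('x', 250)::varchar(250)) AS full_v250;

    -- short_v15 and short_v250 come out identical (a few bytes),
    -- while full_v250 is roughly 250 bytes plus a small header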

You can check the estimated average row size in bytes by looking at the width column in the output of EXPLAIN SELECT * FROM tablename.
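
For example (tablename is a placeholder and the cost/row figures below are made up; width is the planner's estimated average row width in bytes):

    EXPLAIN SELECT * FROM tablename;

    --                         QUERY PLAN (illustrative output)
    -- ------------------------------------------------------------------------
    --  Seq Scan on tablename  (cost=0.00..9999999.99 rows=200000000 width=216)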

PostgreSQL v11 introduced the toast_tuple_target storage parameter, which lets you lower the threshold at which data gets TOASTed. If you tune that and then rewrite the table with VACUUM (FULL), PostgreSQL can store the data the way you want. Then you can see whether that actually gets you better performance.
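
A minimal sketch of that tuning (tablename and the threshold value 256 are placeholders; pick a threshold below your typical row size):

    -- lower the minimum tuple length at which PostgreSQL tries to
    -- compress and/or move column values into the TOAST table
    ALTER TABLE tablename SET (toast_tuple_target = 256);

    -- existing rows are only restructured when the table is rewritten
    VACUUM (FULL) tablename;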

answered Feb 4, 2020 at 19:55
  • I did the EXPLAIN SELECT * and got a width of 216 back. That's significantly better than what I was expecting. Commented Feb 4, 2020 at 21:38
