JSONB vs JSON for column-oriented data in PostgreSQL

Question

I am processing data tables with varying numbers of columns and rows, represented by JSON documents with one array per column. The format of a document is:

{
 "column_1": ["value_1_1", "value_1_2", ..., "value_1_n"],
 "column_2": ["value_2_1", "value_2_2", ..., "value_2_n"],
 ...,
 "column_m": ["value_m_1", "value_m_2", ..., "value_m_n"]
}

The number of columns m is typically in the low tens, while the number of values n is in the low millions. Values are either small integers or short strings, and individual data tables stored as text files are 100-200 MB in size.

Documents are stored in a jsonb column in PostgreSQL:

            Table "data"
    Column    |  Type   | Nullable
--------------+---------+----------
 dat_id       | integer | not null
 dat_document | jsonb   | not null

Documents are typically served "as is" to an application or with a simple filter, e.g.,

select col2, col7, col9
from (
  select jsonb_array_elements_text(dat_document->'column_1') as col1,
         jsonb_array_elements(dat_document->'column_2') as col2,
         jsonb_array_elements(dat_document->'column_5') as col7,
         jsonb_array_elements(dat_document->'column_9') as col9
  from data
  where dat_id = 20
  ) as sub
where sub.col1 like '%@yahoo.com';
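The Python sketch below mimics what this query computes. It assumes PostgreSQL 10 or later, where several set-returning functions in one select list are expanded in lockstep, so the i-th element of every array ends up in the same row. It also assumes that all arrays have the same length n, as in the question. The sketch also shows why the filter has to compare the unquoted string. Casting a jsonb string to text produces its JSON text with the double quotes included, so a pattern anchored at the end, such as '%@yahoo.com', does not match it. jsonb_array_elements_text returns the elements without quotes. The sample data and the helper name filter_rows are hypothetical.

```python
import json

# Hypothetical column-oriented document in the format from the question.
document = {
    "column_1": ["alice@yahoo.com", "bob@example.org", "carol@yahoo.com"],
    "column_2": [10, 20, 30],
    "column_5": ["a", "b", "c"],
    "column_9": [1, 2, 3],
}


def filter_rows(doc, suffix):
    """Zip the column arrays in lockstep and keep rows whose column_1 ends with suffix."""
    rows = zip(doc["column_1"], doc["column_2"], doc["column_5"], doc["column_9"])
    # c7 comes from column_5, matching the alias col7 in the query.
    return [(c2, c7, c9) for c1, c2, c7, c9 in rows if c1.endswith(suffix)]


# Filtering on the unquoted string value works as intended.
print(filter_rows(document, "@yahoo.com"))  # [(10, 'a', 1), (30, 'c', 3)]

# The JSON text of a string element keeps its double quotes,
# so it ends with '@yahoo.com"' rather than '@yahoo.com'.
quoted = json.dumps(document["column_1"][0])
print(quoted)                          # "alice@yahoo.com"
print(quoted.endswith("@yahoo.com"))   # False
```

Python's zip stops at the shortest array, whereas PostgreSQL pads shorter results with NULLs. The two only agree because the arrays are assumed to have equal length.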

The documents are stored as jsonb, following the recommendations in the PostgreSQL documentation. However, when I use the simpler json type for the column in table data instead, I do not notice a significant difference in execution time for simple queries such as the one above. On the other hand, according to pg_column_size, the same document takes roughly 50% more space when stored as jsonb than when stored as json.

Is there any advantage of storing my documents as jsonb instead of json in this case?

Comment

Have you considered changing your schema to store each column as a separate postgres row? Then you would be able to use a btree index to select the columns.
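The following is a minimal Python sketch of what that schema change means for one document. It assumes a hypothetical table with one row per (dat_id, column name) pair and a primary key on those two fields. Its btree index would let you locate a single column's row instead of extracting that column from the whole document. The function name split_by_column and the sample data are illustrative only.

```python
import json

# Hypothetical document in the question's column-oriented format.
document = {
    "column_1": ["alice@yahoo.com", "bob@example.org"],
    "column_2": [10, 20],
}


def split_by_column(dat_id, doc):
    """Return one (dat_id, column_name, json_array_text) tuple per column.

    Each tuple corresponds to a row in a hypothetical table whose primary key
    is (dat_id, column_name), so a btree index can find a single column.
    """
    return [(dat_id, name, json.dumps(values)) for name, values in doc.items()]


rows = split_by_column(20, document)
for row in rows:
    print(row)
```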

Comment

Why don't you index the column (it will be a large index) and use the jsonb functions to search? Casting to text and using a text operator without an index will be terribly slow by design.

Comment

The wording is ambiguous. Are you saying that tables using jsonb are 50% bigger than if using json?

Comment

@RonJohn, what I am saying is that pg_column_size reports on average 50% more bytes for the same document stored in a jsonb column than when it is stored in a json column. This leads me to believe that my documents are compressed more efficiently as text objects than as "better jsons".

Comment

Re your second point about UPDATE, updating json documents is a bit more cumbersome than directly updating table columns, but it's not that painful in my experience (compared to points 1 and 3). jsonb_set and || are of great help. Where it becomes painful is when updating (elements of) arrays in json.

Comment

Database designers have lost the json war, just as old-timers lost the email quoting war.

Comment

@RonJohn Where can I learn more about that war?

Comment

I see no war. I see people shooting themselves in the foot and other people trying to tell them they shouldn't. Unlike bad quoting in e-mails, where you inflict pain on the reader, with a bad design you are only hurting yourself.

Comment

OK, then if you don't need your query to run efficiently (without scanning the whole table), your data model may work for your use case.

Answer by Laurenz Albe · 2025-05-16 19:56:42Z

jsonb should make the query faster, as extracting the attributes will be more efficient. But it will make INSERT and UPDATE a bit slower, since PostgreSQL has to construct the internal binary representation of the JSON.

I doubt that the effect will be strong. Run a benchmark to be sure.

But I think that neither json nor jsonb is appropriate. Anything except storing and retrieving individual documents will be painful, because:

1. Unless you search by primary key, as your query indicates, you won't be able to use an index to speed up a pattern matching query, and each query will have to read all data.
2. Any UPDATE of a single attribute will require you to read the entire JSON, disassemble it, replace the value, reassemble it into a JSON and write a new version of the entire row.
3. You won't be able to enforce referential integrity, and joins will be slow.
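The point about UPDATE can be illustrated with a small Python sketch. It treats the stored document as one serialized value, which is how a json or jsonb column behaves when you modify it. Changing one array element means parsing the whole document, replacing the element, and serializing the whole document again. Here JSON text stands in for PostgreSQL's internal representations, and the function name update_value is hypothetical.

```python
import json


def update_value(doc_text, column, index, new_value):
    """Change one array element and return the re-serialized document.

    Only one value changes, but the whole document is parsed and the whole
    document is serialized again (a read-modify-write of the entire value).
    """
    doc = json.loads(doc_text)       # disassemble the entire document
    doc[column][index] = new_value   # replace a single value
    return json.dumps(doc)           # reassemble the entire document


# Hypothetical document: 100,000 integers in one column, one string in another.
original = json.dumps({"column_1": list(range(100_000)), "column_2": ["x"]})
updated = update_value(original, "column_2", 0, "y")

# Replacing "x" with "y" leaves the size unchanged, yet a complete new
# copy of the document text had to be produced.
print(len(original), len(updated))
```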

See my rant about how to employ JSON correctly for more details.

Your sample data consist of columns and rows, so it looks like you should store them in a regular database table.
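Here is a minimal Python sketch of that transformation, assuming all arrays in a document have the same length. It turns the column-oriented document into one tuple per row, which is the shape you would load into an ordinary table (for example with COPY). Each original JSON column then becomes a real table column that can be indexed and constrained. The function name document_to_rows and the sample data are hypothetical.

```python
# Hypothetical column-oriented document in the question's format.
document = {
    "column_1": ["alice@yahoo.com", "bob@example.org", "carol@yahoo.com"],
    "column_2": [10, 20, 30],
}


def document_to_rows(doc):
    """Transpose {column: [values]} into a header and a list of row tuples."""
    columns = list(doc)
    lengths = {len(doc[c]) for c in columns}
    if len(lengths) > 1:
        raise ValueError("all column arrays must have the same length")
    return columns, list(zip(*(doc[c] for c in columns)))


header, rows = document_to_rows(document)
print(header)
for row in rows:
    print(row)

# With one value per field, the filter from the question becomes a plain
# condition on a single field of each row, like a WHERE clause on a text column.
print([r for r in rows if r[0].endswith("@yahoo.com")])
```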
