I am processing data tables with varying numbers of columns and rows, represented by JSON documents with one array per column. The format of a document is
{
"column_1": ["value_1_1", "value_1_2", ..., "value_1_n"],
"column_2": ["value_2_1", "value_2_2", ..., "value_2_n"],
...,
"column_m": ["value_m_1", "value_m_2", ..., "value_m_n"],
}
The number of columns m is typically in the low tens, while the number of values n lies in the low millions. Values are either small integers or short strings, and individual data tables stored as text files are 100-200 MB in size.
Documents are stored in a jsonb column in PostgreSQL:
Table "data"
************
Column | Type | Nullable
--------------+---------+----------
dat_id | integer | not null
dat_document | jsonb | not null
Documents are typically served "as is" to an application or with a simple filter, e.g.,
select col2, col7, col9
from (
    -- jsonb_array_elements_text returns plain text; casting a jsonb string
    -- to text keeps its double quotes, so the pattern below would not match
    select jsonb_array_elements_text(dat_document->'column_1') as col1,
           jsonb_array_elements(dat_document->'column_2') as col2,
           jsonb_array_elements(dat_document->'column_5') as col7,
           jsonb_array_elements(dat_document->'column_9') as col9
    from data
    where dat_id = 20
) as sub
where sub.col1 like '%@yahoo.com';
The documents are stored as jsonb following recommendations from the PostgreSQL documentation. However, when I use the simpler json type in the table data instead, I do not notice a significant change in execution time for simple queries such as the one above. On the other hand, my documents seem to take roughly 50% more disk space as jsonb than as json, according to pg_column_size on the same document.
Is there any advantage to storing my documents as jsonb instead of json in this case?
1 Answer
jsonb should make the query faster, as extracting the attributes will be more efficient. But it will make INSERT and UPDATE a bit slower, since PostgreSQL has to construct the internal binary representation of the JSON.
I doubt that the effect will be strong. Run a benchmark to be sure.
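One possible way to run such a benchmark is sketched below. It assumes a hypothetical second table, data_json, that holds the same documents in a json column; that table is not part of the question. EXPLAIN (ANALYZE, BUFFERS) executes each query and reports the actual run time and the number of blocks read.

```sql
-- jsonb variant: the table "data" from the question
EXPLAIN (ANALYZE, BUFFERS)
SELECT col2, col7, col9
FROM (
    SELECT jsonb_array_elements_text(dat_document -> 'column_1') AS col1,
           jsonb_array_elements(dat_document -> 'column_2') AS col2,
           jsonb_array_elements(dat_document -> 'column_5') AS col7,
           jsonb_array_elements(dat_document -> 'column_9') AS col9
    FROM data
    WHERE dat_id = 20
) AS sub
WHERE sub.col1 LIKE '%@yahoo.com';

-- json variant: "data_json" is a hypothetical copy with a json column
EXPLAIN (ANALYZE, BUFFERS)
SELECT col2, col7, col9
FROM (
    SELECT json_array_elements_text(dat_document -> 'column_1') AS col1,
           json_array_elements(dat_document -> 'column_2') AS col2,
           json_array_elements(dat_document -> 'column_5') AS col7,
           json_array_elements(dat_document -> 'column_9') AS col9
    FROM data_json
    WHERE dat_id = 20
) AS sub
WHERE sub.col1 LIKE '%@yahoo.com';
```

Running each statement several times and comparing the reported execution times should smooth out caching effects.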
But I think that neither json nor jsonb is appropriate. Anything except storing and retrieving individual documents will be painful, because
- unless you search by primary key, as your query does, you won't be able to use an index to speed up a pattern matching query, and each query will have to read all the data
- any UPDATE of a single attribute will require you to read the entire JSON, disassemble it, replace the value, reassemble it into a JSON and write a new version of the entire row
- you won't be able to enforce referential integrity, and joins will be slow
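To illustrate the UPDATE point, here is a minimal sketch using jsonb_set on the table from the question. The replacement value is a made-up placeholder. The statement itself is short, but PostgreSQL still has to read the whole document, build a modified copy, and write a new version of the row.

```sql
-- Replace the first element (array index 0) of "column_1" in document 20.
-- 'new@example.com' is a placeholder value, not taken from the question.
UPDATE data
SET dat_document = jsonb_set(
        dat_document,
        '{column_1,0}',              -- path: key "column_1", then array index 0
        '"new@example.com"'::jsonb   -- new value, given as a jsonb string
    )
WHERE dat_id = 20;
```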
See my rant about how to employ JSON correctly for more details.
Your sample looks like you should use a database table to store such a value, as it consists of columns and rows.
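A rough sketch of what such a table could look like follows. The table name data_row, the row_no column, and the choice of text for every column are assumptions made for illustration. Because the question mentions data tables with varying sets of columns, a real design would need one such table per column layout or some other arrangement. The trigram index relies on the pg_trgm extension, which can support LIKE patterns with a leading wildcard.

```sql
-- Hypothetical relational layout: one table row per data row
CREATE TABLE data_row (
    dat_id   integer NOT NULL,
    row_no   integer NOT NULL,   -- position of the value in the original arrays
    column_1 text,
    column_2 text,
    column_5 text,
    column_9 text,
    PRIMARY KEY (dat_id, row_no)
);

-- Populate it from the existing jsonb documents.
-- ROWS FROM unnests the arrays in lockstep; WITH ORDINALITY numbers the rows.
INSERT INTO data_row (dat_id, row_no, column_1, column_2, column_5, column_9)
SELECT d.dat_id, t.row_no, t.c1, t.c2, t.c5, t.c9
FROM data AS d
CROSS JOIN LATERAL ROWS FROM (
    jsonb_array_elements_text(d.dat_document -> 'column_1'),
    jsonb_array_elements_text(d.dat_document -> 'column_2'),
    jsonb_array_elements_text(d.dat_document -> 'column_5'),
    jsonb_array_elements_text(d.dat_document -> 'column_9')
) WITH ORDINALITY AS t(c1, c2, c5, c9, row_no);

-- Trigram index so that pattern matching does not have to scan every row
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX data_row_column_1_trgm ON data_row USING gin (column_1 gin_trgm_ops);

-- The filter from the question, now on ordinary columns
SELECT column_2, column_5, column_9
FROM data_row
WHERE dat_id = 20
  AND column_1 LIKE '%@yahoo.com';
```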
-
Re your second point about UPDATE: updating json documents is a bit more cumbersome than directly updating table columns, but it's not that painful in my experience (compared to points 1 and 3). jsonb_set and || are of great help. Where it becomes painful is when updating (elements of) arrays in json. – Bergi, 2025-05-16 22:53
-
Database designers have lost the json war, just as old-timers lost the email quoting war. – RonJohn, 2025-05-17 02:13
-
@RonJohn Where can I learn more about that war? – J. Mini, 2025-05-17 13:05
-
I see no war. I see people shooting themselves in the foot and other people trying to tell them they shouldn't. Unlike bad quoting in e-mails, where you inflict pain on the reader, you are only hurting yourself with a bad design. – Laurenz Albe, 2025-05-18 12:52
-
Ok, then if you don't need your query to run efficiently (without scanning the whole table), your data model may work for your use case. – Laurenz Albe, 2025-05-19 08:42
-
[...] jsonb are 50% bigger than if using json?
-
pg_column_size reports on average 50% more bytes for the same document stored in a jsonb column compared to when it is stored in a json column, which leads me to believe that my documents are more efficiently compressed as text objects than as "better jsons".
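The size comparison described in this comment could be reproduced roughly as follows. This is only a sketch: it assumes a hypothetical table data_json that stores the same documents, under the same dat_id values, in a json column. When pg_column_size is applied directly to a table column it reflects any compression that was done, so the extra octet_length column is included to show the uncompressed length of the text for reference.

```sql
SELECT b.dat_id,
       pg_column_size(j.dat_document)     AS json_stored_bytes,   -- size as stored, after compression
       pg_column_size(b.dat_document)     AS jsonb_stored_bytes,  -- size as stored, after compression
       octet_length(j.dat_document::text) AS json_text_bytes      -- uncompressed length of the text
FROM data AS b
JOIN data_json AS j USING (dat_id);   -- data_json: hypothetical copy with a json column
```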