Postgres creating 'histogram_bounds' statistics on JSONB column

Question 1

I have two database tables (A and B), each containing roughly 1.6 million rows and each with the following structure:

CREATE TABLE tablename (
 identifier text NOT NULL,
 geometry geometry(GeometryZ,27700) NOT NULL,
 properties jsonb NOT NULL
);

On both of these tables, I have created the following indexes:

CREATE INDEX tablename_identifier ON tablename USING btree (identifier);
CREATE INDEX tablename_geometry ON tablename USING gist (geometry);
CREATE INDEX tablename_properties ON tablename USING gin (properties jsonb_path_ops);

As I understand it, Postgres should not be able to create statistics for JSONB columns and instead falls back to hard-coded selectivity estimates during query planning. However, when I look in pg_stats after running ANALYZE on both tables, one of them has values in the histogram_bounds column for the properties JSONB attribute:

SELECT tablename, attname, avg_width, histogram_bounds, correlation
FROM pg_stats
WHERE attname = 'properties';

tablename  attname     avg_width  histogram_bounds       correlation
table_a    properties  887        {"{...}","{...}"} [1]  0.0015469971
table_b    properties  829        null                   null

[1] This array contains 101 elements, each a textual representation of a value from the JSONB column.

I had wondered whether the contents of the properties column in table_a were identical across rows, but this is not the case: they are just as distinct between rows as those in table_b.

I believe this issue is causing some inconsistencies in the query plans generated for the two tables. Why is Postgres creating these statistics on a JSONB column when it shouldn't be able to?

Answer 1

JSONB, unlike JSON, has ordering operators defined on it, and that is all ANALYZE needs to build a histogram. So JSONB columns do get histograms, however useful they may be. My question is the reverse: why does table_b not have one? I think JSONB has had histograms since the type was introduced.
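The claim that ordering operators are all ANALYZE needs can be checked directly. A minimal sketch, assuming a PostgreSQL server with default statistics settings: the first query looks for a default btree operator class for json and jsonb (as I understand it, this is where ANALYZE finds the "<" operator it sorts sampled values with), and the rest fills a hypothetical throwaway table, jsonb_hist_demo, with distinct jsonb values, analyzes it, and checks whether a histogram was recorded.

```sql
-- Types with a default btree operator class have an ordering ("<")
-- operator that ANALYZE can use to sort sampled values into a histogram.
SELECT t.typname, oc.opcname
FROM pg_opclass AS oc
JOIN pg_am   AS am ON am.oid = oc.opcmethod
JOIN pg_type AS t  ON t.oid  = oc.opcintype
WHERE am.amname = 'btree'
  AND oc.opcdefault
  AND t.typname IN ('json', 'jsonb');
-- json has no btree operator class, so only jsonb should appear.

-- Hypothetical demo table: 10,000 distinct jsonb documents.
CREATE TEMP TABLE jsonb_hist_demo (props jsonb NOT NULL);
INSERT INTO jsonb_hist_demo
SELECT jsonb_build_object('id', g, 'name', 'feature ' || g)
FROM generate_series(1, 10000) AS g;
ANALYZE jsonb_hist_demo;

-- Check whether ANALYZE recorded histogram bounds for the jsonb column.
SELECT attname,
       n_distinct,
       histogram_bounds IS NOT NULL AS has_histogram
FROM pg_stats
WHERE tablename = 'jsonb_hist_demo'
  AND attname = 'props';
```

Running the same pg_stats query with tablename set to table_a or table_b shows the difference from the question side by side.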

Also, PostgreSQL 13 changed how the selectivity of match operators is estimated. It now treats the histogram values as if they were a random sample and uses them to estimate the fraction of rows that match.
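The estimator the planner calls for each operator is recorded in the oprrest column of pg_operator, so the change can be inspected on a given server. A small sketch: the query lists every operator whose left operand is jsonb together with its restriction-selectivity function. As far as I know, @> and the other match-style jsonb operators should report matchingsel on PostgreSQL 13 and later, while older versions used contsel; treat that as something to confirm on your own installation.

```sql
-- List jsonb operators and the restriction-selectivity estimator each uses.
-- Operators that do not return boolean show '-' (no estimator).
SELECT o.oprname,
       o.oprright::regtype AS right_operand,
       o.oprrest           AS restriction_estimator
FROM pg_operator AS o
WHERE o.oprleft = 'jsonb'::regtype
ORDER BY o.oprname, o.oprright;
```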

Comment

Interesting! Yes, the selectivity estimates are different between the two tables. I wonder whether that is related to the histogram difference. For table_a, which has a histogram, the estimated selectivity appears to be 0.01% of the row count; for table_b, which has no histogram, it appears to be 1.0% of the row count.
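The difference can be reproduced on a small scale. A minimal sketch, assuming PostgreSQL 13 or later: two hypothetical temporary tables, props_with_hist and props_no_stats, hold the same 10,000 distinct jsonb values. The first is analyzed normally; the second has its statistics target set to 0, so no column statistics (and therefore no histogram) are collected, standing in for table_b. Strictly, table_b still has a pg_stats row without a histogram, but in both cases the planner has no histogram to sample. The same @> predicate, chosen to match nothing, is then explained against both. If I read the planner source (selfuncs.c) correctly, when a histogram exists and none of its entries match, the estimate is clamped to a floor of 0.0001 (0.01%), and without a histogram these operators fall back to a default of 0.010 (1%), which would line up with the percentages above. Compare the rows= estimates rather than taking those constants on trust.

```sql
-- Two hypothetical tables with identical, fully distinct jsonb values.
CREATE TEMP TABLE props_with_hist (props jsonb NOT NULL);
CREATE TEMP TABLE props_no_stats  (props jsonb NOT NULL);

INSERT INTO props_with_hist
SELECT jsonb_build_object('id', g, 'name', 'feature ' || g)
FROM generate_series(1, 10000) AS g;
INSERT INTO props_no_stats SELECT props FROM props_with_hist;

-- Collect no column statistics for the second table, so there is no
-- histogram for the planner to treat as a sample.
ALTER TABLE props_no_stats ALTER COLUMN props SET STATISTICS 0;
ANALYZE props_with_hist;
ANALYZE props_no_stats;

-- Same containment predicate (matching no rows) against both tables;
-- compare the "rows=" estimate on each Seq Scan.
EXPLAIN SELECT * FROM props_with_hist WHERE props @> '{"name": "no such feature"}';
EXPLAIN SELECT * FROM props_no_stats  WHERE props @> '{"name": "no such feature"}';
```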

jjanes · Answer 1 · 2021-10-21 12:02:57Z
