I have two database tables (A and B), each containing roughly 1.6m rows, and each having the following structure:
CREATE TABLE tablename (
    identifier text NOT NULL,
    geometry geometry(GeometryZ,27700) NOT NULL,
    properties jsonb NOT NULL
);
On both of these tables, I have created the following indexes:
CREATE INDEX tablename_identifier ON tablename USING btree (identifier);
CREATE INDEX tablename_geometry ON tablename USING gist (geometry);
CREATE INDEX tablename_properties ON tablename USING gin (properties jsonb_path_ops);
As I understand it, Postgres should not be able to create statistics for JSONB columns and instead falls back to hard-coded selectivity estimates for use in query planning. However, if I look in pg_stats after running ANALYZE on both tables, one of them has values in the histogram_bounds column for the properties JSONB attribute:
SELECT tablename, attname, avg_width, histogram_bounds, correlation
FROM pg_stats
WHERE attname = 'properties'
tablename | attname | avg_width | histogram_bounds | correlation |
---|---|---|---|---|
table_a | properties | 887 | {"{...}","{...}"} [1] | 0.0015469971 |
table_b | properties | 829 | null | null |
[1] This array has 101 elements, each a textual representation of a value from the JSONB column.
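For comparing the two tables side by side, a slightly wider query against pg_stats may help. This is only a sketch: it assumes the tables are literally named table_a and table_b as in the output above, and it relies on casting the anyarray column through text to text[] so that array_length can count the histogram entries.

```sql
-- Compare the collected statistics for the JSONB column on both tables.
-- histogram_bounds is of type anyarray, so it is cast via text to text[]
-- in order to count its elements (NULL when no histogram was stored).
SELECT tablename,
       attname,
       null_frac,
       n_distinct,
       array_length(histogram_bounds::text::text[], 1) AS histogram_entries
FROM pg_stats
WHERE attname = 'properties'
  AND tablename IN ('table_a', 'table_b');
```

With the default statistics target of 100, a full histogram has 101 bounds, which matches the count noted in [1].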
I had considered whether the contents of the properties column in table_a might be identical across rows, but this is not the case. They are just as distinct between rows as those in table_b.
I believe this issue is causing some inconsistencies in the query plans generated for the two tables. Why is Postgres creating these statistics on a JSONB column when it shouldn't be able to?
1 Answer
JSONB, unlike JSON, has ordering operators defined on it, and that is all that's needed to create histograms. So JSONB columns do get them, regardless of how useful they may be; I think this has been the case since JSONB was introduced. So my question is: why does table_b not have them?
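One way to check the ordering-operator point for yourself is to try a comparison directly and to look up the btree operator classes in the system catalogs. This is a rough sketch using only the standard pg_opclass, pg_am and pg_type catalogs; the literal JSON values are made up for illustration.

```sql
-- jsonb values can be compared with the usual ordering operators.
SELECT '{"a": 1}'::jsonb < '{"a": 2}'::jsonb AS jsonb_is_ordered;

-- The same comparison on json is expected to fail, because json has no
-- ordering operators:
-- SELECT '{"a": 1}'::json < '{"a": 2}'::json;

-- List btree operator classes defined for json and jsonb.
-- Only jsonb should show up here.
SELECT t.typname, oc.opcname
FROM pg_opclass oc
JOIN pg_am am ON am.oid = oc.opcmethod
JOIN pg_type t ON t.oid = oc.opcintype
WHERE am.amname = 'btree'
  AND t.typname IN ('json', 'jsonb');
```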
Also, v13 changed the way the selectivity of match operators is estimated. It now treats the histogram values as if they were a random sample of the column and uses them to estimate the rate of matches.
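To see whether the histogram is what drives the difference, it may be worth comparing the planner's row estimates for the same containment query on both tables. The following is only a sketch: the key and value in the filter are hypothetical placeholders, so substitute something that actually occurs in your properties column.

```sql
-- Confirm which estimation behaviour applies (v13 or later vs. older).
SHOW server_version;

-- Compare the estimated row counts ("rows=" in the plan output) for the
-- same containment predicate on the table with a histogram and the one
-- without. The '{"key": "value"}' filter is a placeholder.
EXPLAIN SELECT * FROM table_a WHERE properties @> '{"key": "value"}';
EXPLAIN SELECT * FROM table_b WHERE properties @> '{"key": "value"}';
```

If the estimate for table_b is a fixed fraction of its row count no matter which filter you use, that would suggest it is falling back to a default selectivity because there is no histogram to sample from.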
Interesting! Yes, the selectivity estimates are different between the two tables. I wonder if that is related to the histogram differences. For table_a with a histogram, selectivity appears to be 0.01% of the row count; for table_b without a histogram it appears to be 1.0% of the row count. – JoeTea, Oct 21, 2021 at 13:37