I have a JSONB field called data with a few keys and there is a field like:
{school_id: [nil, 123456]}
I'm filtering all records where the second element is not null, using this query:
"data -> 'school_id' ->> 1 IS NOT NULL"
How can I apply an index on that field? I've tried:
"CREATE INDEX data_school_id ON events ((data -> 'school_id'));"
but when I use EXPLAIN I do not see that this index is in use.
=> EXPLAIN for: SELECT "events".* FROM "events" WHERE (data -> 'school_id' ->> 1 IS NOT NULL)
QUERY PLAN
-------------------------------------------------------------------------------
Seq Scan on events (cost=0.00..211043.04 rows=2096733 width=555)
Filter: (((data -> 'school_id'::text) ->> 1) IS NOT NULL)
(2 rows)
2 Answers
The index is only used when Postgres expects it to help performance - which is only the case if the filter is expected to be selective enough (~ 5 % or less of the rows qualify, percentage heavily depends on various details).
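To see whether the filter is selective at all, one option is to measure the actual share of qualifying rows. The query below is a minimal sketch that reuses the table and column names from the question; the output column aliases are only illustrative.

```sql
-- Share of rows that pass the filter, as a percentage of the whole table.
-- NULLIF guards against division by zero on an empty table.
SELECT count(*) AS total_rows,
       count(*) FILTER (WHERE data -> 'school_id' ->> 1 IS NOT NULL) AS matching_rows,
       round(100.0 * count(*) FILTER (WHERE data -> 'school_id' ->> 1 IS NOT NULL)
             / NULLIF(count(*), 0), 2) AS matching_pct
FROM   events;
```

If matching_pct comes out at a few percent or less while EXPLAIN estimates that nearly all rows qualify, the planner's row estimate, not the index itself, is the likely reason for the sequential scan.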
One problem with document types like jsonb: Postgres currently does not maintain statistics about value frequencies of embedded elements (still true in Postgres 10). So it has to base its decision whether or not to use an index on generic frequency estimations. Meaning, even if your particular filter data -> 'school_id' ->> 1 IS NOT NULL is very selective and using the index would pay, Postgres works with a generic average estimation and might miss the opportunity.
There are ways around this with expression or partial indexes, because Postgres collects separate statistics for index expressions. The best you could do for this particular query would be a partial index:
CREATE INDEX data_school_id ON events (event_id) -- idx column largely irrelevant here
WHERE (data -> 'school_id' ->> 1) IS NOT NULL;
And run VACUUM ANALYZE events; at least once (or wait until autovacuum kicks in).
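The whole sequence can be sketched as follows. The expression index and its name data_school_id_second are assumptions added for illustration, not part of the answer: they show one way to get Postgres to collect statistics on the expression itself. Whether the planner then picks either index still depends on its cost estimates.

```sql
-- Partial index from the answer (repeated so the sketch is self-contained).
CREATE INDEX IF NOT EXISTS data_school_id ON events (event_id)
WHERE  (data -> 'school_id' ->> 1) IS NOT NULL;

-- Assumption, not from the answer: an expression index on the second array
-- element, so that ANALYZE gathers statistics (e.g. null_frac) for it.
CREATE INDEX IF NOT EXISTS data_school_id_second
ON     events ((data -> 'school_id' ->> 1));

-- Refresh planner statistics, including those for index expressions.
VACUUM ANALYZE events;

-- Statistics for index expressions are listed in pg_stats under the index name.
SELECT tablename, attname, null_frac, n_distinct
FROM   pg_stats
WHERE  tablename = 'data_school_id_second';

-- Re-check the plan with actual row counts and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   events
WHERE  (data -> 'school_id' ->> 1) IS NOT NULL;
```

A null_frac close to 1 in the pg_stats output would mean the planner now knows that almost no rows satisfy the IS NOT NULL filter.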
I've tried to add the partial index as described above, but EXPLAIN still shows only the filter. I'm sure the matching rows are far fewer than 5% of the overall data, well under 1%.
Seq Scan on event_store_events (cost=0.00..210489.39 rows=2059676 width=520)
Filter: (((data -> 'school_id'::text) ->> 1) IS NOT NULL)
(2 rows)
Postgres version is 10.
Table definition:
Table "public.events"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------------------------------
id | integer | | not null | nextval('esedev_seq'::regclass)
stream | character varying | | not null |
event_type | character varying | | not null |
event_id | character varying | | not null |
metadata | jsonb | | | '{}'::jsonb
data | jsonb | | not null | '{}'::jsonb
created_at | timestamp without time zone | | not null |
Indexes:
"events_pkey" PRIMARY KEY, btree (id)
"ese_created_at_cast_date_index" btree ((created_at::date))
"ese_data_school_class_id" btree ((data ->> 'school_class_id'::text))
"ese_data_school_id_second_not_null" btree (event_id) WHERE ((data -> 'school_id'::text) ->> 1) IS NOT NULL
"ese_data_status_index" btree ((data ->> 'status'::text))
"ese_data_student_id" btree ((data ->> 'student_id'::text))
"ese_event_type_index" btree (event_type)
"ese_metadata_customer_id_cast_integer_index" btree (((metadata ->> 'customer_id'::text)::integer))
"ese_metadata_premium_index" btree ((metadata ->> 'premium'::text))
"ese_metadata_signed_in_at_index" btree ((metadata ->> 'signed_in_at'::text))
"ese_metadata_user_id_index" btree ((metadata ->> 'user_id'::text))
"ese_metadata_user_type_index" btree ((metadata ->> 'user_type'::text))
"ese_stream_index" btree (stream)
Did you run VACUUM ANALYZE after creating the index? – Erwin Brandstetter, Dec 22, 2017 at 3:43
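One way to answer that question without guessing is to look at the statistics collector's view of the table. This is a minimal sketch that assumes the table is named events, as in the table definition above.

```sql
-- When was the table last vacuumed and analyzed, manually or by autovacuum?
-- NULL in a column means that operation has not been recorded for the table.
SELECT relname,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'events';
```

If both last_analyze and last_autoanalyze are older than the index creation, the planner is still working with statistics gathered before the index existed.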
random_page_cost, cpu_index_tuple_cost and effective_cache_size. EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN. Consider instructions in the info to the [postgresql-performance] tag.
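As a rough sketch of how to gather that information, the current values of the cost settings named above can be read from pg_settings, followed by the fuller EXPLAIN variant. The query text is taken from the question; nothing here changes any setting.

```sql
-- Current values of the planner cost settings mentioned in the comment.
SELECT name, setting, unit, source
FROM   pg_settings
WHERE  name IN ('random_page_cost', 'cpu_index_tuple_cost', 'effective_cache_size');

-- Executes the query and reports actual timings, row counts and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   events
WHERE  data -> 'school_id' ->> 1 IS NOT NULL;
```

Note that EXPLAIN (ANALYZE, BUFFERS) actually runs the statement, so on a large table it takes as long as the query itself.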