PostgreSQL index for jsonb array field

Question 1

I have a jsonb column called data with a few keys, one of which looks like this:

{"school_id": [null, 123456]}

I'm filtering all records where the second element is not null, using this query:

"data -> 'school_id' ->> 1 IS NOT NULL"

How can I apply an index to that field? I've tried:

"CREATE INDEX data_school_id ON events ((data -> 'school_id'));"

but when I use EXPLAIN I do not see that this index is in use.

=> EXPLAIN for: SELECT "events".* FROM "events" WHERE (data -> 'school_id' ->> 1 IS NOT NULL)
 QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on events (cost=0.00..211043.04 rows=2096733 width=555)
 Filter: (((data -> 'school_id'::text) ->> 1) IS NOT NULL)
(2 rows)

Comment 1

Essential missing info: Postgres version, table definition, frequency of NULL / NOT NULL in the expression. Also relevant: cost settings incl. random_page_cost, cpu_index_tuple_cost and effective_cache_size.

Comment 2

You should provide information in your question, not an answer. Use edit. Also add the total number of rows and the number of rows returned by your query, please. And the output of EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN. Consider instructions in the info to the [postgresql-performance] tag.
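
The information requested in these comments could be gathered roughly as follows. This is only a sketch: it assumes the table is called events as in the question, and the column aliases total_rows and matching_rows are arbitrary names chosen for illustration.

```sql
-- Server version
SELECT version();

-- Total number of rows vs. rows that pass the filter
SELECT count(*) AS total_rows,
       count(*) FILTER (WHERE data -> 'school_id' ->> 1 IS NOT NULL) AS matching_rows
FROM   events;

-- Actual execution plan with timings and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   events
WHERE  data -> 'school_id' ->> 1 IS NOT NULL;

-- Planner cost settings mentioned above
SHOW random_page_cost;
SHOW cpu_index_tuple_cost;
SHOW effective_cache_size;
```

Comparing the estimated rows in the plan with matching_rows shows how far off the planner's estimate for the filter is.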

score 4 · Answer 1 · 2017-12-20 14:07:03Z

Postgres only uses an index when it expects the index to improve performance, which is only the case if the filter is expected to be selective enough (roughly 5 % or less of the rows qualify; the exact percentage depends heavily on various details).

One problem with document types like jsonb: Postgres currently does not maintain statistics about value frequencies of embedded elements (still true in Postgres 10). So it has to base its decision on whether to use an index on generic frequency estimates. This means that even if your particular filter data -> 'school_id' ->> 1 IS NOT NULL is very selective and using the index would pay off, Postgres works with a generic average estimate and might miss the opportunity.

There are ways around this with expression or partial indexes, because Postgres collects separate statistics for index expressions. The best you could do for this particular query would be a partial index:

CREATE INDEX data_school_id ON events (event_id) -- idx column largely irrelevant here
WHERE (data -> 'school_id' ->> 1) IS NOT NULL;

And VACUUM ANALYZE events; at least once (or wait until autovacuum kicks in).
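
For comparison, here is a hedged sketch of the expression-index route mentioned above. The index name events_school_id_second_idx is made up for illustration, and whether the planner actually picks the index still depends on your data distribution and cost settings.

```sql
-- Expression index on the second array element (hypothetical index name)
CREATE INDEX events_school_id_second_idx
ON     events ((data -> 'school_id' ->> 1));

-- Collect statistics, including those for the index expression
ANALYZE events;

-- Statistics for index expressions are listed in pg_stats under the index name;
-- null_frac is the estimated fraction of rows where the expression is NULL
SELECT tablename, attname, null_frac, n_distinct
FROM   pg_stats
WHERE  tablename = 'events_school_id_second_idx';

-- Check whether the row estimate and the plan have changed
EXPLAIN
SELECT *
FROM   events
WHERE  (data -> 'school_id' ->> 1) IS NOT NULL;
```

If null_frac is close to 1, the planner has the information it needs to see that IS NOT NULL is selective for this expression.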

Artur79 · Answer 2 · 2017-12-21 15:38:12Z

I've tried adding the partial index as described above, but EXPLAIN still shows only the filter. I'm sure the matching rows make up much less than 5% of the overall data, in fact much less than 1%.

 Seq Scan on event_store_events (cost=0.00..210489.39 rows=2059676 width=520)
 Filter: (((data -> 'school_id'::text) ->> 1) IS NOT NULL)
(2 rows)

Postgres version is 10.

Table definition:

                                       Table "public.events"
   Column   |            Type             | Collation | Nullable |             Default
------------+-----------------------------+-----------+----------+---------------------------------
 id         | integer                     |           | not null | nextval('esedev_seq'::regclass)
 stream     | character varying           |           | not null |
 event_type | character varying           |           | not null |
 event_id   | character varying           |           | not null |
 metadata   | jsonb                       |           |          | '{}'::jsonb
 data       | jsonb                       |           | not null | '{}'::jsonb
 created_at | timestamp without time zone |           | not null |
Indexes:
 "events_pkey" PRIMARY KEY, btree (id)
 "ese_created_at_cast_date_index" btree ((created_at::date))
 "ese_data_school_class_id" btree ((data ->> 'school_class_id'::text))
 "ese_data_school_id_second_not_null" btree (event_id) WHERE ((data -> 'school_id'::text) ->> 1) IS NOT NULL
 "ese_data_status_index" btree ((data ->> 'status'::text))
 "ese_data_student_id" btree ((data ->> 'student_id'::text))
 "ese_event_type_index" btree (event_type)
 "ese_metadata_customer_id_cast_integer_index" btree (((metadata ->> 'customer_id'::text)::integer))
 "ese_metadata_premium_index" btree ((metadata ->> 'premium'::text))
 "ese_metadata_signed_in_at_index" btree ((metadata ->> 'signed_in_at'::text))
 "ese_metadata_user_id_index" btree ((metadata ->> 'user_id'::text))
 "ese_metadata_user_type_index" btree ((metadata ->> 'user_type'::text))
 "ese_stream_index" btree (stream)

Did you run VACUUM ANALYZE after creating the index?

– Erwin Brandstetter, Dec 22, 2017 at 3:43
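
A sketch of how one might check this, assuming the table name events from the table definition above (the plan in this answer mentions event_store_events, so adjust the name to whichever table actually holds the data). The enable_seqscan step is for diagnosis in a test session only, not a production setting.

```sql
-- Refresh planner statistics and the visibility map
VACUUM ANALYZE events;

-- When was the table last vacuumed / analyzed, manually or by autovacuum?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'events';

-- Diagnosis only: discourage sequential scans for this session
-- to see whether the planner can use the partial index at all
SET enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   events
WHERE  (data -> 'school_id' ->> 1) IS NOT NULL;

RESET enable_seqscan;
```

If the index appears in the plan only with enable_seqscan turned off, the index is usable and the planner's cost or row estimate is what keeps it from being chosen.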
