Query results not as expected - Jsonb Postgres

Question 1

We've created the following table:

CREATE TABLE public.influencers
(
 id integer NOT NULL DEFAULT nextval('influencers_id_seq'::regclass),
 location jsonb,
 gender text COLLATE pg_catalog."default",
 birthdate timestamp without time zone,
 ig jsonb,
 contact_info jsonb,
 created_at timestamp without time zone DEFAULT now(),
 updated_at timestamp without time zone DEFAULT now(),
 categories text[] COLLATE pg_catalog."default",
 search_field text COLLATE pg_catalog."default",
 search_vector tsvector,
 ig_updated_at timestamp without time zone,
 CONSTRAINT influencers_pkey PRIMARY KEY (id),
 CONSTRAINT ig_id_must_exist CHECK (ig ? 'id'::text),
 CONSTRAINT ig_username_must_exist CHECK (ig ? 'username'::text)
)

And created the following index in order to filter a field inside ig that contains integers. This is the index we created:

CREATE INDEX idx_btree_ig_follower_count
 ON public.influencers
 USING BTREE ((ig->>'follower_count'));

The following query should return 427000 results. Instead, it returns a random number that nothing has to do with what it should return. This is the query:

SELECT count(*)
FROM "influencers"
WHERE ((ig ->> 'follower_count') >= '1000')
 AND ((ig ->> 'follower_count') <= '10000')
 AND (ig->>'follower_count') IS NOT NULL

We suspect that it is comparing strings (or text) instead of integers and that's why it's returning incomprehensible results.

However, the query that works (return expected rows) but doesn't use the idx_btree_ig_follower_count index is the following:

SELECT count(*)
FROM "influencers"
WHERE ((ig -> 'follower_count') >= '1000')
 AND ((ig -> 'follower_count') <= '10000')
 AND (ig->'follower_count') IS NOT NULL;

How can we create the query or the index in order to being able to query efficiently?

Question 2

Have you tried using: WHERE ((ig ->> 'follower_count')::int >= 1000) ...

Question 3

Try this.. Just out of curiosity..

CREATE INDEX idx_btree_ig_follower_count
 ON public.influencers
 USING BTREE (CAST(ig->>'follower_count' AS int));

Then in your query, cast to int.

SELECT count(*)
FROM "influencers"
WHERE ((ig ->> 'follower_count')::int >= '1000')
 AND ((ig ->> 'follower_count')::int <= '10000');

You don't need to specify IS NOT NULL if you specify >=, and <=

Test case

I created some test data with 10 million rows of jsonb, with btree index over it.

CREATE TABLE influences
AS
 SELECT
 jsonb_build_object('follower_count', trunc(random()*1e7)::int) AS ig 
 FROM generate_series(1,1e7);
CREATE INDEX ON influences( CAST( (ig->>'follower_count') AS int) );

I'm getting the same query plan you provided,. You can see mine here.

EXPLAIN (ANALYZE,BUFFERS)
SELECT count(*) FROM influences
WHERE (ig->>'follower_count')::int BETWEEN 0 AND 854780;
 QUERY PLAN 
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=130124.36..130124.37 rows=1 width=0) (actual time=482.615..482.615 rows=1 loops=1)
 Buffers: shared hit=85674
 -> Bitmap Heap Scan on influences (cost=18483.87..127946.94 rows=870969 width=0) (actual time=206.182..426.880 rows=855570 loops=1)
 Recheck Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
 Heap Blocks: exact=83334
 Buffers: shared hit=85674
 -> Bitmap Index Scan on influences_int4_idx (cost=0.00..18266.13 rows=870969 width=0) (actual time=182.158..182.158 rows=855570 loops=1)
 Index Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
 Buffers: shared hit=2340
 Planning time: 0.341 ms
 Execution time: 482.665 ms
(11 rows)

I'm aggregating far more rows, 855570 in my example, and it's over 10x faster.

I clustered it

CLUSTER influences USING influences_int4_idx ;

And, that halfed the time further to 200ms

EXPLAIN (ANALYZE,BUFFERS) SELECT count(*) FROM influences WHERE (ig->>'follower_count')::int BETWEEN 0 AND 854780;
 QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=130124.66..130124.67 rows=1 width=0) (actual time=211.880..211.881 rows=1 loops=1)
 Buffers: shared hit=4 read=9466
 -> Bitmap Heap Scan on influences (cost=18483.94..127947.22 rows=870976 width=0) (actual time=69.503..155.670 rows=855570 loops=1)
 Recheck Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
 Heap Blocks: exact=7130
 Buffers: shared hit=4 read=9466
 -> Bitmap Index Scan on influences_int4_idx (cost=0.00..18266.20 rows=870976 width=0) (actual time=68.150..68.150 rows=855570 loops=1)
 Index Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
 Buffers: shared hit=3 read=2337
 Planning time: 0.412 ms
 Execution time: 211.929 ms
(11 rows)

Question 4

It gives an error when trying to create the index: ERROR: syntax error at or near "::" LINE 3: USING BTREE ((ig->>'follower_count')::int);

Question 5

@borjagvo try again with the update.

Question 6

This is the result: gist.github.com/bgvo/4d7b75de6d5d9e8d39fbc170990ea974 It took 3 secs.

Question 7

Can you paste the query plan with formatting and also the query, @borjagvo

Question 8

Sorry, what do you mean with formatting? I just took out spaces at the beginning for each line of the explain. Here it is what PgAdmin returns when running explain: gist.github.com/bgvo/df4d4e2dd8a26c4f3dcdbda9869e2de4 Anything else that you need?

Evan Carroll Evan Carroll 65.7k50 gold badges259 silver badges511 bronze badges · Answer 1 · 2017-05-17 15:49:41Z

Try this.. Just out of curiosity..

CREATE INDEX idx_btree_ig_follower_count
 ON public.influencers
 USING BTREE (CAST(ig->>'follower_count' AS int));

Then in your query, cast to int.

SELECT count(*)
FROM "influencers"
WHERE ((ig ->> 'follower_count')::int >= '1000')
 AND ((ig ->> 'follower_count')::int <= '10000');

You don't need to specify IS NOT NULL if you specify >=, and <=

Test case

I created some test data with 10 million rows of jsonb, with btree index over it.

CREATE TABLE influences
AS
 SELECT
 jsonb_build_object('follower_count', trunc(random()*1e7)::int) AS ig 
 FROM generate_series(1,1e7);
CREATE INDEX ON influences( CAST( (ig->>'follower_count') AS int) );

I'm getting the same query plan you provided,. You can see mine here.

EXPLAIN (ANALYZE,BUFFERS)
SELECT count(*) FROM influences
WHERE (ig->>'follower_count')::int BETWEEN 0 AND 854780;
 QUERY PLAN 
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=130124.36..130124.37 rows=1 width=0) (actual time=482.615..482.615 rows=1 loops=1)
 Buffers: shared hit=85674
 -> Bitmap Heap Scan on influences (cost=18483.87..127946.94 rows=870969 width=0) (actual time=206.182..426.880 rows=855570 loops=1)
 Recheck Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
 Heap Blocks: exact=83334
 Buffers: shared hit=85674
 -> Bitmap Index Scan on influences_int4_idx (cost=0.00..18266.13 rows=870969 width=0) (actual time=182.158..182.158 rows=855570 loops=1)
 Index Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
 Buffers: shared hit=2340
 Planning time: 0.341 ms
 Execution time: 482.665 ms
(11 rows)

I'm aggregating far more rows, 855570 in my example, and it's over 10x faster.

I clustered it

CLUSTER influences USING influences_int4_idx ;

And, that halfed the time further to 200ms

EXPLAIN (ANALYZE,BUFFERS) SELECT count(*) FROM influences WHERE (ig->>'follower_count')::int BETWEEN 0 AND 854780;
 QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=130124.66..130124.67 rows=1 width=0) (actual time=211.880..211.881 rows=1 loops=1)
 Buffers: shared hit=4 read=9466
 -> Bitmap Heap Scan on influences (cost=18483.94..127947.22 rows=870976 width=0) (actual time=69.503..155.670 rows=855570 loops=1)
 Recheck Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
 Heap Blocks: exact=7130
 Buffers: shared hit=4 read=9466
 -> Bitmap Index Scan on influences_int4_idx (cost=0.00..18266.20 rows=870976 width=0) (actual time=68.150..68.150 rows=855570 loops=1)
 Index Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
 Buffers: shared hit=3 read=2337
 Planning time: 0.412 ms
 Execution time: 211.929 ms
(11 rows)

It gives an error when trying to create the index: ERROR: syntax error at or near "::" LINE 3: USING BTREE ((ig->>'follower_count')::int);
This is the result: gist.github.com/bgvo/4d7b75de6d5d9e8d39fbc170990ea974 It took 3 secs.
Can you paste the query plan with formatting and also the query, @borjagvo
Sorry, what do you mean with formatting? I just took out spaces at the beginning for each line of the explain. Here it is what PgAdmin returns when running explain: gist.github.com/bgvo/df4d4e2dd8a26c4f3dcdbda9869e2de4 Anything else that you need?

Stack Exchange Network

Query results not as expected - Jsonb Postgres

1 Answer 1

Test case

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Query results not as expected - Jsonb Postgres

1 Answer 1

Test case

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions