Search for nested values in jsonb array with greater operator

Question 1

Here is the table definition (simplified):

CREATE TABLE documents (
 document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
 data_block jsonb NULL
);

Sample values:

INSERT INTO documents (document_id, data_block)
VALUES
 (878979, 
 '{"COMMONS": {"DATE": {"value": "2017-03-11"}},
 "PAYABLE_INVOICE_LINES": [
 {"AMOUNT": {"value": 52408.53}}, 
 {"AMOUNT": {"value": 654.23}}
 ]}')
 , (977656, 
 '{"COMMONS": {"DATE": {"value": "2018-03-11"}},
 "PAYABLE_INVOICE_LINES": [
 {"AMOUNT": {"value": 555.10}}
 ]}');

I want to search all documents where one of the 'PAYABLE_INVOICE_LINES' elements contains a 'value' greater than 1000.00.

My query is

select *
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil 
where (pil->'AMOUNT'->>'value')::decimal > 1000

But, as I want to limit to 50 documents, I have to group on the document_id and limit the result to 50.

With millions of documents, this query is very expensive - 10 seconds with 1 million.

I try to add a GIN index on the array of the jsonb object. But it seems it's applied only while using a jsonb operator like @>.

Do you have some ideas to have better performance?

Question 2

This is generally hard to optimize: no direct operator or index support for jsonb for this kind of test.

EXISTS should at least be faster than what you have, while also avoiding duplicate rows (where multiple array elements match) and the additional (redundant) column pil in the result:

SELECT *
FROM documents d
WHERE EXISTS (
 SELECT FROM jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') pil 
 WHERE (pil->'AMOUNT'->>'value')::decimal > 1000
 );

To make this faster by orders of magnitude, extract the maximum value per row and save it redundantly or use an IMMUTABLE function in a very small and fast (but also specialized) expression index:

CREATE OR REPLACE FUNCTION f_doc_max_amout(jsonb)
 RETURNS numeric AS
$func$
 SELECT max((a->'AMOUNT'->>'value')::numeric)
 FROM jsonb_array_elements(1ドル) a
$func$ LANGUAGE sql IMMUTABLE;
CREATE INDEX documents_max_amount_idx
ON documents (f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES'));

Query (must match index expression):

SELECT *
FROM documents d
WHERE f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES') > 1000;

dbfiddle here

Question 3

If an additional extension can be installed on the Postgres server, there is a special one called "jsquery" which extends regular JSON functionality, with GIN indexes support: github.com/postgrespro/jsquery. Packages for Ubuntu are available, as well as RPMs.

Question 4

Yes, a very powerful extension from Postgres core contributors. It provides the missing functionality and index support for numeric comparison, among other things. Ultimate goal is to integrate it with core Postgres, Currently, it is not even in the extensions distributed with Postgres, yet. Packages for Debian, Ubuntu and probably other distributions are available. Or you have to compile and install yourself. Impossible on most cloud services. Tedious on other installs.

Question 5

Thanks ! This solution seems to be a good option for me. I will check for the extension too. Really thanks !

Question 6

@Ryu: The EXISTS query, the expression index or both?

Question 7

:) sorry, the solution with function with index. I think it is the best solution.

score 12 · Accepted Answer · 2018-04-01 02:21:46Z

This is generally hard to optimize: no direct operator or index support for jsonb for this kind of test.

EXISTS should at least be faster than what you have, while also avoiding duplicate rows (where multiple array elements match) and the additional (redundant) column pil in the result:

SELECT *
FROM documents d
WHERE EXISTS (
 SELECT FROM jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') pil 
 WHERE (pil->'AMOUNT'->>'value')::decimal > 1000
 );

To make this faster by orders of magnitude, extract the maximum value per row and save it redundantly or use an IMMUTABLE function in a very small and fast (but also specialized) expression index:

CREATE OR REPLACE FUNCTION f_doc_max_amout(jsonb)
 RETURNS numeric AS
$func$
 SELECT max((a->'AMOUNT'->>'value')::numeric)
 FROM jsonb_array_elements(1ドル) a
$func$ LANGUAGE sql IMMUTABLE;
CREATE INDEX documents_max_amount_idx
ON documents (f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES'));

Query (must match index expression):

SELECT *
FROM documents d
WHERE f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES') > 1000;

dbfiddle here

If an additional extension can be installed on the Postgres server, there is a special one called "jsquery" which extends regular JSON functionality, with GIN indexes support: github.com/postgrespro/jsquery. Packages for Ubuntu are available, as well as RPMs.
Yes, a very powerful extension from Postgres core contributors. It provides the missing functionality and index support for numeric comparison, among other things. Ultimate goal is to integrate it with core Postgres, Currently, it is not even in the extensions distributed with Postgres, yet. Packages for Debian, Ubuntu and probably other distributions are available. Or you have to compile and install yourself. Impossible on most cloud services. Tedious on other installs.
Thanks ! This solution seems to be a good option for me. I will check for the extension too. Really thanks !
:) sorry, the solution with function with index. I think it is the best solution.

Stack Exchange Network

Search for nested values in jsonb array with greater operator

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Search for nested values in jsonb array with greater operator

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions