5

Here is the table definition (simplified):

CREATE TABLE documents (
 document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
 data_block jsonb NULL
);

Sample values:

INSERT INTO documents (document_id, data_block)
VALUES
 (878979, 
 '{"COMMONS": {"DATE": {"value": "2017-03-11"}},
 "PAYABLE_INVOICE_LINES": [
 {"AMOUNT": {"value": 52408.53}}, 
 {"AMOUNT": {"value": 654.23}}
 ]}')
 , (977656, 
 '{"COMMONS": {"DATE": {"value": "2018-03-11"}},
 "PAYABLE_INVOICE_LINES": [
 {"AMOUNT": {"value": 555.10}}
 ]}');

I want to search all documents where one of the 'PAYABLE_INVOICE_LINES' elements contains a 'value' greater than 1000.00.

My query is

select *
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil 
where (pil->'AMOUNT'->>'value')::decimal > 1000

But, as I want to limit to 50 documents, I have to group on the document_id and limit the result to 50.

With millions of documents, this query is very expensive - 10 seconds with 1 million.

I try to add a GIN index on the array of the jsonb object. But it seems it's applied only while using a jsonb operator like @>.

Do you have some ideas to have better performance?

asked Mar 31, 2018 at 13:36
0

1 Answer 1

12

This is generally hard to optimize: no direct operator or index support for jsonb for this kind of test.

EXISTS should at least be faster than what you have, while also avoiding duplicate rows (where multiple array elements match) and the additional (redundant) column pil in the result:

SELECT *
FROM documents d
WHERE EXISTS (
 SELECT FROM jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') pil 
 WHERE (pil->'AMOUNT'->>'value')::decimal > 1000
 );

Related:

To make this faster by orders of magnitude, extract the maximum value per row and save it redundantly or use an IMMUTABLE function in a very small and fast (but also specialized) expression index:

CREATE OR REPLACE FUNCTION f_doc_max_amout(jsonb)
 RETURNS numeric AS
$func$
 SELECT max((a->'AMOUNT'->>'value')::numeric)
 FROM jsonb_array_elements(1ドル) a
$func$ LANGUAGE sql IMMUTABLE;
CREATE INDEX documents_max_amount_idx
ON documents (f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES')); 

Query (must match index expression):

SELECT *
FROM documents d
WHERE f_doc_max_amout(data_block -> 'PAYABLE_INVOICE_LINES') > 1000;

dbfiddle here

answered Apr 1, 2018 at 2:21
5
  • 2
    If an additional extension can be installed on the Postgres server, there is a special one called "jsquery" which extends regular JSON functionality, with GIN indexes support: github.com/postgrespro/jsquery. Packages for Ubuntu are available, as well as RPMs. Commented Apr 1, 2018 at 4:50
  • 1
    Yes, a very powerful extension from Postgres core contributors. It provides the missing functionality and index support for numeric comparison, among other things. Ultimate goal is to integrate it with core Postgres, Currently, it is not even in the extensions distributed with Postgres, yet. Packages for Debian, Ubuntu and probably other distributions are available. Or you have to compile and install yourself. Impossible on most cloud services. Tedious on other installs. Commented Apr 1, 2018 at 13:18
  • Thanks ! This solution seems to be a good option for me. I will check for the extension too. Really thanks ! Commented Apr 1, 2018 at 14:40
  • @Ryu: The EXISTS query, the expression index or both? Commented Apr 1, 2018 at 14:51
  • :) sorry, the solution with function with index. I think it is the best solution. Commented Apr 1, 2018 at 15:03

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.