
I'm trying to implement an API extraction that dumps the results from JSON APIs into a Postgres database once a day. The data will be used for analysis; afterwards I will handle the modeling part (extracting the JSON values) with dbt.

Example JSON values from an object representing a task on a task management system:

{
  "gid": 1234,                                // unique id for the resource
  "updated_at": "2023-03-22 08:53:46 GMT",
  ... rest of the attributes
}

My current strategy is to store these in tables with this schema:

gid     info::jsonb   updated_at::timestamp
12345   {...}         2023-03-22 08:53:46 GMT
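
In DDL terms, roughly something like this (table and column names are just placeholders):

-- sketch of the landing table; types are my assumption
CREATE TABLE api (
    gid         bigint      NOT NULL,
    info        jsonb       NOT NULL,
    updated_at  timestamptz NOT NULL,   -- timestamptz since the API reports times in GMT
    PRIMARY KEY (gid, updated_at)
);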

If the extraction happens once a day, every day, does the strategy of a composite key of (gid, updated_at) sound OK for judging whether the insert should happen or not?
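
Concretely, I'm thinking of something like this on each daily run (values are illustrative):

-- relies on the (gid, updated_at) primary key above;
-- rows already seen for that gid/updated_at pair are skipped
INSERT INTO api (gid, info, updated_at)
VALUES (1234,
        '{"gid": 1234, "updated_at": "2023-03-22 08:53:46 GMT"}',
        '2023-03-22 08:53:46 GMT')
ON CONFLICT (gid, updated_at) DO NOTHING;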

asked Mar 22, 2023 at 9:07

1 Answer


Yes, that seems reasonable.
Since you'll be scanning the jsonb for gid and updated_at on each pass, you'll want to create a GIN index for this purpose, e.g.:

CREATE INDEX api_gid ON api USING GIN ((info -> 'gid'));
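
For that expression index to help, the query has to reference the same expression with a GIN-indexable operator. A sketch, assuming the default jsonb_ops operator class:

-- containment on the indexed expression can use the GIN index
SELECT info, updated_at
FROM api
WHERE info -> 'gid' @> '1234'::jsonb;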

Ideally, you could create an index on both gid and updated_at, but that requires an extension called btree_gin. Without the extension, you could try creating a second index on info->'updated_at', but it would probably not get used; it depends on what else the query needs to do.
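
That second expression index would look something like this (again, whether the planner picks it up depends on the query):

CREATE INDEX api_updated_at ON api USING GIN ((info -> 'updated_at'));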

If you can install the btree_gin extension, then this syntax would work:

CREATE INDEX index_name ON items USING GIN (account_id, keywords);
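
Adapted to your table, a sketch could be something like the following, assuming the table is called api and updated_at is a plain timestamp column; btree_gin supplies the GIN operator class for the scalar column:

CREATE EXTENSION IF NOT EXISTS btree_gin;

-- combines the jsonb expression and the scalar timestamp column in one GIN index
CREATE INDEX api_gid_updated_at ON api USING GIN ((info -> 'gid'), updated_at);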
answered Mar 22, 2023 at 20:15
