I'm trying to implement an API extraction pipeline that dumps results from JSON APIs into a Postgres database daily. The goal is for these data to be used in analysis; afterwards I will handle the modeling part (extracting the JSON values) with dbt.
Example JSON values from an object representing a task on a task management system:

```
{
  "gid": 1234,                              // unique id for the resource
  "updated_at": "2023-03-22 08:53:46 GMT"
  // ... rest of the attributes
}
```
My current strategy is storing those in tables with this schema:

| gid | info (jsonb) | updated_at (timestamp) |
|---|---|---|
| 12345 | {...} | 2023-03-22 08:53:46 GMT |
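As a concrete starting point, the schema above might be created like this (a minimal sketch; the table name `api` and the choice of `timestamptz` are my assumptions, not from the question):

```sql
-- Hypothetical DDL for the landing table described above.
-- Names other than gid, info, updated_at are assumptions.
CREATE TABLE api (
    gid        bigint      NOT NULL,
    info       jsonb       NOT NULL,
    updated_at timestamptz NOT NULL,
    PRIMARY KEY (gid, updated_at)  -- one row per (resource, version)
);
```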
If the extraction happens once a day, every day, does using (gid, updated_at) as a composite key sound OK for judging whether the insert should happen or not?
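To make the insert-or-skip behaviour concrete: assuming (gid, updated_at) carries a unique constraint or primary key, the daily load can lean on `ON CONFLICT DO NOTHING` so that re-extracted, unchanged rows are silently skipped (table and values here are illustrative):

```sql
-- Insert a freshly extracted row; skip it if this exact
-- (gid, updated_at) version was already loaded on a previous day.
INSERT INTO api (gid, info, updated_at)
VALUES (12345, '{"gid": 12345}'::jsonb, '2023-03-22 08:53:46+00')
ON CONFLICT (gid, updated_at) DO NOTHING;
```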
1 Answer
Yes, that seems reasonable.
Since you'll be scanning the jsonb on each pass for gid and updated_at, you'll want to create a GIN index for this purpose, e.g.

```sql
CREATE INDEX api_gid ON api USING GIN ((info -> 'gid'));
```
Ideally, you could create an index on both gid and updated_at, but that requires an extension called btree_gin. Without the extension, you could try creating a second index on `info -> 'updated_at'`, but it would probably not get used; it depends on what else the query needs to do.
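For reference, that second expression index would look like this (a sketch, reusing the `api` table name from the earlier example):

```sql
-- Separate GIN index on the extracted updated_at value; the planner
-- may or may not use it, depending on the rest of the query.
CREATE INDEX api_updated_at ON api USING GIN ((info -> 'updated_at'));
```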
If you can install the btree_gin extension, then this syntax would work:

```sql
CREATE INDEX index_name ON items USING GIN (account_id, keywords);
```
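Adapted to the table in the question (assuming the table is named `api` and that gid and updated_at are stored as plain scalar columns, as in the schema shown), that would be something like:

```sql
-- btree_gin provides GIN operator classes for scalar (btree-indexable)
-- types, letting them participate in a multicolumn GIN index.
CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE INDEX api_gid_updated_at ON api USING GIN (gid, updated_at);
```

Note that for two plain scalar columns a regular btree index would also do the job; btree_gin mainly earns its keep when you mix scalar columns with jsonb in a single GIN index.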