I have a huge table in PostgreSQL 11 like the following:
CREATE TABLE my_huge_table(
tick_time timestamp(6) with time zone NOT NULL,
brok_time timestamp(6) with time zone,
trade_day date NOT NULL,
--other fields ...
...
CONSTRAINT my_huge_table_pkey PRIMARY KEY (tick_time)
);
CREATE INDEX idx_my_huge_table_td_time ON my_huge_table USING brin
( trade_day, abs(tick_time - brok_time) );
Then I run a query that I want to take advantage of the index idx_my_huge_table_td_time, like this:
SELECT * FROM my_huge_table
WHERE trade_day BETWEEN TO_DATE('20220104', 'YYYYMMDD') AND TO_DATE('20220104', 'YYYYMMDD')
AND ABS(tick_time - brok_time) < INTERVAL '10 s';
But PostgreSQL refuses to execute it and reports:
ERROR: function abs(interval) does not exist
LINE 3: AND ABS(tick_time - brok_time) < INTERVAL '10 s'
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
SQL state: 42883 Character: 525
It looks like the function abs() cannot accept an interval value as an argument.
Then I changed my query:
SELECT * FROM my_huge_table
WHERE trade_day BETWEEN TO_DATE('20220104', 'YYYYMMDD') AND TO_DATE('20220104', 'YYYYMMDD')
AND GREATEST(tick_time - brok_time, brok_time - tick_time) < INTERVAL '10 s';
This time it executes, but it doesn't take advantage of the index.
My questions:
1. How should I write the index expression? I want it to record the distance (absolute interval value) between two timestamp fields.
2. How should I write the query so that it can use the index above?
3. GREATEST(tick_time - brok_time, brok_time - tick_time) is probably not a good idea, since it computes the difference twice. Is that right?
4. After creating the index, I noticed that the actual DDL of the index reported by PostgreSQL is:
CREATE INDEX idx_my_huge_table_td_time ON public.my_huge_table USING brin
(trade_day, abs(date_part('epoch'::text, tick_time - brok_time)));
Has the value of the expression been cast to the text type? That is apparently NOT what I expected!
1 Answer
The answer is to create a generated column as follows (all of the code below is available on the fiddle here). Note that generated columns require PostgreSQL 12 or later; on PostgreSQL 11, use the expression index from the original answer below.
I had an original answer (shown at the end of this answer), but I've revised it to use a generated column (known in other systems as a "computed" or "virtual" column) instead of an expression index (also called a "functional index").
This has two advantages:
a) the value is calculated when the row is inserted or updated, so it does not have to be recomputed every time a query evaluates it, and
b) it makes the SQL much clearer - see the original answer below.
The one disadvantage is that a stored column uses more space, but I've found that this is not normally a critical issue (I've never seen it be one myself). Unfortunately, at the time of writing PostgreSQL does not yet have virtual generated columns - see link.
Your table definition should be as follows:
CREATE TABLE t
(
ticktime TIMESTAMPTZ,
broktime TIMESTAMPTZ,
trade_day DATE,
--
-- other fields
--
abs_b_minus_t INTERVAL GENERATED ALWAYS AS (GREATEST(broktime, ticktime) - LEAST(broktime, ticktime)) STORED
);
Then create an index that includes abs_b_minus_t:
CREATE INDEX t_ix ON t
USING BRIN (trade_day, abs_b_minus_t );
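For a table that already exists, such as the asker's my_huge_table, the column could probably be added in place instead of recreating the table. The following is only a minimal sketch under stated assumptions: PostgreSQL 12 or later, and the column name abs_tick_brok and index name idx_my_huge_table_td_abs are made up for illustration.

```sql
-- Assumes PostgreSQL 12+; abs_tick_brok is a hypothetical column name.
ALTER TABLE my_huge_table
    ADD COLUMN abs_tick_brok INTERVAL
    GENERATED ALWAYS AS (GREATEST(brok_time, tick_time) - LEAST(brok_time, tick_time)) STORED;

-- Hypothetical index name; same shape as t_ix above.
CREATE INDEX idx_my_huge_table_td_abs ON my_huge_table
    USING BRIN (trade_day, abs_tick_brok);
```

Two caveats apply. Adding a stored generated column rewrites the whole table while holding an exclusive lock, so on a huge table it is worth scheduling. Also, GREATEST and LEAST ignore NULL arguments, so a row whose brok_time is NULL (the question's table allows that) gets a zero interval rather than NULL; wrapping the expression in CASE WHEN brok_time IS NOT NULL THEN ... END would avoid that if it matters.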
Populate:
INSERT INTO t VALUES
('2022-02-14 14:43:55'::TIMESTAMPTZ, '2022-02-14 12:43:55'::TIMESTAMPTZ, '2022-02-14'::DATE),
('2022-03-14 14:43:55'::TIMESTAMPTZ, '2022-02-14 12:43:55'::TIMESTAMPTZ, '2022-03-14'::DATE),
('2022-02-14 14:43:55'::TIMESTAMPTZ, '2022-05-14 12:43:55'::TIMESTAMPTZ, '2022-02-14'::DATE);
Then we run:
SELECT
ticktime - broktime AS t_minus_b,
abs_b_minus_t
FROM t;
Result:
t_minus_b          | abs_b_minus_t
02:00:00           | 02:00:00
28 days 02:00:00   | 28 days 02:00:00
-88 days -21:00:00 | 88 days 21:00:00
So, we see that it's working - we are obtaining the absolute value of the difference between broktime and ticktime.
Now we can check index usage. Because the test table is tiny, the planner would otherwise pick a sequential scan, so we first run SET enable_seqscan = OFF; and then:
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT
broktime - ticktime
FROM t
WHERE abs_b_minus_t < INTERVAL '30 DAYS';
Result:
QUERY PLAN
Bitmap Heap Scan on public.t (cost=12.14..39.07 rows=423 width=16) (actual time=0.022..0.025 rows=2 loops=1)
Output: (broktime - ticktime)
Recheck Cond: (t.abs_b_minus_t < '30 days'::interval)
Rows Removed by Index Recheck: 1
Heap Blocks: lossy=1
Buffers: shared hit=3
-> Bitmap Index Scan on t_ix (cost=0.00..12.03 rows=1270 width=0) (actual time=0.017..0.017 rows=10 loops=1)
Index Cond: (t.abs_b_minus_t < '30 days'::interval)
Buffers: shared hit=2
Planning:
Buffers: shared hit=1
Planning Time: 0.042 ms
Execution Time: 0.052 ms
So, the query uses t_ix, the BRIN index on our generated column.
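Applied to the filter from the question, the query would then reference the generated column directly. This is a sketch against the demo table t above; the date and the 10-second threshold are taken from the question, and whether the planner actually picks the BRIN index on real data depends on table size and statistics.

```sql
-- Same filter as in the question, expressed against the generated column.
SELECT *
FROM t
WHERE trade_day = DATE '2022-01-04'          -- BETWEEN with identical bounds is just equality
  AND abs_b_minus_t < INTERVAL '10 seconds';
```

Keep in mind that BRIN stores a min/max summary per block range, so it only prunes well when a column's values follow the physical row order. That seems plausible for trade_day in an append-only tick table, but much less so for abs_b_minus_t, so most of the benefit may come from the trade_day column.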
Original Answer:
CREATE TABLE t
(
ticktime TIMESTAMPTZ,
broktime TIMESTAMPTZ,
trade_day DATE
--
-- other fields
--
);
Now, we create our functional index as follows:
CREATE INDEX t_ix ON t
USING BRIN (trade_day, (GREATEST(broktime, ticktime) - LEAST(broktime, ticktime)));
Populate the table:
INSERT INTO t VALUES
('2022-02-14 14:43:55'::TIMESTAMPTZ, '2022-02-14 12:43:55'::TIMESTAMPTZ, '2022-02-14'::DATE),
('2022-03-14 14:43:55'::TIMESTAMPTZ, '2022-02-14 12:43:55'::TIMESTAMPTZ, '2022-03-14'::DATE),
('2022-02-14 14:43:55'::TIMESTAMPTZ, '2022-05-14 12:43:55'::TIMESTAMPTZ, '2022-02-14'::DATE);
Now we test:
SELECT
ticktime - broktime AS t_minus_b,
GREATEST(broktime, ticktime) - LEAST(broktime, ticktime) AS abs_b_minus_t
FROM t;
Result:
t_minus_b          | abs_b_minus_t
02:00:00           | 02:00:00
28 days 02:00:00   | 28 days 02:00:00
-88 days -21:00:00 | 88 days 21:00:00
So, we have the values and their absolute values. Next, we filter on the same expression:
SELECT
broktime - ticktime
FROM t
WHERE GREATEST(broktime, ticktime) - LEAST(broktime, ticktime) < INTERVAL '30 DAYS';
Result:
?column?
-02:00:00
-28 days -02:00:00
To check index usage, we disable sequential scans with SET enable_seqscan = OFF; and then run:
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT
broktime - ticktime
FROM t
WHERE GREATEST(broktime, ticktime) - LEAST(broktime, ticktime) < INTERVAL '30 DAYS';
Result:
QUERY PLAN
Bitmap Heap Scan on public.t (cost=12.17..57.59 rows=567 width=16) (actual time=0.041..0.044 rows=2 loops=1)
Output: (broktime - ticktime)
Recheck Cond: ((GREATEST(t.broktime, t.ticktime) - LEAST(t.broktime, t.ticktime)) < '30 days'::interval)
Rows Removed by Index Recheck: 1
Heap Blocks: lossy=1
Buffers: shared hit=3
-> Bitmap Index Scan on t_ix (cost=0.00..12.03 rows=1700 width=0) (actual time=0.027..0.027 rows=10 loops=1)
Index Cond: ((GREATEST(t.broktime, t.ticktime) - LEAST(t.broktime, t.ticktime)) < '30 days'::interval)
Buffers: shared hit=2
Planning:
Buffers: shared hit=1
Planning Time: 0.044 ms
Execution Time: 0.096 ms
So, we see that t_ix is used, via the relatively efficient Bitmap Heap Scan.
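On question 4: the 'epoch'::text in the DDL reported by PostgreSQL is only the field-name argument of date_part shown with an explicit cast. The indexed value is a number of seconds (double precision on PostgreSQL 11), not text. That DDL suggests the index was really created with an EXTRACT(EPOCH FROM ...) expression, which PostgreSQL 11 stores as a date_part call. A query can use an expression index only if it repeats the same expression, so a sketch against the demo table t (the index name t_epoch_ix is hypothetical) might look like this:

```sql
-- Hypothetical numeric variant: index the absolute difference in seconds.
CREATE INDEX t_epoch_ix ON t
    USING BRIN (trade_day, abs(EXTRACT(EPOCH FROM (ticktime - broktime))));

-- The WHERE clause repeats the indexed expression and compares against seconds.
SELECT *
FROM t
WHERE trade_day = DATE '2022-02-14'
  AND abs(EXTRACT(EPOCH FROM (ticktime - broktime))) < 10;
```

This variant compares against a plain number of seconds (10) rather than an INTERVAL, which is why abs() is accepted here but not on the raw interval.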
- Excellent!! You have given me a very good starting point; from it I can compose a generated column more suitable to my work. I can change the algorithm of the generated column to fit any requests in the future! Thanks a lot! Good man!!! (Leon, Apr 24, 2023 at 1:06)
- @Leon - glad to be of help! (Vérace, Apr 24, 2023 at 8:03)