Query is extremely slow selecting data for last x events for racer

Question 1

I have a table race_racer that contains time-series logs for each racer in a race. I am trying to select the latest record for a given racer in each of their last x races prior to a date, which is effectively their finalized stats for those races. The table has approximately 64 million records. I'm using a framework that doesn't support window functions, and DB functions are not an option.

I cannot select based on a list of race_ids because a racer might not be in a given race.

Postgres 15

Table DDL

CREATE TABLE race_racer (
 race_racer_id SERIAL PRIMARY KEY,
 log_id integer NOT NULL,
 racer_id integer NOT NULL,
 race_id integer NOT NULL,
 stats jsonb,
 created_at timestamp without time zone NOT NULL DEFAULT now()
);
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX race_racer_pkey ON race_racer(race_racer_id int4_ops);
CREATE INDEX race_racer_log_id_idx ON race_racer(log_id int4_ops);
CREATE INDEX race_racer_racer_id_idx ON race_racer(racer_id int4_ops);
CREATE INDEX race_racer_race_id_idx ON race_racer(race_id int4_ops);
CREATE INDEX race_racer_created_at_idx ON race_racer(created_at timestamp_ops);

race_racer_id is an auto-incrementing key and log_id is the record ID from our vendor. Duplicate log_id records can exist because the data is populated in batches.

Query

SELECT MAX(p2.race_racer_id) race_racer_id
FROM race_racer p2
WHERE p2.racer_id = 10002093
AND p2.created_at < '2024-08-01'
GROUP BY p2.log_id, p2.race_id
ORDER BY p2.log_id DESC
LIMIT 25

EXPLAIN(ANALYZE,BUFFERS,SETTINGS,VERBOSE)

Limit (cost=115.70..2445.14 rows=25 width=12) (actual time=150899.392..369576.667 rows=25 loops=1)
  Output: (max(race_racer_id)), log_id, race_id
  Buffers: shared hit=3984768 read=553269 written=66
  -> GroupAggregate (cost=115.70..9417387.87 rows=101068 width=12) (actual time=150899.390..369576.638 rows=25 loops=1)
        Output: max(race_racer_id), log_id, race_id
        Group Key: p2.log_id, p2.race_id
        Buffers: shared hit=3984768 read=553269 written=66
        -> Incremental Sort (cost=115.70..9415611.55 rows=102085 width=12) (actual time=147171.318..369575.442 rows=7578 loops=1)
              Output: log_id, race_id, race_racer_id
              Sort Key: p2.log_id DESC, p2.race_id
              Presorted Key: p2.log_id
              Full-sort Groups: 26 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB
              Pre-sorted Groups: 26 Sort Method: quicksort Average Memory: 41kB Peak Memory: 41kB
              Buffers: shared hit=3984768 read=553269 written=66
              -> Index Scan Backward using race_racer_log_id_idx on public.race_racer p2 (cost=0.44..9411732.46 rows=102085 width=12) (actual time=142302.220..369572.386 rows=7865 loops=1)
                    Output: log_id, race_id, race_racer_id
                    Filter: ((p2.created_at < '2024-08-01 00:00:00'::timestamp without time zone) AND (p2.racer_id = 10002093))
                    Rows Removed by Filter: 4547156
                    Buffers: shared hit=3984765 read=553269 written=66
Settings: effective_cache_size = '19256MB', effective_io_concurrency = '300', maintenance_io_concurrency = '300', random_page_cost = '1', work_mem = '25671kB'
Query Identifier: 7029425313897609646
Planning:
  Buffers: shared hit=182 read=10
Planning Time: 11.919 ms
Execution Time: 369579.279 ms

The query executes in about 45 seconds (the run captured above took about 370 seconds), which is incredibly slow. Please help! If I select the latest race data for a specific racer using race_id, it resolves in under 20 ms.

I've tried making indexes and changing the query around to no avail.

Question 2

Please update the post by tagging the version of PostgreSQL and adding the output of EXPLAIN(ANALYZE,BUFFERS,SETTINGS,VERBOSE). What framework are you using that precludes the use of window functions? If you're able to run an arbitrary query, then I don't see how a framework could impose such a limitation. Consider adding code to generate a synthetic data set with characteristics similar to the live data. At the very least, provide the number of races, number of racers, and the min, max, and mean number of racers per race.
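A synthetic data set of the kind requested here could be generated with generate_series. This is only a minimal sketch: the row count, the number of racers, the rows per race, the stats payload, and the start date are all made-up assumptions and should be adjusted to resemble the live data.

```sql
-- Hypothetical sizes: 1,000,000 rows, about 10,000 racers, 1,000 rows per race.
INSERT INTO race_racer (log_id, racer_id, race_id, stats, created_at)
SELECT
    g                                                 AS log_id,
    10000000 + (random() * 9999)::int                 AS racer_id,
    1 + (g / 1000)                                    AS race_id,
    jsonb_build_object('position', 1 + (random() * 19)::int) AS stats,
    timestamp '2024-01-01' + g * interval '1 second'  AS created_at
FROM generate_series(1, 1000000) AS g;

-- Refresh planner statistics so EXPLAIN reflects the new rows.
ANALYZE race_racer;
```

Unlike the live table, this sketch produces no duplicate log_id values; inserting a second, overlapping batch would be one way to mimic them.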

Question 3

Phalcon. The ORM doesn’t go low enough. It stumbles on commands it doesn’t recognize. Dialect functions can be added, but that hasn’t worked. Changing frameworks is not an option right now.

Question 4

Post has been updated with Postgres version (15) and requested details.

Question 5

Please use EXPLAIN(ANALYZE,BUFFERS,SETTINGS,VERBOSE), not just EXPLAIN. A plan is just a plan; it will never be fast, slow, good, or bad. You have to execute the plan to get this information.


Frank Heikens · Answer 1 · 2024-08-12 15:48:56Z

1

This query could benefit from an index like this:

CREATE INDEX idx_cover_race_racer ON race_racer(
 racer_id, created_at, log_id, race_id, race_racer_id);

You might have to change/optimize the order of the columns in the index.
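One possible reordering, offered as a sketch rather than as what the answer prescribes: keep the equality column (racer_id) first, follow it with the ORDER BY column (log_id), and move the range-filtered created_at to the end. With created_at in second position, the index cannot return one racer's rows in log_id order. The index name below is hypothetical.

```sql
-- Hypothetical name; column order is an assumption to be tested, not a given.
-- CONCURRENTLY avoids blocking writes but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_cover_race_racer_by_log ON race_racer (
    racer_id,       -- equality predicate
    log_id,         -- ORDER BY / first GROUP BY column
    race_id,        -- second GROUP BY column
    race_racer_id,  -- aggregated by MAX()
    created_at      -- range predicate, can be checked from the index entry
);
```

Whether the planner actually chooses this index, and whether it can use an index-only scan, depends on statistics and on how recently the table was vacuumed, so the result should be confirmed with EXPLAIN(ANALYZE,BUFFERS).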


3 Comments

WebDevz · 2024-08-12T16:26:40.633Z

Will update, index creation is taking a bit.

WebDevz · 2024-08-12T17:16:44.07Z

Marginal impact with this, so I'm trying to adjust.

Frank Heikens · 2024-08-12T17:25:55.14Z

@WebDevz Without the new query plan, we have no idea what is going on. The indexes you created first could be the issue: they are smaller, and the query planner might prefer one of them. You might have to drop them.

jjanes · Answer 2 · 2024-08-12 16:04:54Z

1

Try an index on (racer_id, log_id).

That way it doesn't need to decide whether to get the selectivity of racer_id or the ordering of log_id. It can get both at the same time.
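A minimal sketch of this suggestion, using the question's table and query. The index name is hypothetical, and CONCURRENTLY is an optional addition (not part of the answer) that avoids blocking writes while the index builds on a large table.

```sql
-- Hypothetical index name. CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY race_racer_racer_id_log_id_idx
    ON race_racer (racer_id, log_id);

-- Refresh statistics, then capture the new plan to see which index is used.
ANALYZE race_racer;

EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
SELECT MAX(p2.race_racer_id) AS race_racer_id
FROM race_racer p2
WHERE p2.racer_id = 10002093
  AND p2.created_at < '2024-08-01'
GROUP BY p2.log_id, p2.race_id
ORDER BY p2.log_id DESC
LIMIT 25;
```

If the new plan still shows an Index Scan Backward on race_racer_log_id_idx with a Filter on racer_id, the planner has not picked up the new index.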


3 Comments

WebDevz · 2024-08-12T16:26:43.37Z

Will update, index creation is taking a bit.

WebDevz · 2024-08-12T17:16:07.607Z

This hasn't helped, can other indexes be conflicting?

jjanes · 2024-08-13T17:15:42.777Z

Possible, but doesn't seem likely. If this is not a production server, you could just drop whatever index it is choosing to use instead and see if that helps. Is the plan still the same one as the one you already reported, or did it change to something else also slow?
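The experiment from the last comment can be tried without permanently losing the index, because DROP INDEX is transactional in PostgreSQL. The sketch below assumes the index in question is race_racer_log_id_idx, the one named in the reported plan. The dropped index holds an exclusive lock on the table until the ROLLBACK, so this is only suitable for a non-production server, as the comment says.

```sql
BEGIN;

-- Hide the index the planner currently prefers (assumed from the posted plan).
DROP INDEX race_racer_log_id_idx;

-- See which plan the planner falls back to, and how long it really takes.
EXPLAIN (ANALYZE, BUFFERS)
SELECT MAX(p2.race_racer_id) AS race_racer_id
FROM race_racer p2
WHERE p2.racer_id = 10002093
  AND p2.created_at < '2024-08-01'
GROUP BY p2.log_id, p2.race_id
ORDER BY p2.log_id DESC
LIMIT 25;

-- Undo the drop; the index is restored without a rebuild.
ROLLBACK;
```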
