
I have a table race_racer that contains time-series logs for each racer in a race. I am trying to select the latest record for a given racer in x number of races prior to a date, essentially selecting their finalized stats. The table has approximately 64 million records. I'm using a framework that doesn't support window functions, and database functions are not an option.

I cannot select based on a list of race_ids because a racer might not be in a given race.

Postgres 15

Table DDL

CREATE TABLE race_racer (
 race_racer_id SERIAL PRIMARY KEY,
 log_id integer NOT NULL,
 racer_id integer NOT NULL,
 race_id integer NOT NULL,
 stats jsonb,
 created_at timestamp without time zone NOT NULL DEFAULT now()
);
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX race_racer_pkey ON race_racer(race_racer_id int4_ops);
CREATE INDEX race_racer_log_id_idx ON race_racer(log_id int4_ops);
CREATE INDEX race_racer_racer_id_idx ON race_racer(racer_id int4_ops);
CREATE INDEX race_racer_race_id_idx ON race_racer(race_id int4_ops);
CREATE INDEX race_racer_created_at_idx ON race_racer(created_at timestamp_ops);

race_racer_id is an auto-increment column, and log_id is the record id from our vendor. It's possible for duplicate log_id records to exist, as the data is populated in batches.

Query

SELECT MAX(p2.race_racer_id) AS race_racer_id
FROM race_racer p2
WHERE p2.racer_id = 10002093
  AND p2.created_at < '2024-08-01'
GROUP BY p2.log_id, p2.race_id
ORDER BY p2.log_id DESC
LIMIT 25

EXPLAIN(ANALYZE,BUFFERS,SETTINGS,VERBOSE)

Limit  (cost=115.70..2445.14 rows=25 width=12) (actual time=150899.392..369576.667 rows=25 loops=1)
  Output: (max(race_racer_id)), log_id, race_id
  Buffers: shared hit=3984768 read=553269 written=66
  ->  GroupAggregate  (cost=115.70..9417387.87 rows=101068 width=12) (actual time=150899.390..369576.638 rows=25 loops=1)
        Output: max(race_racer_id), log_id, race_id
        Group Key: p2.log_id, p2.race_id
        Buffers: shared hit=3984768 read=553269 written=66
        ->  Incremental Sort  (cost=115.70..9415611.55 rows=102085 width=12) (actual time=147171.318..369575.442 rows=7578 loops=1)
              Output: log_id, race_id, race_racer_id
              Sort Key: p2.log_id DESC, p2.race_id
              Presorted Key: p2.log_id
              Full-sort Groups: 26  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
              Pre-sorted Groups: 26  Sort Method: quicksort  Average Memory: 41kB  Peak Memory: 41kB
              Buffers: shared hit=3984768 read=553269 written=66
              ->  Index Scan Backward using race_racer_log_id_idx on public.race_racer p2  (cost=0.44..9411732.46 rows=102085 width=12) (actual time=142302.220..369572.386 rows=7865 loops=1)
                    Output: log_id, race_id, race_racer_id
                    Filter: ((p2.created_at < '2024-08-01 00:00:00'::timestamp without time zone) AND (p2.racer_id = 10002093))
                    Rows Removed by Filter: 4547156
                    Buffers: shared hit=3984765 read=553269 written=66
Settings: effective_cache_size = '19256MB', effective_io_concurrency = '300', maintenance_io_concurrency = '300', random_page_cost = '1', work_mem = '25671kB'
Query Identifier: 7029425313897609646
Planning:
  Buffers: shared hit=182 read=10
Planning Time: 11.919 ms
Execution Time: 369579.279 ms

The query executes in about 45 seconds, which is incredibly slow. Please help! If I select the latest race data for a specific racer using race_id, it resolves in under 20 ms.

I've tried making indexes and changing the query around to no avail.

asked Aug 12, 2024 at 4:42
  • Please update the post by tagging the version of PostgreSQL and adding the output of EXPLAIN(ANALYZE,BUFFERS,SETTINGS,VERBOSE). What framework are you using that precludes the use of window functions? If you're able to run an arbitrary query, then I don't see how a framework could impose such a limitation. Consider adding code to generate a synthetic data set with characteristics similar to the live data. At the very least, provide the number of races, the number of racers, and the min, max, and mean number of racers per race. Commented Aug 12, 2024 at 5:39
  • Phalcon. The ORM doesn't go low enough; it stumbles on commands it doesn't recognize. Dialect functions can be added, but that hasn't worked. Changing is not an option right now. Commented Aug 12, 2024 at 5:47
  • Post has been updated with Postgres version (15) and requested details. Commented Aug 12, 2024 at 6:07
  • Please use EXPLAIN(ANALYZE,BUFFERS,SETTINGS,VERBOSE), not just EXPLAIN. A plan is just a plan; it will never be fast, slow, good, or bad. You have to execute the plan to get this information. Commented Aug 12, 2024 at 14:31

2 Answers


This query could benefit from an index like this:

CREATE INDEX idx_cover_race_racer ON race_racer(
 racer_id, created_at, log_id, race_id, race_racer_id);

You might have to change/optimize the order of the columns in the index.
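After building the index, a quick way to confirm the planner actually chooses it, and whether the scan becomes index-only, is to re-run the plan for the query from the question (this is just the diagnostic step, not a guaranteed fix):

```sql
-- Re-run the plan: ideally the new index appears as an Index (Only) Scan
-- and "Rows Removed by Filter" drops to near zero.
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
SELECT MAX(p2.race_racer_id) AS race_racer_id
FROM race_racer p2
WHERE p2.racer_id = 10002093
  AND p2.created_at < '2024-08-01'
GROUP BY p2.log_id, p2.race_id
ORDER BY p2.log_id DESC
LIMIT 25;
```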

answered Aug 12, 2024 at 15:48

3 Comments

Will update, index creation is taking a bit.
Marginal impact with this, so I'm trying to adjust.
@WebDevz Without the new query plan, we have no idea what is going on. The indexes you created first could be the issue: they are smaller, and the query planner might prefer one of them. You might have to drop them.

Try an index on (racer_id, log_id).

That way it doesn't need to decide whether to get the selectivity of racer_id or the ordering of log_id. It can get both at the same time.
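A minimal sketch of that index (the name is illustrative, and the DESC modifier is an assumption mirroring the query's ORDER BY log_id DESC; a plain ascending index would also work, since Postgres can scan a b-tree backward):

```sql
-- Equality column first (racer_id), ordering column second (log_id),
-- so a single index scan provides both selectivity and sort order.
CREATE INDEX race_racer_racer_id_log_id_idx
    ON race_racer (racer_id, log_id DESC);
```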

answered Aug 12, 2024 at 16:04

3 Comments

Will update, index creation is taking a bit.
This hasn't helped; can other indexes be conflicting?
Possible, but doesn't seem likely. If this is not a production server, you could just drop whatever index it is choosing to use instead and see if that helps. Is the plan still the same one as the one you already reported, or did it change to something else also slow?
