I have a large table join on the keys. The explain plan shows that most of the time is spent on sorting.
Is there anything I can do to prevent the large sort steps? Can I pre-sort the table?
Query and plan here : https://explain.dalibo.com/plan/15a6a91492g8g935#plan/node/5
Both tables already have an index on the key, like this, but that index does not seem to be used:
CREATE UNIQUE INDEX "stage_fact_revenue_900_idx_a" ON "stage"."stage_fact_revenue_900_add_dims" USING BTREE ("date","billed_relation_id","subject_client_id","identifier","revenue_type");
CREATE UNIQUE INDEX "fact_revenue_idx_a" ON "star"."fact_revenue" USING BTREE ("date","identifier","billed_relation_id","subject_client_id","revenue_type");
-
3Could you please provide the DDL of the relevant tables and the EXPLAIN plan (as text). I think you require more indexes, but it's difficult to tell without further info!Vérace– Vérace2023年05月15日 16:50:54 +00:00Commented May 15, 2023 at 16:50
1 Answer 1
An index speeds up the search if the returned rows are not more than roughly 10%, otherwise a full scan of the table is faster.
Postgresql chooses a full scan for stage_fact_revenue_900_add_dims
because it needs to select all rows and it needs columns not present in the index.
For the same reason, it thinks that it's better to scan fact_revenue
once and sort the results rather than doing 177 millions index seeks.
Some things you could try:
- Add to your query an explicit
ORDER BY date, identifier, billed_relation_id, subject_client_id, revenue_type
this should suggest to Postgres that using the index onfact_revenue
could avoid the sort. - Change the first index to have the same fields order of the second (or add a new index in that order). At present they have the same fields but in different orders, so only one of them can actually be used to avoid the sort.
- Try rewriting the query using NOT EXISTS instead of a LEFT JOIN:
Like this:
select
stage_fact_revenue_900_add_dims.date as date,
stage_fact_revenue_900_add_dims.dim_date_key as dim_date_key,
stage_fact_revenue_900_add_dims.identifier as identifier,
stage_fact_revenue_900_add_dims.revenue_type as revenue_type,
stage_fact_revenue_900_add_dims.product_id as product_id,
stage_fact_revenue_900_add_dims.dim_product_key as dim_product_key,
stage_fact_revenue_900_add_dims.billing_partner_id as billing_partner_id,
stage_fact_revenue_900_add_dims.dim_billing_partner_key as dim_billing_partner_key,
stage_fact_revenue_900_add_dims.billed_relation_id as billed_relation_id,
stage_fact_revenue_900_add_dims.subject_client_id as subject_client_id,
stage_fact_revenue_900_add_dims.dim_subject_client_key as dim_subject_client_key,
stage_fact_revenue_900_add_dims.client_id as client_id,
stage_fact_revenue_900_add_dims.dim_billed_client_key as dim_billed_client_key,
stage_fact_revenue_900_add_dims.partner_id as partner_id,
stage_fact_revenue_900_add_dims.dim_billed_partner_key as dim_billed_partner_key,
stage_fact_revenue_900_add_dims.price as price,
stage_fact_revenue_900_add_dims.duration as duration,
stage_fact_revenue_900_add_dims.amount as amount,
stage_fact_revenue_900_add_dims.leaf as leaf,
stage_fact_revenue_900_add_dims.dss_record_source as dss_record_source,
current_timestamp as dss_create_time,
current_timestamp as dss_update_time
from
stage.stage_fact_revenue_900_add_dims stage_fact_revenue_900_add_dims
where
NOT EXISTS (SELECT 1 FROM star.fact_revenue fact_revenue WHERE stage_fact_revenue_900_add_dims.date = fact_revenue.date
and stage_fact_revenue_900_add_dims.identifier = fact_revenue.identifier
and stage_fact_revenue_900_add_dims.revenue_type = fact_revenue.revenue_type
and stage_fact_revenue_900_add_dims.billed_relation_id = fact_revenue.billed_relation_id
and stage_fact_revenue_900_add_dims.subject_client_id = fact_revenue.subject_client_id)
ORDER BY date, identifier, billed_relation_id, subject_client_id, revenue_type;
-
Hi @andrea-b, thank you or your extensive answer. I tried all things. resulted in a relatively small gain. In the end I found out that the main time absorber was something else: the query was used to insert data into a table, and that table contained 17 indexes..... Pruning the amount of indexes was the best performance gain.Bart Jonk– Bart Jonk2023年06月09日 11:05:51 +00:00Commented Jun 9, 2023 at 11:05
Explore related questions
See similar questions with these tags.