Here are my tables:
accounts_company:
- id
...
accounts_companyview:
- account_id: FK to company
- viewer_id: FK to company
shipments_address:
- company_view_id: FK to companyview
- is_shipper: boolean (not null)
accounts_similarcompanyview:
- original_id: FK to companyview
- similar_id: FK to companyview
- are_similar: boolean
This is my query (generated by Django):
SELECT COUNT(*) AS "__count"
FROM "accounts_similarcompanyview"
INNER JOIN "accounts_companyview"
ON ("accounts_similarcompanyview"."original_id" = "accounts_companyview"."id")
INNER JOIN "shipments_address"
ON ("accounts_companyview"."id" = "shipments_address"."company_view_id")
INNER JOIN "accounts_companyview" T5
ON ("accounts_similarcompanyview"."similar_id" = T5."id")
INNER JOIN "shipments_address" T6
ON (T5."id" = T6."company_view_id")
WHERE ("accounts_similarcompanyview"."are_similar" = true
  AND "shipments_address"."is_shipper" = true
  AND "accounts_companyview"."viewer_id" = 51729
  AND T6."is_shipper" = true
  AND T5."viewer_id" = 51729);
Output of EXPLAIN ANALYZE: https://explain.depesz.com/s/slJK
I can see that an Index Scan takes most of the time, so I tried adding a few indexes, but nothing changed: I still get a full index scan.
CREATE INDEX "shipments_a_is_ship_d78ee4_idx" ON "shipments_address" ("is_shipper");
CREATE INDEX "shipments_a_is_ship_d78ee4_idx" ON "shipments_address" ("is_shipper", "company_view_id");
What index could I add to make this query faster?
EDIT: forgot to add the size of the tables
select count(*) from shipments_address;
2998765
select count(*) from accounts_company;
168224
select count(*) from accounts_companyview;
371560
select count(*) from accounts_similarcompanyview;
83434
2 Answers
The main problem is the bad estimates of the join cardinalities, which lead PostgreSQL to use a nested loop join when a hash join would perform better.
There is one simple thing you can do to reduce the impact of the outermost nested loop join:
CREATE INDEX ON shipments_address (company_view_id) WHERE is_shipper;
That should cut the execution time roughly in half.
Other than that, I can think of the ugly method of temporarily disabling nested loop joins:
BEGIN;
SET LOCAL enable_nestloop = off;
/* your query */
COMMIT;
Thanks, that seemed to make it a lot faster (from 75s to 15s) on my staging db: explain.depesz.com/s/p1CL – Corentin S., Oct 22, 2019 at 7:12
You are not getting a full index scan. You are getting a regular (partial or parameterized) index scan, repeated 52,190 times. Each individual scan is very fast (I'm not sure how much it can be improved) but when you do it that many times it adds up.
It might be faster to do a hash join against that table rather than the nested loop index scan. It depends on how big the table is. You can drop the index "shipments_address_company_view_id_1119f23e" and see what happens. Or you could turn up random_page_cost (just in this one session) and see if that forces the switch, and whether it is better once it does.
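A minimal sketch of the random_page_cost experiment, in case it helps. The value 8 is only an assumption for illustration (the PostgreSQL default is 4), and the query is a condensed rewrite of the one in the question using table aliases of my own choosing:

```sql
-- Check the current setting first (4 unless it has been changed).
SHOW random_page_cost;

-- Session-level change only; other connections are unaffected.
-- The value 8 is an assumed starting point, not a recommendation.
SET random_page_cost = 8;

-- Condensed form of the query from the question (aliases are mine).
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM accounts_similarcompanyview s
JOIN accounts_companyview o  ON s.original_id = o.id
JOIN shipments_address    oa ON oa.company_view_id = o.id
JOIN accounts_companyview c  ON s.similar_id = c.id
JOIN shipments_address    ca ON ca.company_view_id = c.id
WHERE s.are_similar
  AND oa.is_shipper AND o.viewer_id = 51729
  AND ca.is_shipper AND c.viewer_id = 51729;

-- Put the setting back to its configured value for this session.
RESET random_page_cost;
```

If the plan switches from the nested loop to a hash join, compare the two execution times before deciding whether the change is worth keeping.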
CREATE INDEX "shipments_a_is_ship_d78ee4_idx" ON "shipments_address" ("is_shipper", "company_view_id");
I would expect that index to be used, although I wouldn't expect it to improve your performance dramatically: at most 50%, certainly not tenfold. Also, I would suggest swapping the order of the columns. Either order would work for this particular query, but putting the boolean last will probably make the index usable for a greater variety of queries and allow you to drop the index on (company_view_id) alone.
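For reference, the swapped column order could look like the sketch below. The index name is hypothetical; the name of the single-column index is the one mentioned earlier in this answer, and the DROP is left commented out since it should only be run after confirming the new index is picked up:

```sql
-- Column order swapped: the join column first, the boolean last.
-- The index name here is made up for illustration.
CREATE INDEX shipments_address_cv_shipper_idx
    ON shipments_address (company_view_id, is_shipper);

-- Once EXPLAIN shows the new index being used, the single-column
-- index is probably redundant and could be dropped:
-- DROP INDEX shipments_address_company_view_id_1119f23e;
```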
Perhaps you can drop the other index and see whether this one gets used then. If you don't have a test/QA database set up and don't want to really drop the index in production, you can drop it inside a transaction, run the EXPLAIN, and then roll it back. Other users will be locked out of your table from the DROP until the ROLLBACK, but if you put all the commands into a file, or put them all on one line on the command line, and if you do a plain EXPLAIN rather than an EXPLAIN ANALYZE, this should take a small fraction of a second.
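Roughly, the drop-and-roll-back test described above could be run as a single script like this. The query is a condensed rewrite of the one in the question (the aliases are my own), and it uses a plain EXPLAIN so the lock is held only briefly:

```sql
BEGIN;

-- Takes an exclusive lock on shipments_address until the ROLLBACK.
DROP INDEX shipments_address_company_view_id_1119f23e;

-- Plain EXPLAIN only plans the query; it does not execute it.
EXPLAIN
SELECT count(*)
FROM accounts_similarcompanyview s
JOIN accounts_companyview o  ON s.original_id = o.id
JOIN shipments_address    oa ON oa.company_view_id = o.id
JOIN accounts_companyview c  ON s.similar_id = c.id
JOIN shipments_address    ca ON ca.company_view_id = c.id
WHERE s.are_similar
  AND oa.is_shipper AND o.viewer_id = 51729
  AND ca.is_shipper AND c.viewer_id = 51729;

-- Nothing is actually dropped: the index is back after this.
ROLLBACK;
```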
Thanks, I ran it on my staging DB with the index reversed ("company_view_id", "is_shipper") and it took it from 75s to 16s: explain.depesz.com/s/yDXq – Corentin S., Oct 22, 2019 at 7:11
SET STATISTICS IO ON;
and post the message from SSMS after running your query with it enabled? And since you're working with views, it's possible that something in the view code (which was written for a purpose other than the result of this specific query) is causing the scan.
accounts_companyview is the name of the table. Or maybe I misunderstood something?