1

Here are my tables:

accounts_company:
 - id
 ...
accounts_companyview:
 - account_id: FK to company
 - viewer_id: FK to company
shipments_address:
 - companyview_id: FK to companyview
 - is_shipper: boolean (not null)
accounts_similarcompanyview
 - original_id: FK to companyview
 - similar_id: FK to companyview 

This is my query (generated by Django)

SELECT COUNT(*) AS "__count"
 FROM "accounts_similarcompanyview"
 INNER JOIN "accounts_companyview"
 ON ("accounts_similarcompanyview"."original_id" = "accounts_companyview"."id")
 INNER JOIN "shipments_address"
 ON ("accounts_companyview"."id" = "shipments_address"."company_view_id")
 INNER JOIN "accounts_companyview" T5
 ON ("accounts_similarcompanyview"."similar_id" = T5."id")
 INNER JOIN "shipments_address" T6
 ON (T5."id" = T6."company_view_id")
 WHERE ("accounts_similarcompanyview"."are_similar" = true AND "shipments_address"."is_shipper" = true AND "accounts_companyview"."viewer_id" = 51729 AND T6."is_shipper" = true AND T5."viewer_id" = 51729);

Output of EXPLAIN ANALYZE: https://explain.depesz.com/s/slJK

I can see that there is an Index Scan taking most of the time, so I tried to add a few indexes but no change whatsoever, I still get a full index scan.

CREATE INDEX "shipments_a_is_ship_d78ee4_idx" ON "shipments_address" ("is_shipper");
CREATE INDEX "shipments_a_is_ship_d78ee4_idx" ON "shipments_address" ("is_shipper", "company_view_id");

What index could I add to make this query faster?

EDIT: forgot to add the size of the tables

select count(*) from shipments_address;
 2998765
select count(*) from accounts_company;
 168224
select count(*) from accounts_companyview;
 371560
select count(*) from accounts_similarcompanyview;
 83434
asked Oct 21, 2019 at 16:07
5
  • Could you please enable IO statistics: SET STATISTICS IO ON; and post the message from SSMS after running your query with it enabled? And since you're working with views it's possible that something on the view code (that was made for a different purpose ohter than the result of this specific query) is causing the scan. Commented Oct 21, 2019 at 17:03
  • 1
    @Ronaldo: You are talking MS SQL Server, but this is about Postgres. Commented Oct 21, 2019 at 17:07
  • Sorry for my lack of attention, you're right. But I believe my point about the view having code that was meant to attend another query being the reason of your scans still stands. I don't believe Postgres differs from SQL Server on that point. Was the view created for this specific query? Commented Oct 21, 2019 at 17:16
  • In PostgreSQL, that is spelled track_io_timing. Commented Oct 21, 2019 at 17:55
  • @Ronaldo I don't think I'm working with views, accounts_companyview is the name of the table. Or maybe I misunderstood something? Commented Oct 22, 2019 at 7:13

2 Answers 2

1

The main problem are the bad estimates of the join cardinalities that lead PostgreSQL to use a nested loop join when a hash join would perform better.

There is one simple thing you can do to reduce the impact of the outermost nested loop join:

CREATE INDEX ON shipments_address (company_view_id) WHERE is_shipper;

That should cut the execution time roughly in half.

Other than that, I can think of the ugly method of temporarily disabling nested loop joins:

BEGIN;
SET LOCAL enable_nestloop = off;
/* your query */
COMMIT;
answered Oct 21, 2019 at 19:51
1
  • Thanks, that seemed to make it a lot faster (from 75s to 15s) on my staging db: explain.depesz.com/s/p1CL Commented Oct 22, 2019 at 7:12
1

You are not getting a full index scan. You are getting a regular (partial or parameterized) index scan, repeated 52,190 times. Each individual scan is very fast (I'm not sure how much it can be improved) but when you do it that many times it adds up.

It might be faster to do a hash join against that table rather than the nested loop index scan. It depends on how big the table is. You can drop the index "shipments_address_company_view_id_1119f23e" and see what happens. Or you could turn up random_page_cost (just in this one session) and see if that forces the switch, and whether it is better once it does.

CREATE INDEX "shipments_a_is_ship_d78ee4_idx" ON "shipments_address" ("is_shipper", "company_view_id");

I would expect that that index would be used. Although I wouldn't expect using it to improve your performance dramatically--at most 50%, certainly not 10 fold. Also, I would suggest swapping the order of the columns. Either way would work for this particular query, but the putting the boolean last will probably make it usable for a greater variety of queries and allow you to drop the index on (company_view_id) alone.

Perhaps you can drop the other index and see if this one gets used then. If you don't have a test/QA database setup and don't want to really drop the index in production, you can drop it inside a transaction, do the EXPLAIN and then roll it back. Other users will be locked out of your table from the DROP until the ROLLBACK, but if you put all the commands into a file, or put them all on the command line on one line, and if do a plain EXPLAIN rather than an EXPLAIN ANALYZE, this should be a small fraction of a second.

answered Oct 21, 2019 at 20:03
1
  • Thanks, I ran it on my staging DB with the index reversed: "company_view_id", "is_shipper" and it took it from 75s to 16s: explain.depesz.com/s/yDXq Commented Oct 22, 2019 at 7:11

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.