2

So I've run into this issue with a slow Postgres query. I've added an index but it doesn't seem to help. It takes 8 minutes to get 100 records from a table with 2822968 records.

Here's the query that I run:

SELECT "organisations"."id", "organisations"."url"
FROM "organisations" 
WHERE "organisations"."url" IS NOT NULL 
AND "organisations"."domain" IS NULL 
ORDER BY "organisations"."id" ASC
LIMIT 100

And here is the explain analyze

Limit (cost=17468.48..17468.73 rows=100 width=38) (actual time=517448.061..517449.431 rows=100 loops=1)
 -> Sort (cost=17468.48..17482.23 rows=5501 width=38) (actual time=517448.057..517448.791 rows=100 loops=1)
 Sort Key: id
 Sort Method: top-N heapsort Memory: 38kB
 -> Bitmap Heap Scan on organisations (cost=193.31..17258.23 rows=5501 width=38) (actual time=93.370..516988.919 rows=1096205 loops=1)
 Recheck Cond: ((url IS NOT NULL) AND (domain IS NULL))
 Heap Blocks: exact=59205
 -> Bitmap Index Scan on index_organisations_on_url_and_domain (cost=0.00..191.93 rows=5501 width=0) (actual time=82.242..82.243 rows=1098765 loops=1)
Planning Time: 0.202 ms
Execution Time: 517453.118 ms

I use rails - here is the partial multi column index from the schema file:

t.index ["url", "domain"], name: "index_organisations_on_url_and_domain", where: "((url IS NOT NULL) AND (domain IS NULL))"

If it helps this is running on Postgres RDS on a small instance. There are no other connections running as I turned off all background jobs, and there are no users connected using the app at the time of running.

I'm not sure how I can get this to return faster.

asked Jun 21, 2019 at 1:06

2 Answers 2

2

It seems that the statistics for that index are way off, and PostgreSQL may be faster performing a sequential scan.

Try to

ANALYZE organizations;

and see if that improves matters.

Another idea may be an index on organizations.id (if you don't already have that) with the idea to fetch rows in sort order and discard the ones that don't meet the criterion.

The perfect index would be

CREATE INDEX ON organizations (id)
 WHERE url IS NOT NULL AND domain IS NULL;

You could also include the url column in the index (using the INCLUDE clause in v11) and aim for an index only scan.

answered Jun 21, 2019 at 5:57
1
  • I think using INCLUDE doesn't really get you anything in this case (compared to including "url" in the index in the ordinary way) and yields an index which is useful in fewer situations. Commented Jun 21, 2019 at 14:08
0

a better index would be

CREATE INDEX ON organizations (domain,url);

as that index will group all the null domains together and under that all the not-null urls will also be in one block.

answered Jun 22, 2019 at 21:13

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.