
My problem is represented by the following query:

SELECT 
 b.row_id, b.x, b.y, b.something,
 (SELECT a.x FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_x,
 (SELECT a.y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_y
FROM 
 my_table b

I'm using the same subquery twice, once to get source_x and once to get source_y. Is it possible to do this with only one subquery?

When I run this query on my real data (millions of rows), it never seems to finish. It runs for hours, if not days (my connection hung up before it completed).

I am using PostgreSQL 8.4

Erwin Brandstetter
asked Nov 6, 2011 at 19:11

5 Answers


I think you can use this approach:

SELECT b.row_id
 , b.x
 , b.y
 , b.something
 , a.x AS source_x
 , a.y AS source_y
 FROM my_table b
 LEFT JOIN my_table a ON a.row_id = (b.row_id - 1)
 AND a.something != 42;
answered Nov 6, 2011 at 19:24
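A quick way to convince yourself that this single LEFT JOIN returns exactly what the two correlated subqueries return (once the comma in them is replaced by AND) is to run both against a tiny table. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL 8.4. This assumes both queries use only standard SQL that the two engines evaluate the same way. The sample rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption: these
# queries use only standard SQL that both engines evaluate the same way).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (row_id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, something INTEGER)"
)
# Made-up sample rows: row 2 has something = 42, so row 3 gets no source.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 0), (2, 20, 200, 42), (3, 30, 300, 7), (4, 40, 400, 0)],
)

# The question's query, with the comma in each WHERE clause replaced by AND.
subquery_sql = """
SELECT b.row_id, b.x, b.y, b.something,
       (SELECT a.x FROM my_table a
         WHERE a.row_id = b.row_id - 1 AND a.something != 42) AS source_x,
       (SELECT a.y FROM my_table a
         WHERE a.row_id = b.row_id - 1 AND a.something != 42) AS source_y
FROM my_table b
ORDER BY b.row_id
"""

# One self-join: LEFT JOIN keeps every b row and yields NULLs when no
# qualifying predecessor exists, just like an empty scalar subquery.
join_sql = """
SELECT b.row_id, b.x, b.y, b.something, a.x AS source_x, a.y AS source_y
FROM my_table b
LEFT JOIN my_table a ON a.row_id = b.row_id - 1 AND a.something != 42
ORDER BY b.row_id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(join_rows == subquery_rows)  # → True
for row in join_rows:
    print(row)
```

Rows 1 and 3 come back with (None, None) as their source columns. Row 1 has no predecessor, and row 3's predecessor is excluded by something != 42.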


@DavidEG posted the best syntax for the query.

However, your problem is definitely not just the query technique. Replacing the two subqueries with a JOIN can speed things up by a factor of two at best, most likely less. That doesn't explain "hours". Even with millions of rows, a decently configured Postgres should finish this simple query in seconds, not hours.

  • The first thing that stands out is the syntax error in your query:

    ... WHERE a.row_id = (b.row_id - 1), a.something != 42
    

    AND or OR is needed here, not a comma.

  • The next thing to check is indexes. If row_id is not the primary key, you may not have an index on it. For optimum performance of this particular query, create a multi-column index on (row_id, something) like this:

    CREATE INDEX my_table_row_id_something_idx ON my_table (row_id, something);
    
  • If the filter always excludes the same value (something != 42), you can use a partial index instead for an additional speed-up:

    CREATE INDEX my_table_row_id_something_idx ON my_table (row_id)
    WHERE something != 42;
    

    This will only make a substantial difference if 42 is a common value or if something is a bigger column than just an integer. (An index on two integer columns normally occupies the same size on disk as an index on just one, due to data alignment.)

  • When performance is an issue, it is always a good idea to check your settings. In many distributions, the default Postgres settings use minimal resources and are not suited to handling "millions of rows".

  • Depending on your actual version of Postgres, an upgrade to a current version (9.1 at the time of writing) may help a lot.

  • Ultimately, hardware is always a factor, too. Tuning and optimizing can only get you so far.

answered Nov 6, 2011 at 23:45

2 Comments

I tried the partial index and then @DavidEG's query, and got the new table very quickly. Thanks a lot!
@JulieFen-Chong: Cool. :) A fitting index is essential with millions of rows.
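To check whether the join can actually use such an index, look at the query plan. The sketch below does this with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in. In PostgreSQL the equivalent is EXPLAIN or EXPLAIN ANALYZE, whose output looks different. The following details are assumptions for illustration: the table layout (a separate id primary key, so that row_id has no index of its own) and the generated rows. The sketch uses the multi-column index from the answer.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (assumption); plan output differs
# between engines, but the question asked is the same: is the index used?
conn = sqlite3.connect(":memory:")
# row_id is deliberately NOT the primary key, so it starts without an index.
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, row_id INTEGER, "
    "x INTEGER, y INTEGER, something INTEGER)"
)
# Hypothetical data: every third row has something = 42.
conn.executemany(
    "INSERT INTO my_table (row_id, x, y, something) VALUES (?, ?, ?, ?)",
    [(i, i * 10, i * 100, 42 if i % 3 == 0 else 0) for i in range(1, 1001)],
)

join_sql = """
SELECT b.row_id, b.x, b.y, b.something, a.x AS source_x, a.y AS source_y
FROM my_table b
LEFT JOIN my_table a ON a.row_id = b.row_id - 1 AND a.something != 42
"""

# Multi-column index on (row_id, something), as suggested in the answer.
conn.execute(
    "CREATE INDEX my_table_row_id_something_idx ON my_table (row_id, something)"
)

# Each EXPLAIN QUERY PLAN row has four columns; the last one (detail)
# describes how each table is accessed.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql)]
for step in plan:
    print(step)
```

The step for alias a should be a SEARCH that names my_table_row_id_something_idx, meaning each b row triggers an index lookup instead of a scan of the whole table.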

Old-fashioned syntax:

SELECT 
 b.row_id, b.x, b.y, b.something
 , a.x AS source_x
 , a.y AS source_y
FROM my_table b
 ,my_table a 
WHERE a.row_id = b.row_id - 1
 AND a.something != 42
 ;

JOIN syntax:

SELECT 
 b.row_id, b.x, b.y, b.something
 , a.x AS source_x
 , a.y AS source_y
FROM my_table b
JOIN my_table a 
 ON (a.row_id = b.row_id - 1)
WHERE a.something != 42
 ;
answered Nov 6, 2011 at 19:31

2 Comments

You need LEFT JOIN for the requested result. DavidEG nailed it.
Yeah, the subqueries are supposed to return NULL if nothing is found. But it was too ugly, I suppose...
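The difference the comments point out is easy to see on a small example. A plain (inner) JOIN silently drops rows whose predecessor is missing or has something = 42. The LEFT JOIN keeps those rows with NULLs, as the original subqueries did. The sketch below uses Python's sqlite3 module with an in-memory database as a stand-in for PostgreSQL, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (row_id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, something INTEGER)"
)
# Made-up rows: row 1 has no predecessor, row 3's predecessor has something = 42.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 0), (2, 20, 200, 42), (3, 30, 300, 7), (4, 40, 400, 0)],
)

# Inner join, as in the JOIN-syntax answer: unmatched b rows disappear.
inner_sql = """
SELECT b.row_id, b.x, b.y, b.something, a.x AS source_x, a.y AS source_y
FROM my_table b
JOIN my_table a ON (a.row_id = b.row_id - 1)
WHERE a.something != 42
ORDER BY b.row_id
"""

# Left join with the filter in the ON clause: every b row survives.
left_sql = """
SELECT b.row_id, b.x, b.y, b.something, a.x AS source_x, a.y AS source_y
FROM my_table b
LEFT JOIN my_table a ON a.row_id = b.row_id - 1 AND a.something != 42
ORDER BY b.row_id
"""

inner_rows = conn.execute(inner_sql).fetchall()
left_rows = conn.execute(left_sql).fetchall()
print([r[0] for r in inner_rows])  # → [2, 4]
print([r[0] for r in left_rows])   # → [1, 2, 3, 4]
```

With a LEFT JOIN, the something != 42 condition must stay in the ON clause. Moving it to WHERE would discard the NULL-extended rows again.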
SELECT b.row_id, b.x, b.y, b.something, a.x, a.y
 FROM my_table b
 LEFT JOIN (
 SELECT row_id + 1 AS row_id, x, y
 FROM my_table
 WHERE something != 42
 ) AS a ON a.row_id = b.row_id;
answered Nov 6, 2011 at 19:47

2 Comments

That would work, but it is probably very slow: row_id has to be incremented for every row with something != 42 (likely most of the "millions of rows") before the join, and as a side effect this prevents a standard index on row_id from being used for the join.
@ErwinBrandstetter I see what you mean, I should have kept a.row_id = b.row_id - 1 in the join condition instead. I was concentrating too much on moving the something != 42 into the subquery.
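The revision described in the last comment keeps the something != 42 filter inside the derived table but leaves row_id untouched and compares it against b.row_id - 1. It can be checked against the plain LEFT JOIN on a small table. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL, with made-up rows. It only checks that the results match, not the performance difference.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (row_id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, something INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 0), (2, 20, 200, 42), (3, 30, 300, 7), (4, 40, 400, 0)],
)

# Revised derived-table version: filter inside the subquery, but row_id is
# not modified, so the join condition can still be served by an index on it.
derived_sql = """
SELECT b.row_id, b.x, b.y, b.something, a.x AS source_x, a.y AS source_y
FROM my_table b
LEFT JOIN (
    SELECT row_id, x, y
    FROM my_table
    WHERE something != 42
) AS a ON a.row_id = b.row_id - 1
ORDER BY b.row_id
"""

# Plain self-join for comparison.
join_sql = """
SELECT b.row_id, b.x, b.y, b.something, a.x AS source_x, a.y AS source_y
FROM my_table b
LEFT JOIN my_table a ON a.row_id = b.row_id - 1 AND a.something != 42
ORDER BY b.row_id
"""

derived_rows = conn.execute(derived_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(derived_rows == join_rows)  # → True
```

Both forms express the same result. Whether the planner treats them identically depends on the engine, so checking the plan on real data is still worthwhile.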

Postgres (LATERAL requires 9.3 or later):

SELECT
 b.row_id, b.x, b.y, b.something,
 src.source_x,
 src.source_y
FROM
 my_table b
LEFT JOIN LATERAL (SELECT a.x AS source_x, a.y AS source_y FROM my_table a WHERE a.row_id = (b.row_id - 1) AND a.something != 42) AS src ON true;

MS SQL Server:

SELECT
 b.row_id, b.x, b.y, b.something,
 src.source_x,
 src.source_y
FROM
 my_table b
OUTER APPLY (SELECT a.x AS source_x, a.y AS source_y FROM my_table a WHERE a.row_id = (b.row_id - 1) AND a.something != 42) AS src;
answered May 8, 2021 at 17:12
