Combining two similar SQL queries

Question 1

I run the following SQL query twice with different values of limit (40 and 100 in my case, though the exact values don't matter). Can I combine these two runs into a single query? That would probably be faster. Note that sample is a PostgreSQL array.

I don't know how much more information is needed, but if you need more details, please ask. I'm reluctant to add details about the schema unless necessary, since it will make the question much longer.

SELECT SUM(motif)
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY crossvalnum ORDER BY crossvalnum, pvalue DESC, margstat) AS r,
 (seqindex IS NOT NULL)::INTEGER AS motif,
 crossval.crossvalnum,
 data.margstat,
 data.pvalue
 FROM data
 INNER JOIN datasubgroup
 ON data.datasubgroup_id=datasubgroup.id
 INNER JOIN crossval
 ON datasubgroup.crossval_id=crossval.id
 WHERE data.seqindex =ANY(crossval.sample)
 OR data.seqindex IS NULL
 )
 AS q
WHERE q.r <= limit
GROUP BY crossvalnum;

The result for limit = 40 is

sum 
-----
 25
 22
 19
 16
 24
(5 rows)

The result for limit = 100 is

sum 
-----
 32
 28
 24
 23
 31
(5 rows)

Update: motif is defined as an indicator variable, (seqindex IS NOT NULL)::INTEGER. The SUM is paired with GROUP BY crossvalnum, so it is computed within each crossvalnum group. In other words, there is a window function inside an aggregate query. I want the output to have two columns, one per limit value.

Question 2

Can you edit the query so it's obvious which table the crossvalnum, margstat, etc. columns are in?

Question 3

And what do you want as output? 5 rows and 2 columns?

Question 4

What is SUM(motif) supposed to do? Count the rows where seqindex is not null?

Question 5

@ypercube Yes. motif is defined as an indicator variable, (seqindex IS NOT NULL)::INTEGER. The SUM is paired with GROUP BY crossvalnum, so it is computed within each crossvalnum group. In other words, there is a window function inside an aggregate query. The query itself could quite possibly be improved, but it does work for me. Question edited as requested.
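To make the indicator idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table and its rows are made up for illustration, and CAST(... AS INTEGER) plays the role of the ::INTEGER cast. It suggests that summing the indicator per group gives the same number as COUNT(seqindex), which skips NULLs.

```python
import sqlite3

# In-memory SQLite database; a hypothetical toy table, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (crossvalnum INTEGER, seqindex INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, 5), (1, None), (1, 7), (2, None), (2, None)],
)

# The indicator is 1 when seqindex is present and 0 when it is NULL,
# so SUM over the indicator counts the non-NULL rows in each group.
rows = conn.execute(
    """
    SELECT crossvalnum,
           SUM(CAST(seqindex IS NOT NULL AS INTEGER)) AS motif_sum,
           COUNT(seqindex) AS non_null_count
    FROM data
    GROUP BY crossvalnum
    ORDER BY crossvalnum
    """
).fetchall()

print(rows)
```

Both computed columns agree for every group in this toy data, including the group where every seqindex is NULL.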

Question 6

Will this work for you?

SELECT SUM(CASE WHEN r <= LEAST(40, 100) THEN motif ELSE 0 END) AS sum1,
       SUM(motif) AS sum2
FROM (...) q -- or AS q, whichever version you prefer
WHERE q.r <= GREATEST(40, 100)
GROUP BY crossvalnum;

Question 7

That works well thanks, though I don't yet understand why it works. :-)

Question 8

@Faheem Mitha: You are passing two numbers (say 100 and 40), and one is greater than the other. The condition q.r <= GREATEST(40, 100), i.e. q.r <= 100, therefore also returns every row needed for q.r <= 40, so one sum (sum2) comes naturally from plain SUM(motif). To get the second sum (sum1), you need to ignore the rows where r is between 41 and 100, which is what the CASE expression does.
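The explanation above can be checked on toy data. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL, the table q is a hypothetical stand-in for the subquery (with r already computed), and min/max in Python replace LEAST/GREATEST. The helper names combined_sums and single_sum are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the subquery q: row number r and motif flag per group.
conn.execute("CREATE TABLE q (crossvalnum INTEGER, r INTEGER, motif INTEGER)")
conn.executemany(
    "INSERT INTO q VALUES (?, ?, ?)",
    [
        (1, 1, 1), (1, 2, 0), (1, 3, 1), (1, 4, 1),
        (2, 1, 0), (2, 2, 1), (2, 3, 0), (2, 4, 1),
    ],
)

def combined_sums(conn, limit_a, limit_b):
    """Return (crossvalnum, sum for smaller limit, sum for larger limit) in one pass."""
    low, high = min(limit_a, limit_b), max(limit_a, limit_b)
    sql = """
        SELECT crossvalnum,
               SUM(CASE WHEN r <= :low THEN motif ELSE 0 END) AS sum1,
               SUM(motif) AS sum2
        FROM q
        WHERE r <= :high
        GROUP BY crossvalnum
        ORDER BY crossvalnum
    """
    return conn.execute(sql, {"low": low, "high": high}).fetchall()

def single_sum(conn, limit):
    """The original one-limit query, for comparison."""
    sql = """
        SELECT crossvalnum, SUM(motif)
        FROM q
        WHERE r <= ?
        GROUP BY crossvalnum
        ORDER BY crossvalnum
    """
    return conn.execute(sql, (limit,)).fetchall()

print(combined_sums(conn, 2, 3))
print(single_sum(conn, 2), single_sum(conn, 3))
```

On this data the two columns of the combined query match the results of running the single-limit query once with each limit, and the order in which the two limits are passed does not matter.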

Question 9

Thanks for the explanation. BTW, I think the query needs an AS q after FROM (...).

Question 10

Sure it should be there... Sorry, typo. Fixing.

Question 11

If I understand correctly, the query does a count in a greatest-n-per-group type of query, where n takes two values (40 and 100).

This can probably be solved better by rewriting the subquery. The easiest version to write might be a join of the two queries:

SELECT
 a.motif AS cntA, b.motif AS cntB, a.crossvalnum
FROM
 (query with limit = 40) AS a
 JOIN
 (query with limit = 100) AS b
 ON b.crossvalnum = a.crossvalnum ;

Here is another way by moving the counts inside the subquery:

SELECT MIN(motif) AS cntA,
 MAX(motif) AS cntB,
 crossvalnum
FROM (SELECT ROW_NUMBER() OVER 
 ( PARTITION BY crossvalnum 
 ORDER BY crossvalnum, pvalue DESC, margstat
 ) AS r,
 COUNT(*) OVER 
 ( PARTITION BY crossvalnum 
 ) AS c,
 COUNT(seqindex) OVER 
 ( PARTITION BY crossvalnum 
 ORDER BY crossvalnum, pvalue DESC, margstat
 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
 ) AS motif,
 crossval.crossvalnum
 FROM data
 INNER JOIN datasubgroup
 ON data.datasubgroup_id=datasubgroup.id
 INNER JOIN crossval
 ON datasubgroup.crossval_id=crossval.id
 WHERE data.seqindex =ANY(crossval.sample)
 OR data.seqindex IS NULL
 )
 AS q 
WHERE r IN (LEAST(c, @limitA), LEAST(c, @limitB)) 
GROUP BY crossvalnum ;
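Here is a small sketch of why the running-count version gives the same numbers. It is not the query above verbatim: it runs on Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), the table t and its rows are hypothetical, the ordering uses pvalue only, and SQLite's two-argument MIN stands in for LEAST. The function name top_counts is made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy table standing in for the joined data.
conn.execute("CREATE TABLE t (crossvalnum INTEGER, pvalue REAL, seqindex INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 0.9, 10), (1, 0.8, None), (1, 0.7, 11), (1, 0.6, 12),
        (2, 0.9, None), (2, 0.5, 20),
    ],
)

SQL = """
SELECT crossvalnum, MIN(motif) AS cntA, MAX(motif) AS cntB
FROM (SELECT crossvalnum,
             ROW_NUMBER() OVER (PARTITION BY crossvalnum
                                ORDER BY pvalue DESC) AS r,
             COUNT(*) OVER (PARTITION BY crossvalnum) AS c,
             -- running count of non-NULL seqindex values up to this row
             COUNT(seqindex) OVER (PARTITION BY crossvalnum
                                   ORDER BY pvalue DESC
                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                            AND CURRENT ROW) AS motif
      FROM t) AS q
-- keep only the row at each limit, capped at the partition size c
WHERE r IN (MIN(c, :limit_a), MIN(c, :limit_b))
GROUP BY crossvalnum
ORDER BY crossvalnum
"""

def top_counts(conn, limit_a, limit_b):
    """Non-NULL counts within the top limit_a and top limit_b rows of each group."""
    return conn.execute(SQL, {"limit_a": limit_a, "limit_b": limit_b}).fetchall()

print(top_counts(conn, 2, 3))
```

Because the running count never decreases along the ordering, MIN(motif) picks the count at the smaller limit and MAX(motif) the count at the larger one. Capping the row number at c keeps a group in the result even when it has fewer rows than a limit, as with the second group here.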

Question 12

For the second query, I get ERROR: count(*) must be used to call a parameterless aggregate function LINE 8: COUNT() OVER. Once COUNT() is changed to COUNT(*), it works, thanks.

a1ex07 · Accepted Answer · 2012-12-16 13:08:02Z

edited Dec 16, 2012 at 15:53

answered Dec 16, 2012 at 13:08

Comments on this answer, in order: Faheem Mitha (Dec 16, 2012 at 15:02), a1ex07 (Dec 16, 2012 at 15:09), Faheem Mitha (Dec 16, 2012 at 15:31), a1ex07 (Dec 16, 2012 at 15:52).
