
I'm running the following SQL query twice with different values of limit (in my case 40 and 100, though the exact values don't matter). Can I combine these two queries into one? That would probably be faster. sample is a PostgreSQL array.

I don't know how much more information is needed, but if you need more details, please ask. I'm reluctant to add details about the schema unless necessary, since it will make the question much longer.

SELECT SUM(motif)
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY crossvalnum
                                ORDER BY crossvalnum, pvalue DESC, margstat) AS r,
             (seqindex IS NOT NULL)::INTEGER AS motif,  -- 1 if seqindex is set, else 0
             crossval.crossvalnum,
             data.margstat,
             data.pvalue
      FROM data
      INNER JOIN datasubgroup
              ON data.datasubgroup_id = datasubgroup.id
      INNER JOIN crossval
              ON datasubgroup.crossval_id = crossval.id
      WHERE data.seqindex = ANY(crossval.sample)
         OR data.seqindex IS NULL
     ) AS q
WHERE q.r <= 40  -- the "limit": 40 in one run, 100 in the other
GROUP BY crossvalnum;

The result for a limit of 40 is:

sum 
-----
 25
 22
 19
 16
 24
(5 rows)

The result for a limit of 100 is:

sum 
-----
 32
 28
 24
 23
 31
(5 rows)

Update: motif is an indicator variable, (seqindex IS NOT NULL)::INTEGER. The SUM is the aggregate for GROUP BY crossvalnum, so the sum runs across each crossvalnum group. In other words, there is a window function inside an aggregate query. I want the output to be two columns, one sum per limit.

asked Dec 16, 2012 at 9:25
  • Can you edit the query so it's obvious which table the crossvalnum, margstat, etc. columns are in? Commented Dec 16, 2012 at 10:21
  • And what do you want as output? 5 rows and 2 columns? Commented Dec 16, 2012 at 10:22
  • What is the SUM(motif) supposed to do? Count the number of rows that seqindex is not null? Commented Dec 16, 2012 at 10:38
  • @ypercube Yes. motif is an indicator variable, (seqindex IS NOT NULL)::INTEGER. The SUM is the aggregate for GROUP BY crossvalnum, so the sum runs across each crossvalnum group; there is a window function inside an aggregate query. The query itself could quite possibly be improved, but it does work for me. Question edited as requested. Commented Dec 16, 2012 at 10:48
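To make the "window function inside an aggregate query" shape concrete, here is a small sketch that runs the same query twice against an in-memory SQLite database from Python. It is not the asker's setup: the table data, its rows and the helper names (make_db, sums_for_limit) are hypothetical, the joins and the = ANY(crossval.sample) filter are dropped, SQLite stands in for PostgreSQL (window functions need SQLite 3.25 or later), and the limits 2 and 4 stand in for 40 and 100.

```python
import sqlite3

# Hypothetical, simplified stand-in for the question's schema: the joins to
# datasubgroup/crossval and the "= ANY(sample)" filter are omitted, and
# crossvalnum is stored directly on each row.
# Columns: (crossvalnum, pvalue, margstat, seqindex)
ROWS = [
    (1, 0.9, 1.0, 5), (1, 0.8, 1.0, None), (1, 0.7, 1.0, 7),
    (1, 0.6, 1.0, 8), (1, 0.5, 1.0, None),
    (2, 0.9, 1.0, None), (2, 0.9, 2.0, 3), (2, 0.4, 1.0, 4),
]

# Same shape as the question: ROW_NUMBER() in a subquery, then SUM over the
# 0/1 indicator, grouped by crossvalnum. PostgreSQL would write the cast as
# (seqindex IS NOT NULL)::INTEGER.
QUERY = """
SELECT crossvalnum, SUM(motif) AS total
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY crossvalnum
                                ORDER BY pvalue DESC, margstat) AS r,
             CAST(seqindex IS NOT NULL AS INTEGER) AS motif,
             crossvalnum
      FROM data) AS q
WHERE q.r <= :n
GROUP BY crossvalnum
ORDER BY crossvalnum
"""


def make_db():
    """Build the toy table in memory."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (crossvalnum INTEGER, pvalue REAL,"
                " margstat REAL, seqindex INTEGER)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)
    return con


def sums_for_limit(con, limit):
    """Run the per-limit query once; returns {crossvalnum: sum}."""
    return dict(con.execute(QUERY, {"n": limit}).fetchall())


con = make_db()
print(sums_for_limit(con, 2))  # {1: 1, 2: 1}
print(sums_for_limit(con, 4))  # {1: 3, 2: 2}
```

Each call ranks the whole table again, which is the duplicated work the question would like to avoid.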

2 Answers


Will this work for you?

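SELECT SUM(CASE WHEN r <= LEAST(40, 100) THEN motif ELSE 0 END) AS sum1,  -- rows within the smaller limit
       SUM(motif) AS sum2                                                -- all rows within the larger limit
FROM (...) AS q  -- the question's subquery; a bare "q" alias works too, whichever you prefer
WHERE q.r <= GREATEST(40, 100)
GROUP BY crossvalnum;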
answered Dec 16, 2012 at 13:08
  • That works well, thanks, though I don't yet understand why it works. :-) Commented Dec 16, 2012 at 15:02
  • @Faheem Mitha: you are passing two numbers (say 100 and 40), and one is greater than the other, so the condition q.r <= GREATEST(40, 100), i.e. q.r <= 100, also returns every row you need for q.r <= 40. That gives you one sum (sum2) directly with SUM(motif). To get the second sum (sum1) you need to ignore the rows where r is between 41 and 100, which is what the CASE does. Commented Dec 16, 2012 at 15:09
  • Thanks for the explanation. BTW, I think the query needs an AS q after FROM (...). Commented Dec 16, 2012 at 15:31
  • Sure, it should be there. Sorry, typo. Fixing. Commented Dec 16, 2012 at 15:52
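A sketch that checks the explanation in the comments: on a hypothetical, simplified toy table in SQLite, one pass filtered on the larger limit yields both sums, and they match two separate runs of the single-limit query. SQLite has no LEAST()/GREATEST(), so its two-argument scalar min()/max() are used instead; the table, its rows and the limits 2 and 4 (standing in for 40 and 100) are assumptions, not the asker's data.

```python
import sqlite3

# Hypothetical toy data: (crossvalnum, pvalue, margstat, seqindex).
ROWS = [
    (1, 0.9, 1.0, 5), (1, 0.8, 1.0, None), (1, 0.7, 1.0, 7),
    (1, 0.6, 1.0, 8), (1, 0.5, 1.0, None),
    (2, 0.9, 1.0, None), (2, 0.9, 2.0, 3), (2, 0.4, 1.0, 4),
]

# The ranked subquery, i.e. the "(...)" in the answer, on the simplified table.
RANKED = """
SELECT ROW_NUMBER() OVER (PARTITION BY crossvalnum
                          ORDER BY pvalue DESC, margstat) AS r,
       CAST(seqindex IS NOT NULL AS INTEGER) AS motif,
       crossvalnum
FROM data
"""

# One pass: keep rows up to the larger limit, and let CASE zero out rows
# ranked beyond the smaller limit for the first sum.
COMBINED = f"""
SELECT crossvalnum,
       SUM(CASE WHEN r <= min(:a, :b) THEN motif ELSE 0 END) AS sum1,
       SUM(motif) AS sum2
FROM ({RANKED}) AS q
WHERE q.r <= max(:a, :b)
GROUP BY crossvalnum
ORDER BY crossvalnum
"""

# The original single-limit query, used only as a cross-check.
SINGLE = f"""
SELECT crossvalnum, SUM(motif)
FROM ({RANKED}) AS q
WHERE q.r <= :n
GROUP BY crossvalnum
ORDER BY crossvalnum
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (crossvalnum INTEGER, pvalue REAL,"
            " margstat REAL, seqindex INTEGER)")
con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)

combined = con.execute(COMBINED, {"a": 2, "b": 4}).fetchall()
print(combined)  # [(1, 1, 3), (2, 1, 2)]

# Running the single-limit query twice gives the same numbers.
small = dict(con.execute(SINGLE, {"n": 2}).fetchall())
large = dict(con.execute(SINGLE, {"n": 4}).fetchall())
assert combined == [(k, small[k], large[k]) for k in sorted(large)]
```

Unlike the answer's query, this sketch also selects crossvalnum so each pair of sums can be matched to its group.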

If I understand correctly, the query does a count in a greatest-n-per-group type of query, where n takes two values (40 and 100).

This can probably be solved better by rewriting the subquery. The easiest to write might be a join of the two queries:

SELECT a.motif AS cntA, b.motif AS cntB, a.crossvalnum
FROM (query with limit = 40) AS a    -- each query must output crossvalnum
JOIN (query with limit = 100) AS b   -- and SUM(motif) AS motif
  ON b.crossvalnum = a.crossvalnum;
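A sketch of this join on a hypothetical, simplified toy table in SQLite (limits 2 and 4 standing in for 40 and 100). For a.motif and a.crossvalnum to resolve, each "(query with limit = ...)" has to return crossvalnum and an aliased sum; per_limit is a hypothetical helper that builds that variant of the question's query with a named parameter.

```python
import sqlite3

# Hypothetical toy data: (crossvalnum, pvalue, margstat, seqindex).
ROWS = [
    (1, 0.9, 1.0, 5), (1, 0.8, 1.0, None), (1, 0.7, 1.0, 7),
    (1, 0.6, 1.0, 8), (1, 0.5, 1.0, None),
    (2, 0.9, 1.0, None), (2, 0.9, 2.0, 3), (2, 0.4, 1.0, 4),
]


def per_limit(param):
    """Hypothetical helper: the question's query for one limit, with
    crossvalnum added to the output and the sum aliased as motif."""
    return f"""
        SELECT crossvalnum, SUM(motif) AS motif
        FROM (SELECT ROW_NUMBER() OVER (PARTITION BY crossvalnum
                                        ORDER BY pvalue DESC, margstat) AS r,
                     CAST(seqindex IS NOT NULL AS INTEGER) AS motif,
                     crossvalnum
              FROM data) AS q
        WHERE q.r <= :{param}
        GROUP BY crossvalnum"""


# Join the two per-limit results on crossvalnum.
JOINED = f"""
SELECT a.motif AS cntA, b.motif AS cntB, a.crossvalnum
FROM ({per_limit('a')}) AS a
JOIN ({per_limit('b')}) AS b ON b.crossvalnum = a.crossvalnum
ORDER BY a.crossvalnum
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (crossvalnum INTEGER, pvalue REAL,"
            " margstat REAL, seqindex INTEGER)")
con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)

rows = con.execute(JOINED, {"a": 2, "b": 4}).fetchall()
print(rows)  # [(1, 3, 1), (1, 2, 2)]
```

This still evaluates the ranking subquery once per limit, so it mainly saves a round trip rather than work.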

Here is another way, which moves the counts inside the subquery:

SELECT MIN(motif) AS cntA,   -- running count at the smaller limit
       MAX(motif) AS cntB,   -- running count at the larger limit
       crossvalnum
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY crossvalnum
                                ORDER BY crossvalnum, pvalue DESC, margstat) AS r,
             COUNT(*) OVER (PARTITION BY crossvalnum) AS c,  -- rows in the group
             COUNT(seqindex) OVER (PARTITION BY crossvalnum
                                   ORDER BY crossvalnum, pvalue DESC, margstat
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                                  ) AS motif,  -- running count of non-NULL seqindex
             crossval.crossvalnum
      FROM data
      INNER JOIN datasubgroup
              ON data.datasubgroup_id = datasubgroup.id
      INNER JOIN crossval
              ON datasubgroup.crossval_id = crossval.id
      WHERE data.seqindex = ANY(crossval.sample)
         OR data.seqindex IS NULL
     ) AS q
-- limitA = 40, limitB = 100; PostgreSQL has no @variables (a leading @ is
-- its absolute-value operator), so the limits are written as literals
WHERE r IN (LEAST(c, 40), LEAST(c, 100))
GROUP BY crossvalnum;
answered Dec 16, 2012 at 11:07
  • For the second query, I get ERROR: count(*) must be used to call a parameterless aggregate function LINE 8: COUNT() OVER. Once COUNT() is changed to COUNT(*), it works, thanks. Commented Dec 16, 2012 at 15:04
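A sketch of the running-count version on a hypothetical, simplified toy table in SQLite, with limits 2 and 4 standing in for 40 and 100 and SQLite's two-argument min() standing in for LEAST(). Because the running count never decreases as r grows, MIN(motif) is the count at the smaller limit and MAX(motif) the count at the larger one; capping each limit at c keeps groups that have fewer rows than the limit.

```python
import sqlite3

# Hypothetical toy data: (crossvalnum, pvalue, margstat, seqindex).
ROWS = [
    (1, 0.9, 1.0, 5), (1, 0.8, 1.0, None), (1, 0.7, 1.0, 7),
    (1, 0.6, 1.0, 8), (1, 0.5, 1.0, None),
    (2, 0.9, 1.0, None), (2, 0.9, 2.0, 3), (2, 0.4, 1.0, 4),
]

RUNNING = """
SELECT MIN(motif) AS cntA,   -- running count at the smaller limit
       MAX(motif) AS cntB,   -- running count at the larger limit
       crossvalnum
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY crossvalnum
                                ORDER BY pvalue DESC, margstat) AS r,
             COUNT(*) OVER (PARTITION BY crossvalnum) AS c,
             COUNT(seqindex) OVER (PARTITION BY crossvalnum
                                   ORDER BY pvalue DESC, margstat
                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                            AND CURRENT ROW) AS motif,
             crossvalnum
      FROM data) AS q
-- keep one row per limit; min(c, n) caps the limit at the group size
WHERE r IN (min(c, :a), min(c, :b))
GROUP BY crossvalnum
ORDER BY crossvalnum
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (crossvalnum INTEGER, pvalue REAL,"
            " margstat REAL, seqindex INTEGER)")
con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)

rows = con.execute(RUNNING, {"a": 2, "b": 4}).fetchall()
print(rows)  # [(1, 3, 1), (1, 2, 2)]
```

On this data it returns the same numbers as the join, with the table ranked only once. The ROW_NUMBER() and COUNT() windows must share the same ORDER BY so that r and the running count line up.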
