I'm using the following SQL query twice with different values of limit (in my case 40 and 100, though it doesn't really matter). My question is: can I combine these two queries into one query? It would probably be faster. sample is a PG array.
I don't know how much more information is needed, but if you need more details, please ask. I'm reluctant to add details about the schema unless necessary, since it will make the question much longer.
SELECT SUM(motif)
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY crossvalnum
                                ORDER BY crossvalnum, pvalue DESC, margstat) AS r,
             (seqindex IS NOT NULL)::INTEGER AS motif,
             crossval.crossvalnum,
             data.margstat,
             data.pvalue
      FROM data
      INNER JOIN datasubgroup
         ON data.datasubgroup_id = datasubgroup.id
      INNER JOIN crossval
         ON datasubgroup.crossval_id = crossval.id
      WHERE data.seqindex = ANY(crossval.sample)
         OR data.seqindex IS NULL
     ) AS q
WHERE q.r <= limit
GROUP BY crossvalnum;
The value for 40 is
sum
-----
25
22
19
16
24
(5 rows)
The value for 100 is
sum
-----
32
28
24
23
31
(5 rows)
Update: motif is defined as an indicator variable, (seqindex IS NOT NULL)::INTEGER. The SUM pairs with the GROUP BY crossvalnum as an aggregate query, so the sum is taken within each crossvalnum group. In other words, there is a window function inside an aggregate query. I want the output to be two columns, one per limit.
2 Answers
Will this work for you?
SELECT SUM(CASE WHEN r <= LEAST(40,100) THEN motif ELSE 0 END) AS sum1,
       SUM(motif) AS sum2
FROM (...) q   -- or AS q, whichever version you prefer
WHERE q.r <= GREATEST(40,100)
GROUP BY crossvalnum;
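For anyone wondering why this works, here is a small sketch of the same conditional-aggregation idea on made-up data. Everything in it is an assumption for illustration: the table q stands in for the question's subquery (already reduced to crossvalnum, r, motif), the limits are 2 and 4 rather than 40 and 100, and SQLite (via Python's sqlite3) stands in for PostgreSQL. The outer WHERE keeps rows up to the larger limit, and the CASE zeroes out rows beyond the smaller one.

```python
import sqlite3

# Toy stand-in for the question's subquery q: (crossvalnum, r, motif).
# The rows and the limits 2 and 4 are invented for illustration only.
rows = [
    (1, 1, 1), (1, 2, 0), (1, 3, 1), (1, 4, 1), (1, 5, 1),
    (2, 1, 0), (2, 2, 0), (2, 3, 1), (2, 4, 0), (2, 5, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q (crossvalnum INTEGER, r INTEGER, motif INTEGER)")
conn.executemany("INSERT INTO q VALUES (?, ?, ?)", rows)

# One pass: filter on the larger limit, and count towards sum1 only
# the rows that also fall within the smaller limit.
combined = conn.execute(
    """
    SELECT crossvalnum,
           SUM(CASE WHEN r <= :small THEN motif ELSE 0 END) AS sum1,
           SUM(motif) AS sum2
    FROM q
    WHERE r <= :large
    GROUP BY crossvalnum
    ORDER BY crossvalnum
    """,
    {"small": 2, "large": 4},
).fetchall()


def single(limit):
    """Run the original one-limit form of the query, for comparison."""
    return conn.execute(
        "SELECT crossvalnum, SUM(motif) FROM q WHERE r <= ? "
        "GROUP BY crossvalnum ORDER BY crossvalnum",
        (limit,),
    ).fetchall()


print(combined)
```

In this toy case each column of the combined result matches what the one-limit query returns when run separately with the corresponding limit.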
That works well, thanks, though I don't yet understand why it works. :-) – Faheem Mitha, Dec 16, 2012 at 15:02
@Faheem Mitha: you are passing two numbers (say 100 and 40), and one is greater than the other, so the condition q.r <= greatest (which is q.r <= 100) also returns everything you need for q.r <= 40. Thus you get one sum (sum2) naturally with just SUM(motif). To get the second sum (sum1) you need to ignore the records where r is between 41 and 100, which is done with CASE... – a1ex07, Dec 16, 2012 at 15:09
Thanks for the explanation. BTW, I think the query needs an AS q after FROM (...). – Faheem Mitha, Dec 16, 2012 at 15:31
Sure, it should be there... Sorry, typo. Fixing. – a1ex07, Dec 16, 2012 at 15:52
If I understand correctly, the query does a count in a greatest-n-per-group type of query where n takes 2 values (40 and 100).
This can probably be solved better by rewriting the subquery. The easiest to write might be a join of the two queries:
SELECT
    a.motif AS cntA, b.motif AS cntB, a.crossvalnum
FROM
    (query with limit = 40) AS a     -- each query must also select crossvalnum
  JOIN                               -- and alias SUM(motif) AS motif
    (query with limit = 100) AS b
  ON b.crossvalnum = a.crossvalnum ;
Here is another way by moving the counts inside the subquery:
-- @limitA and @limitB are placeholders for the two limits (e.g. 40 and 100)
SELECT MIN(motif) AS cntA,
       MAX(motif) AS cntB,
       crossvalnum
FROM (SELECT ROW_NUMBER() OVER
               ( PARTITION BY crossvalnum
                 ORDER BY crossvalnum, pvalue DESC, margstat
               ) AS r,
             COUNT(*) OVER
               ( PARTITION BY crossvalnum
               ) AS c,
             -- running count of non-null seqindex up to the current row
             COUNT(seqindex) OVER
               ( PARTITION BY crossvalnum
                 ORDER BY crossvalnum, pvalue DESC, margstat
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS motif,
             crossval.crossvalnum
      FROM data
      INNER JOIN datasubgroup
         ON data.datasubgroup_id = datasubgroup.id
      INNER JOIN crossval
         ON datasubgroup.crossval_id = crossval.id
      WHERE data.seqindex = ANY(crossval.sample)
         OR data.seqindex IS NULL
     ) AS q
WHERE r IN (LEAST(c, @limitA), LEAST(c, @limitB))
GROUP BY crossvalnum ;
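As a rough check of the running-count idea, here is a sketch on made-up data. The assumptions: a single hypothetical table t replaces the three joined tables, the limits are 2 and 3, the ordering uses pvalue alone, and SQLite (via Python's sqlite3, which needs window-function support, i.e. SQLite 3.25 or later) stands in for PostgreSQL. SQLite has no LEAST, so the two-argument scalar MIN plays that role. The running COUNT(seqindex) at row r is the number of non-null values among the first r rows, so it is enough to pick the rows where r equals each limit, clamped to the group size c.

```python
import sqlite3

# Made-up rows: (crossvalnum, pvalue, seqindex); seqindex may be NULL.
# pvalue is unique within each group so the ordering is deterministic.
rows = [
    (1, 0.9, 10), (1, 0.8, None), (1, 0.7, 11), (1, 0.6, 12),
    (2, 0.9, None), (2, 0.5, 20),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (crossvalnum INTEGER, pvalue REAL, seqindex INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

result = conn.execute(
    """
    SELECT crossvalnum,
           MIN(motif) AS cntA,   -- count at the smaller limit
           MAX(motif) AS cntB    -- count at the larger limit
    FROM (SELECT crossvalnum,
                 ROW_NUMBER() OVER (PARTITION BY crossvalnum
                                    ORDER BY pvalue DESC) AS r,
                 COUNT(*) OVER (PARTITION BY crossvalnum) AS c,
                 COUNT(seqindex) OVER (PARTITION BY crossvalnum
                                       ORDER BY pvalue DESC
                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                AND CURRENT ROW) AS motif
          FROM t
         ) AS q
    -- two-argument MIN is SQLite's scalar equivalent of LEAST
    WHERE r IN (MIN(c, :a), MIN(c, :b))
    GROUP BY crossvalnum
    ORDER BY crossvalnum
    """,
    {"a": 2, "b": 3},
).fetchall()

print(result)
```

Group 2 has only two rows, so both limits clamp to r = 2 and the two counts coincide. MIN and MAX work here only because the running count never decreases as r grows.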
For the second query, I get ERROR: count(*) must be used to call a parameterless aggregate function LINE 8: COUNT() OVER. Once COUNT() is changed to COUNT(*), it works, thanks. – Faheem Mitha, Dec 16, 2012 at 15:04
... crossvalnum, margstat, etc. columns are in?
... SUM(motif) supposed to do? Count the number of rows where seqindex is not null?
motif is defined as an indicator variable, (seqindex IS NOT NULL)::INTEGER. The SUM pairs with the GROUP BY crossvalnum as an aggregate query, so the sum is taken within each crossvalnum group. In other words, there is a window function inside an aggregate query. The query itself could quite possibly be improved, but it does work for me. Question edited as requested.