PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.890
I'm relatively new to SQL and am learning a lot about writing queries in my current work role. I encountered a problem that puzzles me. If I count the subscriptions that are paid and have a null churn date, I get 1279. This is the correct number. But once I add the count of customers that installed Shopify, this count jumps to 1835. Why would this happen? Can anybody help me out? Thanks!
select
count (s.account_id)
, count (case when c.type ilike 'shopifychannel' and s.status = 'paid'
then c.created_at else null end) as installed_shopify
from subscriptions s
join channels c on s.account_id = c.account_id
where s.status = 'paid'
and s.churned_at is null
count(s.account_id) should count the total number of customers who have s.status = 'paid'. The count(case when ... 'shopifychannel' ...) should count the total number of customers who have paid and installed a Shopify channel. These are the outputs I am attempting to get.
1 Answer
I suppose the problem is in your channels table, which may return more than one row per s.account_id. I propose this query:
SELECT
COUNT (DISTINCT s.account_id),
COUNT (DISTINCT CONCAT(s.account_id, c.type)) AS installed_shopify
FROM subscriptions s
LEFT JOIN channels c ON s.account_id = c.account_id AND c.type ILIKE 'shopifychannel'
WHERE s.status = 'paid'
AND s.churned_at IS NULL
Again, you must be sure that there is only zero or one row in channels for each account_id matching the condition c.type ILIKE 'shopifychannel'.
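To see how extra channels rows inflate the counts, and how DISTINCT absorbs them, here is a minimal sketch. It is not run against Redshift: it assumes SQLite as a stand-in, three made-up accounts, and a hypothetical 'OtherChannel' type. SQLite has no ILIKE, so LIKE is used (case-insensitive for ASCII by default), and || replaces CONCAT.

```python
import sqlite3

# Toy data (assumed, not from the question); SQLite stands in for Redshift.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (account_id INTEGER, status TEXT, churned_at TEXT);
    CREATE TABLE channels (account_id INTEGER, type TEXT);
    INSERT INTO subscriptions VALUES (1, 'paid', NULL), (2, 'paid', NULL), (3, 'paid', NULL);
    -- account 1 installed Shopify twice; account 2 once, plus another channel;
    -- account 3 has no channels at all
    INSERT INTO channels VALUES (1, 'ShopifyChannel'), (1, 'ShopifyChannel'),
                                (2, 'ShopifyChannel'), (2, 'OtherChannel');
""")

# Query shaped like the one in the question: every matching channels row
# repeats the subscription row, so both counts are per joined row.
naive = conn.execute("""
    SELECT COUNT(s.account_id),
           COUNT(CASE WHEN c.type LIKE 'shopifychannel' THEN c.account_id END)
    FROM subscriptions s
    JOIN channels c ON s.account_id = c.account_id
    WHERE s.status = 'paid' AND s.churned_at IS NULL
""").fetchone()

# Query shaped like the one in the answer: DISTINCT collapses the repeats,
# and accounts without a Shopify row yield NULL, which COUNT skips.
fixed = conn.execute("""
    SELECT COUNT(DISTINCT s.account_id),
           COUNT(DISTINCT s.account_id || c.type)
    FROM subscriptions s
    LEFT JOIN channels c
           ON s.account_id = c.account_id AND c.type LIKE 'shopifychannel'
    WHERE s.status = 'paid' AND s.churned_at IS NULL
""").fetchone()

print(naive)  # (4, 3): 4 joined rows for 3 subscriptions, account 3 dropped
print(fixed)  # (3, 2): 3 paid accounts, 2 of them with Shopify
```

The second count relies on the concatenation being NULL when c.type is NULL. As far as I know Redshift's CONCAT behaves that way, while concat() in stock PostgreSQL skips NULL arguments, so the result may differ there.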
The LEFT JOIN ensures you don't miss a row in your primary table subscriptions. The condition c.type ILIKE 'shopifychannel' is in the LEFT JOIN because if you put it in the WHERE clause, you'll also lose the subscriptions rows that don't have Shopify installed.
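The difference between filtering in the join condition and filtering in WHERE can be sketched as follows. Again this assumes SQLite in place of Redshift, LIKE in place of ILIKE, made-up rows, and a hypothetical 'OtherChannel' type.

```python
import sqlite3

# Toy data (assumed): account 1 has Shopify, account 2 only another channel,
# account 3 has no channels.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (account_id INTEGER, status TEXT, churned_at TEXT);
    CREATE TABLE channels (account_id INTEGER, type TEXT);
    INSERT INTO subscriptions VALUES (1, 'paid', NULL), (2, 'paid', NULL), (3, 'paid', NULL);
    INSERT INTO channels VALUES (1, 'ShopifyChannel'), (2, 'OtherChannel');
""")

# Filter inside the join condition: unmatched subscriptions survive
# with NULLs in the channels columns.
filter_in_on = conn.execute("""
    SELECT COUNT(DISTINCT s.account_id)
    FROM subscriptions s
    LEFT JOIN channels c
           ON s.account_id = c.account_id AND c.type LIKE 'shopifychannel'
    WHERE s.status = 'paid' AND s.churned_at IS NULL
""").fetchone()[0]

# Filter in WHERE: rows where c.type is NULL or not Shopify are discarded,
# so the LEFT JOIN effectively behaves like an inner join.
filter_in_where = conn.execute("""
    SELECT COUNT(DISTINCT s.account_id)
    FROM subscriptions s
    LEFT JOIN channels c ON s.account_id = c.account_id
    WHERE s.status = 'paid' AND s.churned_at IS NULL
      AND c.type LIKE 'shopifychannel'
""").fetchone()[0]

print(filter_in_on)     # 3: all paid subscriptions are kept
print(filter_in_where)  # 1: only the account with Shopify remains
```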
-
If there is more than 1 row for some account_id having c.type ilike 'shopifychannel', then this would increase the count of s.account_id, correct? Running this query gave me an s.account_id count of 1335. This number is higher, but from my understanding it is because some s.account_id values do have more than 1 row for shopifychannel. – Lucas Neo, commented Apr 17, 2015 at 1:40
-
You're right. I was supposing that there would be only zero or one row for shopifychannel. Let me think about that. (; – Eric Ly, commented Apr 17, 2015 at 4:49
-
I modified the query so it takes into account the fact that a paid user can install the application multiple times. – Eric Ly, commented Apr 17, 2015 at 5:52
-
It worked, thanks! I like your use of the CONCAT() function, and I think it's a good method for counting the distinct combinations of 2 columns. – Lucas Neo, commented Apr 18, 2015 at 9:18
-
explain analyze output for both too. What's your PostgreSQL version? SELECT version(). Edit the question to add this info, then comment here when done.
filter clause on your aggregate expressions.