I'm having issues using DISTINCT and COUNT in the same query. I would like to get unique items then get a total count for each month. (I've done this in two queries but I know there's a better way.)
Query:
SELECT
FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Date,
count(*) Total_Sold
FROM
`project.dataset.items`
GROUP BY
Sell_Date
ORDER BY
Sell_Date
Results (snippet):
Row Sell_Date Total_Sold
1 2010-05 15
2 2010-06 40
3 2010-07 75
4 2010-08 20
This is what I would like; however, it contains duplicate Item_Id entries.
If I replace the SELECT above with this:
SELECT
DISTINCT Item_Id,
FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Date,
I get this error:
Error: SELECT list expression references column Item_Id which is neither grouped nor aggregated at [2:12]
If I replace the GROUP BY with:
GROUP BY
Sell_Date, Item_Id
Results (snippet):
Row Item_Id Sell_Date Total_Sold
1 992 2010-05 1
2 118 2010-05 1
3 855 2010-05 1
4 846 2010-05 1
5 989 2010-05 1
6 505 2010-05 1
7 997 2010-05 1
8 983 2010-05 1
9 122 2010-05 1
10 601 2010-05 1
11 845 2010-05 1
How can I get DISTINCT items then count the total for each month?
1 Answer
count (distinct item_id)
SELECT
FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Date,
count(*) Total_Sold,
count (distinct item_id) as distinct_item_id
FROM
`project.dataset.items`
GROUP BY
FORMAT_TIMESTAMP('%Y-%m', Sell_Date)
ORDER BY
Sell_Date
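To see the two counts side by side, here is a minimal sketch that runs the same kind of query against a few inline rows instead of the real table. The CTE name `items_sample`, its rows, and the alias `Sell_Month` are made up for illustration; the alias is deliberately different from the column name so the GROUP BY is unambiguous.

```sql
-- Hypothetical sample data: item 992 is sold twice in May,
-- item 855 is sold three times in June.
WITH items_sample AS (
  SELECT 992 AS Item_Id, TIMESTAMP '2010-05-03 00:00:00' AS Sell_Date UNION ALL
  SELECT 992, TIMESTAMP '2010-05-17 00:00:00' UNION ALL
  SELECT 118, TIMESTAMP '2010-05-09 00:00:00' UNION ALL
  SELECT 855, TIMESTAMP '2010-06-01 00:00:00' UNION ALL
  SELECT 855, TIMESTAMP '2010-06-02 00:00:00' UNION ALL
  SELECT 855, TIMESTAMP '2010-06-30 00:00:00'
)
SELECT
  FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Month,  -- one group per month
  COUNT(*) AS Total_Sold,                              -- every row in the month
  COUNT(DISTINCT Item_Id) AS Distinct_Items            -- each Item_Id once per month
FROM
  items_sample
GROUP BY
  Sell_Month
ORDER BY
  Sell_Month
```

With these rows, 2010-05 should come back with Total_Sold 3 and Distinct_Items 2, and 2010-06 with Total_Sold 3 and Distinct_Items 1.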
-
Worked perfectly, thanks. Could you please add an explanation for me and for others who may be curious about what's happening here? Thanks again. – fragilewindows, Dec 15, 2016 at 14:16
-
Actually, this one is quite straightforward: COUNT(*) counts all rows, COUNT(item_id) counts only item_ids that are not null, and COUNT(DISTINCT item_id) counts item_ids that are not null but ignores duplicates. count (distinct x) where x is 10,20,10,20,20,10,30,20,10,20,30,10 will return 3 (10,20,30). – David דודו Markovitz, Dec 15, 2016 at 14:22
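The three variants from the comment above can be compared in one query. This is only a sketch: the CTE name `numbers` is made up, and one NULL row is added on top of the comment's list so that COUNT(*) and COUNT(x) differ.

```sql
-- The twelve values from the comment, plus one NULL row (an added assumption).
WITH numbers AS (
  SELECT x FROM UNNEST([10, 20, 10, 20, 20, 10, 30, 20, 10, 20, 30, 10]) AS x
  UNION ALL
  SELECT CAST(NULL AS INT64)
)
SELECT
  COUNT(*) AS all_rows,            -- counts every row, including the NULL one
  COUNT(x) AS non_null_x,          -- skips the NULL row
  COUNT(DISTINCT x) AS distinct_x  -- skips NULL and counts 10, 20, 30 once each
FROM
  numbers
```

This should return all_rows = 13, non_null_x = 12 and distinct_x = 3.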
-
@MguerraTorres, I haven't worked with google-bigquery, and there is a good chance that you are right. Changed it. Thanks. – David דודו Markovitz, Dec 15, 2016 at 14:24
-
@MguerraTorres correct, BigQuery allows aliases defined in the SELECT list to be used in GROUP BY. – ypercubeᵀᴹ, Dec 15, 2016 at 14:45
-
@DuduMarkovitz I don't know that. I guess a simple test with 2 rows having the same year and month but a different day/time would reveal what happens, by yielding 1 or 2 rows respectively. I don't have a BigQuery installation to test. – ypercubeᵀᴹ, Dec 15, 2016 at 14:50
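The test suggested in the comment above could look roughly like this. It is an untested sketch: the CTE name `two_rows` and its data are made up, and the alias deliberately reuses the column name Sell_Date, as in the question.

```sql
-- Two hypothetical rows in the same month but on different days.
WITH two_rows AS (
  SELECT 1 AS Item_Id, TIMESTAMP '2010-05-01 10:00:00' AS Sell_Date UNION ALL
  SELECT 2, TIMESTAMP '2010-05-20 18:30:00'
)
SELECT
  FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Date,  -- alias shadows the column name
  COUNT(*) AS Total_Sold
FROM
  two_rows
GROUP BY
  Sell_Date  -- does this refer to the alias or to the underlying column?
ORDER BY
  Sell_Date
```

One result row would mean the GROUP BY picked up the alias (the formatted month), while two rows would mean it grouped by the raw timestamp column. The engine might also reject the name as ambiguous, in which case grouping by the full FORMAT_TIMESTAMP expression or by a differently named alias avoids the question entirely.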