I'm having issues using DISTINCT and COUNT in the same query. I would like to get unique items then get a total count for each month. (I've done this in two queries but I know there's a better way.)

Query:

SELECT
 FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Date,
 count(*) Total_Sold
FROM
 `project.dataset.items`
GROUP BY
 Sell_Date
ORDER BY
 Sell_Date

Results (snippet):

Row Sell_Date Total_Sold 
 1 2010-05 15 
 2 2010-06 40 
 3 2010-07 75 
 4 2010-08 20 

This is the shape of result I want; however, the counts include duplicate Item_Id entries.

If I replace the SELECT above with this:

SELECT
 DISTINCT Item_Id,
 FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Date,

I get this error:

Error: SELECT list expression references column Item_Id which is neither grouped nor aggregated at [2:12]

If I replace the GROUP BY with:

GROUP BY
 Sell_Date, Item_Id

Results (snippet):

Row Item_Id Sell_Date Total_Sold 
 1 992 2010-05 1 
 2 118 2010-05 1 
 3 855 2010-05 1 
 4 846 2010-05 1 
 5 989 2010-05 1 
 6 505 2010-05 1 
 7 997 2010-05 1 
 8 983 2010-05 1 
 9 122 2010-05 1 
 10 601 2010-05 1 
 11 845 2010-05 1 

How can I get DISTINCT items then count the total for each month?

asked Dec 15, 2016 at 14:01

1 Answer

Use COUNT(DISTINCT Item_Id):

SELECT
 FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Date,
 COUNT(*) AS Total_Sold,
 COUNT(DISTINCT Item_Id) AS Distinct_Item_Id
FROM
 `project.dataset.items`
GROUP BY
 FORMAT_TIMESTAMP('%Y-%m', Sell_Date)
ORDER BY
 Sell_Date
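
To see how the two counts differ, here is a minimal sketch that runs against inline data instead of the real table. The four rows, the items CTE, and the Sell_Month and Distinct_Items aliases are made up for illustration; only the COUNT(*) and COUNT(DISTINCT ...) pattern comes from the query above.

```sql
-- Hypothetical inline data standing in for `project.dataset.items`.
WITH items AS (
  SELECT * FROM UNNEST([
    STRUCT(10 AS Item_Id, TIMESTAMP '2010-05-03 09:00:00' AS Sell_Date),
    STRUCT(10, TIMESTAMP '2010-05-17 14:30:00'),  -- same item sold again in May
    STRUCT(20, TIMESTAMP '2010-05-21 11:15:00'),
    STRUCT(10, TIMESTAMP '2010-06-02 08:45:00')
  ])
)
SELECT
  -- A distinct alias (Sell_Month) avoids any clash with the Sell_Date column.
  FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Month,
  COUNT(*) AS Total_Sold,                    -- every row in the month
  COUNT(DISTINCT Item_Id) AS Distinct_Items  -- each Item_Id counted once per month
FROM items
GROUP BY Sell_Month
ORDER BY Sell_Month
```

With these four rows, 2010-05 should come back with Total_Sold 3 and Distinct_Items 2 (item 10 was sold twice), and 2010-06 with 1 and 1.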
answered Dec 15, 2016 at 14:10
  • 1
    Worked perfectly, thanks. Could you please add an explanation for me and for others who may be curious about what's happening here? Thanks again. Commented Dec 15, 2016 at 14:16
  • 1
    Actually, this one is quite straightforward: COUNT(*) counts all rows, COUNT(item_id) counts only the item_ids that are not null, and COUNT(DISTINCT item_id) counts the item_ids that are not null while ignoring duplicates. For example, COUNT(DISTINCT x) where x is 10,20,10,20,20,10,30,20,10,20,30,10 returns 3 (10,20,30). Commented Dec 15, 2016 at 14:22
  • 1
    @MguerraTorres, I didn't work with google-bigquery and there is a good chance that you are right. Changed it. Thanks. Commented Dec 15, 2016 at 14:24
  • 1
    @MguerraTorres correct, BigQuery allows aliases defined in the SELECT list to be used in GROUP BY. Commented Dec 15, 2016 at 14:45
  • 1
    @DuduMarkovitz I don't know that. I guess a simple test with 2 rows having same year and month but different day/time would reveal what happens, by yielding 1 or 2 rows respectively. I don't have a bigquery installation to test. Commented Dec 15, 2016 at 14:50
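
The two-row test suggested in the last comment could look roughly like this. The inline rows and the items CTE are hypothetical, and no expected output is given because the behaviour is exactly what the test is meant to reveal.

```sql
-- Two hypothetical rows: same year and month, different day and time.
WITH items AS (
  SELECT 1 AS Item_Id, TIMESTAMP '2010-05-01 08:00:00' AS Sell_Date UNION ALL
  SELECT 2, TIMESTAMP '2010-05-20 17:30:00'
)
SELECT
  FORMAT_TIMESTAMP('%Y-%m', Sell_Date) AS Sell_Date,  -- alias reuses the column name on purpose
  COUNT(*) AS Total_Sold
FROM items
GROUP BY Sell_Date
```

One result row would mean GROUP BY resolved Sell_Date to the SELECT alias (the month string); two rows would mean it resolved to the underlying timestamp column. An ambiguity error is also a possible outcome, in which case giving the alias a different name or grouping by the full expression sidesteps the question.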
