3
\$\begingroup\$

The following SQL code keeps, for each combination of id and question, only the rows with the latest date (MAX(date)). I would like to know if there is a simpler or shorter syntax that returns the same result.

with 
tbl_src as (select * from `tests2.o1.mc` order by id, date),
tbl_max_date as (
 select
 id,
 question,
 MAX(date) as max_date
 from
 `tests2.o1.mc`
 group by
 id,
 question
)
select 
 tbl_src.*
from
 tbl_src
inner join
 tbl_max_date
on
 tbl_src.id = tbl_max_date.id
 and tbl_src.question = tbl_max_date.question
 and tbl_src.date = tbl_max_date.max_date

The original data:

id date question answers
1 2018-03-21 q1 "[""n1"",""n3""]"
1 2018-12-10 q1 "[""n1"",""n2"",""n3""]"
1 2018-03-21 q2 "[""N1"",""n3""]"
1 2018-12-10 q2 "[""n1"",""n3""]"
1 2018-03-21 q3 "[""N1""]"
1 2018-12-10 q3 "[""n2""]"
2 2018-03-29 q1 "[""n1"",""n3""]"
2 2018-06-01 q1 "[""n1"",""n2"",""n3""]"
2 2018-06-02 q1 "[""n1"",""n3""]"
2 2018-06-01 q2 "[""n1"",""N2""]"
2 2018-06-01 q3 "[""n3""]"
3 2018-03-14 q1 "[""n2"",""n3""]"
3 2018-03-26 q2 "[""n1""]"
3 2018-03-14 q3 "[""n3""]"

The result:

id date question answers
1 2018-12-10 q1 "[""n1"",""n2"",""n3""]"
1 2018-12-10 q2 "[""n1"",""n3""]"
1 2018-12-10 q3 "[""n2""]"
2 2018-06-02 q1 "[""n1"",""n3""]"
2 2018-06-01 q2 "[""n1"",""N2""]"
2 2018-06-01 q3 "[""n3""]"
3 2018-03-14 q1 "[""n2"",""n3""]"
3 2018-03-26 q2 "[""n1""]"
3 2018-03-14 q3 "[""n3""]"
H. Pauwelyn
asked Dec 15, 2020 at 21:00
\$\endgroup\$

2 Answers

2
\$\begingroup\$

You can use ROW_NUMBER to rank your data according to date for each combination of id and question; then simply select the row with a ROW_NUMBER of 1:

WITH tbl_max_date AS (
 SELECT *,
 ROW_NUMBER() OVER (PARTITION BY id, question ORDER BY date DESC) AS rn
 FROM tests2.o1.mc
)
SELECT *
FROM tbl_max_date
WHERE rn = 1

If you could have more than one row with the same maximum value per group, you can use RANK in place of ROW_NUMBER, as that will give all rows with the same value the same ranking. For example:

WITH tbl_max_date AS (
 SELECT *,
 RANK() OVER (PARTITION BY id, question ORDER BY date DESC) AS rn
 FROM `tests2.o1.mc`
)
SELECT *
FROM tbl_max_date
WHERE rn = 1
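
As a possibly shorter variant, BigQuery also has a QUALIFY clause that filters on a window function directly, so the CTE and the helper column rn are not needed. This is only a minimal sketch, assuming the same table as in the question and that all rows tied on the latest date should be kept (swap RANK for ROW_NUMBER to keep exactly one row per group). The WHERE TRUE is a harmless placeholder, included only in case your BigQuery environment requires a WHERE, GROUP BY or HAVING clause next to QUALIFY:

```sql
-- Sketch: keep the latest row(s) per (id, question) without a CTE.
SELECT *
FROM `tests2.o1.mc`
WHERE TRUE  -- placeholder; may be unnecessary in current BigQuery
QUALIFY RANK() OVER (PARTITION BY id, question ORDER BY date DESC) = 1
ORDER BY id, question  -- ordering belongs at the outermost level
```

Unlike the CTE versions above, this returns only the original columns, because no rn column is ever added to the select list.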
answered Dec 16, 2020 at 10:35
\$\endgroup\$
0
1
\$\begingroup\$

I can't speak for Google BigQuery, but in some other databases common table expressions impose an optimization boundary, and subqueries can perform better; so consider dropping your with clauses.

Is the only purpose of tbl_src to do an order by? It seems so. That is a backwards place for it: ordering is only guaranteed when order by sits at the outermost level of a query. It is not guaranteed to survive a join, and any ordering you happen to get otherwise works "by accident".

Try the following:

select *
from (
 select id, question, answers, max(date) as max_date
 from `tests2.o1.mc`
 group by id, question, answers
)
order by id, max_date
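
If every column of the source row is needed (answers included) and the answers can differ within an id-question group, grouping by answers no longer yields one row per group. A sketch of an alternative, assuming the same table: keep the join from the question, but write the aggregate as a subquery instead of a with clause and put order by at the outermost level. The aliases src and latest are names chosen here for illustration:

```sql
-- Sketch: join each row to the latest date of its (id, question) group.
SELECT src.*
FROM `tests2.o1.mc` AS src
INNER JOIN (
  -- one row per (id, question) carrying the latest date
  SELECT id, question, MAX(date) AS max_date
  FROM `tests2.o1.mc`
  GROUP BY id, question
) AS latest
  ON src.id = latest.id
  AND src.question = latest.question
  AND src.date = latest.max_date
ORDER BY src.id, src.question  -- outermost level, so the order is guaranteed
```

If several rows in a group share the latest date, all of them are returned, which matches the behaviour of the original query.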
answered Dec 15, 2020 at 21:16
\$\endgroup\$
4
  • \$\begingroup\$ Thank you for extremely valuable comments! However the proposed query does not return the last field (answers). The reason why I first grouped and then joined was to return more fields than just id, question and max_date. About ordering, you are correct. One question from my side: when would it be appropriate to use with? Is it cases when I would use the same query more than once? \$\endgroup\$ Commented Dec 16, 2020 at 7:44
  • \$\begingroup\$ I've edited to include answers, which simply needs to be in the group by. Regarding with: the answer is basically "when you can't subquery", which - yes - includes the case where the subquery needs to be reused. \$\endgroup\$ Commented Dec 16, 2020 at 16:15
  • \$\begingroup\$ Some answers in id-question groups differ, so instead of 9 records I get 13. I guess I should use the previous version + a join. \$\endgroup\$ Commented Dec 16, 2020 at 19:01
  • \$\begingroup\$ Let's discuss in chat.stackexchange.com/rooms/117358/… \$\endgroup\$ Commented Dec 16, 2020 at 19:44
