2

I am new to Databricks and was required to implement the snowflake code in Databricks.

The snowflake table, code and output look like below:

table:

id col1 hn
ee1 null 1
ee1 null 2
ee1 test 3
ee1 test 4
ee1 test2 5

Query used:

SELECT ID, FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value, LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS last_value FROM table

Output:

id first_value last_value
ee1 test test2
ee1 test test2
ee1 test test2
ee1 test test2
ee1 test test2

When I tried the same query in Databricks using Spark SQL, ignore nulls did not work properly.

Can anyone provide the equivalent query for this in Databricks?

Mike Walton
7,4592 gold badges14 silver badges26 bronze badges
asked Oct 12, 2023 at 15:12
1

1 Answer 1

4

The key point is the window frame specification:

SELECT ID, 
 FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value, 
 LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn 
 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_value 
FROM table;

If not defined explicitly the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

answered Oct 12, 2023 at 15:20
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.