I am new to Databricks and need to port a Snowflake query to it.
The Snowflake table, query, and output are shown below.
Table:
| id | col1 | hn |
|---|---|---|
| ee1 | null | 1 |
| ee1 | null | 2 |
| ee1 | test | 3 |
| ee1 | test | 4 |
| ee1 | test2 | 5 |
Query used:
SELECT ID, FIRST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS first_value, LAST_VALUE(col1) ignore nulls OVER (PARTITION BY ID ORDER BY hn) AS last_value FROM table
Output:
| id | first_value | last_value |
|---|---|---|
| ee1 | test | test2 |
| ee1 | test | test2 |
| ee1 | test | test2 |
| ee1 | test | test2 |
| ee1 | test | test2 |
When I ran the same query in Databricks using Spark SQL, IGNORE NULLS did not produce the same output.
Can anyone provide an equivalent query for Databricks?
Mike Walton
To handle null value comparisons, you can refer to stackoverflow.com/questions/70394130/… – Karthikeyan Rasipalay Durairaj, commented Oct 12, 2023 at 15:33
1 Answer
The key point is the window frame specification:
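-- Both functions get an explicit full-partition frame. With the default frame,
-- FIRST_VALUE ... IGNORE NULLS would return NULL for the leading rows where col1 is NULL.
SELECT ID,
       FIRST_VALUE(col1) IGNORE NULLS OVER (PARTITION BY ID ORDER BY hn
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_value,
       LAST_VALUE(col1) IGNORE NULLS OVER (PARTITION BY ID ORDER BY hn
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_value
FROM table;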
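If the frame is not defined explicitly, the default with an ORDER BY clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so each row only sees values from the start of the partition up to itself.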
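To make the effect of the frame concrete, here is a minimal Python sketch that emulates the two frames on the example table. It is an illustration only, not Spark code: the helper `first_last_ignore_nulls` and its `full_frame` flag are hypothetical names, and it assumes a single partition with unique `hn` values (so RANGE and ROWS behave the same).

```python
# Rows of the example table as (id, col1, hn); None stands in for SQL NULL.
rows = [
    ("ee1", None, 1),
    ("ee1", None, 2),
    ("ee1", "test", 3),
    ("ee1", "test", 4),
    ("ee1", "test2", 5),
]


def first_last_ignore_nulls(rows, full_frame):
    """Hypothetical helper: emulate FIRST_VALUE/LAST_VALUE ... IGNORE NULLS
    for one partition ordered by hn."""
    ordered = sorted(rows, key=lambda r: r[2])
    out = []
    for i, (row_id, _, _) in enumerate(ordered):
        # Full frame: every row of the partition.
        # Default frame: rows from the start of the partition up to the current row.
        frame = ordered if full_frame else ordered[: i + 1]
        non_null = [col1 for _, col1, _ in frame if col1 is not None]
        first = non_null[0] if non_null else None
        last = non_null[-1] if non_null else None
        out.append((row_id, first, last))
    return out


default_result = first_last_ignore_nulls(rows, full_frame=False)
full_result = first_last_ignore_nulls(rows, full_frame=True)

for label, result in (("default frame", default_result), ("full frame", full_result)):
    print(label)
    for row in result:
        print("  ", row)
```

Under these assumptions, only the full frame gives every row the same first and last non-null value, which matches the output shown in the question. With the default frame, the first two rows have no non-null value in view yet.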
answered Oct 12, 2023 at 15:20
Lukasz Szozda