36 questions
1 vote · 1 answer · 70 views
Range between windows function
I am trying to write a util function that gives the min, max, sum, mean, and first of any column, cumulative within a window, but I need to make it time-aware. Should I use rangeBetween or rowsBetween?
For ...
0 votes · 1 answer · 74 views
Number of Tasks - for Window function without PARTITION BY statement
As per the documentation (https://docs.databricks.com/en/optimizations/spark-ui-guide/one-spark-task.html), a window function without a PARTITION BY clause results in a single task on Spark.
Is this ...
1 vote · 1 answer · 197 views
Spark (Scala): Moving average with Window function
The input dataframe looks like this:
+---+----------+----------+--------+-----+-------------------+
| id|product_id|sales_date|quantity|price| timestampCol|
+---+----------+----------+--------...
2 votes · 2 answers · 128 views
PySpark: count over a window with reset
I have a PySpark DataFrame which looks like this:
df = spark.createDataFrame(
    data=[
        (1, "GERMANY", "20230606", True),
        (2, "GERMANY", "20230620", ...
0 votes · 0 answers · 270 views
Parallelizing Spark's Pandas API Operations
Spark's Pandas API allows Pandas functions to be performed on top of a Spark dataframe that looks and behaves like a Pandas DataFrame. Pandas has functions that Spark does not have implementations ...
0 votes · 2 answers · 343 views
How to perform average over months using window function with null values in between?
I have a dataframe like below:
df = spark.createDataFrame(
    [(1,1,10), (2,1,10), (3,1,None), (4,1,10), (5,1,10), (6,1,20),
     (7,1,20), (1,2,10), (2,2,10), (3,2,10), (4,2,20), (5,2,20)],
    ["Month&...
0 votes · 1 answer · 145 views
How to get the other columns values using a window with rangeBetween in Pyspark
I have a table like this. I want to get the product_id of the row with the closest purchase_date (checking all rows before the current row) and assign it to a new column (ref_id) for the current row's value for ...
2 votes · 1 answer · 2k views
Window function ignore nulls not working in Databricks
I am new to Databricks and was required to implement Snowflake code in Databricks.
The Snowflake table, code, and output look like below:
table:
id  | col1 | hn
ee1 | null | 1
ee1 | null | 2
ee1 | test | 3
ee1 | test | ...
0 votes · 1 answer · 31 views
I want to fill in timestamps for a given code based on a window function in pyspark
I have a dataset and its output is in the attached picture.
I want to create 3 new columns called start_time_1, start_time_2, start_time_3 such that I can update the first timestamps of each of the ...
2 votes · 1 answer · 408 views
PySpark group by with rolling window
Suppose I have a table with three columns: dt, id and value.
df_tmp = spark.createDataFrame([('2023-01-01', 1001, 5),
                                ('2023-01-15', 1001, 3),
                                ...
0 votes · 1 answer · 76 views
ADD end of month column Dynamically to spark Dataframe
I have a pyspark DataFrame as follows.
I need to add an EOM column for all the null values for each id, dynamically based on the last non-null EOM value, and it should be continuous.
My output dataframe looks ...
2 votes · 1 answer · 112 views
Spark - Calculating running sum with a threshold
I have a use-case where I need to compute a running sum over a partition, where the running sum does not exceed a certain threshold.
For example:
// Input dataset
| id | created_on | value | ...
0 votes · 0 answers · 107 views
In PySpark (or SQL), can I use the value calculated in the previous observation in the current observation? (row-wise calculation, like SAS RETAIN)
I want to be able to go through a table consecutively, using the value calculated in the previous row in the current row. It seems a window function could do this.
from pyspark.sql import SparkSession
...
0 votes · 1 answer · 75 views
Spark with scala [closed]
Consider 2 dataframes, a holiday df and an everyday df, each with 3 columns as below.
Holiday df (5 records):
Country_code | currency_code | date
Gb           | gbp           | 2022-04-15
Gb           | gbp           | ...
0 votes · 1 answer · 98 views
I want ntile(3) within ntile(3), as in a subdivision within a division by ntile()
I want to create an ntile(3) within an ntile(3).
I have the following table:
Customer | Total_amt | Digital_amt
1        | 100       | 45
2        | 200       | 150
3        | 150       | 23
4        | 300       | 100
5        | 350       | 350
6        | 112       | 10
7        | 312       | 15
8        | 260       | 160
9        | 232       | 150
10       | 190       | ...