1 vote
1 answer
70 views

I am trying to write a util function that computes the cumulative min, max, sum, mean, and first of any column within a window, but I need to make it time-aware. Should I use rangeBetween or rowsBetween? For ...
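A minimal sketch of the distinction (the 1-hour bound and the column names id, event_ts, value are illustrative assumptions, not the question's actual util function): rangeBetween bounds are expressed in the ordering column's values, so ordering by the timestamp cast to epoch seconds makes the frame a true time interval, whereas rowsBetween counts physical rows regardless of their timestamps.

```python
# A minimal sketch; column names and the 1-hour bound are assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("a", "2023-01-01 10:00:00", 1),
     ("a", "2023-01-01 10:30:00", 2),
     ("a", "2023-01-01 12:00:00", 3)],
    ["id", "event_ts", "value"],
).withColumn("event_ts", F.to_timestamp("event_ts"))

# rangeBetween bounds use the ordering column's units, so ordering by
# epoch seconds makes the frame a time interval.
w_time = (Window.partitionBy("id")
          .orderBy(F.col("event_ts").cast("long"))
          .rangeBetween(-3600, 0))  # the preceding hour, inclusive

# rowsBetween counts physical rows, ignoring how far apart timestamps are.
w_rows = (Window.partitionBy("id")
          .orderBy("event_ts")
          .rowsBetween(-1, 0))  # the previous row and the current one

df.select(
    "id", "event_ts",
    F.sum("value").over(w_time).alias("sum_last_hour"),
    F.sum("value").over(w_rows).alias("sum_last_2_rows"),
).show()
```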
0 votes
1 answer
74 views

As per the documentation (https://docs.databricks.com/en/optimizations/spark-ui-guide/one-spark-task.html), a window function without a PARTITION BY clause results in a single task on Spark. Is this ...
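To illustrate what the documentation describes, here is a small sketch (the data and column names are invented for the example): without partitionBy, Spark must pull every row into one partition to impose a global order, so the window runs as a single task; adding partitionBy lets groups be processed in parallel.

```python
# A minimal sketch of the behaviour the docs describe; data is invented.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a", 10), (2, "b", 20), (3, "a", 30)],
                           ["id", "grp", "v"])

# No PARTITION BY: all rows must land in one partition so Spark can impose
# a single global order, hence one task does all the work.
w_global = Window.orderBy("id")

# With PARTITION BY, each group is ordered independently and can be
# handled by a separate task in parallel.
w_part = Window.partitionBy("grp").orderBy("id")

df.withColumn("rn_global", F.row_number().over(w_global)) \
  .withColumn("rn_per_grp", F.row_number().over(w_part)) \
  .show()
```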
1 vote
1 answer
197 views

The input dataframe looks like this, with columns id, product_id, sales_date, quantity, price, and timestampCol: ...
2 votes
2 answers
128 views

I have a PySpark DataFrame which looks like this: df = spark.createDataFrame( data=[ (1, "GERMANY", "20230606", True), (2, "GERMANY", "20230620", ...
0 votes
0 answers
270 views

Spark's Pandas API allows Pandas functions to be run on top of a Spark DataFrame that looks and behaves like a Pandas DataFrame. Pandas has functions for which Spark does not have implementations ...
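For context, a minimal sketch of the pandas-on-Spark API the question refers to (the data and method calls are illustrative, not the specific missing functions being asked about):

```python
# A minimal sketch of pandas-on-Spark; the methods shown are illustrative.
import pyspark.pandas as ps

# Looks and behaves like a pandas DataFrame, but operations run as Spark jobs.
psdf = ps.DataFrame({"a": [1.0, None, 3.0], "b": [10, 20, 30]})

print(psdf["a"].mean())       # pandas-style aggregation
print(psdf.sort_values("b"))  # pandas-style sorting
```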
0 votes
2 answers
343 views

I have a dataframe like the one below: df = spark.createDataFrame( [(1,1,10), (2,1,10), (3,1,None), (4,1,10), (5,1,10), (6,1,20), (7,1,20), (1,2,10), (2,2,10), (3,2,10), (4,2,20), (5,2,20)], ["Month", ...
0 votes
1 answer
145 views

I have a table like this. I want to get the product_id of the row with the closest purchase_date (checking all rows before the current row) and assign it to a new column (ref_id) for the current row, for ...
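One plausible reading: if "closest purchase_date among all rows before the current row" means the immediately preceding row when ordered by purchase_date, then lag() over an ordered window would do it. A minimal sketch, with invented table contents and column names:

```python
# A minimal sketch under the assumption that "closest earlier purchase_date"
# means the previous row when ordered by purchase_date. Data is invented.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(101, "2023-01-01"), (102, "2023-01-05"), (103, "2023-01-03")],
    ["product_id", "purchase_date"],
).withColumn("purchase_date", F.to_date("purchase_date"))

# Ordered by purchase_date, lag(1) returns the product_id of the row with
# the closest earlier date.
w = Window.orderBy("purchase_date")
df.withColumn("ref_id", F.lag("product_id", 1).over(w)).show()
```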
2 votes
1 answer
2k views

I am new to Databricks and was required to port Snowflake code to Databricks. The Snowflake table, code, and output look like the following:

table:
id  | col1 | hn
ee1 | null | 1
ee1 | null | 2
ee1 | test | 3
ee1 | test | ...
0 votes
1 answer
31 views

I have a dataset (its output is in the attached picture). I want to create 3 new columns called start_time_1, start_time_2, start_time_3 such that I can update the first timestamps of each of the ...
2 votes
1 answer
408 views

Suppose I have a table with three columns: dt, id and value. df_tmp = spark.createDataFrame([('2023-01-01', 1001, 5), ('2023-01-15', 1001, 3), ...
0 votes
1 answer
76 views

I have a PySpark DataFrame as follows. I need to fill in the EOM column for all the null values for each id dynamically, based on the last non-null EOM value, and it should be continuous. My output dataframe looks ...
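A minimal sketch of the forward-fill part, assuming EOM should carry the last non-null value forward within each id (the ordering column rn is an illustrative assumption; making the filled values "continuous", e.g. incrementing month ends, would need a further step such as add_months on a row counter):

```python
# A minimal forward-fill sketch; the ordering column rn is an assumption.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 1, "2023-01-31"), (1, 2, None), (1, 3, None), (2, 1, "2023-02-28")],
    ["id", "rn", "EOM"],
)

# last() with ignorenulls=True over an unbounded-preceding frame returns
# the most recent non-null EOM seen so far within the partition.
w = (Window.partitionBy("id").orderBy("rn")
     .rowsBetween(Window.unboundedPreceding, Window.currentRow))
df.withColumn("EOM", F.last("EOM", ignorenulls=True).over(w)).show()
```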
2 votes
1 answer
112 views

I have a use case where I need to compute a running sum over a partition such that the running sum does not exceed a certain threshold. For example:

// Input dataset
| id | created_on | value |
...
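A plain window sum cannot express this, because each row's result depends on the clamped result of the previous row. One common workaround is per-group iteration with applyInPandas; a minimal sketch, with the threshold value and the clamping semantics as illustrative assumptions:

```python
# A minimal sketch, assuming the running sum is clamped at the threshold
# after each addition. THRESHOLD and the data are illustrative assumptions.
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "2023-01-01", 40), (1, "2023-01-02", 50), (1, "2023-01-03", 30)],
    ["id", "created_on", "value"],
)

THRESHOLD = 100

def capped_running_sum(pdf: pd.DataFrame) -> pd.DataFrame:
    pdf = pdf.sort_values("created_on")
    total, out = 0, []
    for v in pdf["value"]:
        total = min(total + v, THRESHOLD)  # clamp after each addition
        out.append(total)
    pdf["running_sum"] = out
    return pdf

df.groupBy("id").applyInPandas(
    capped_running_sum,
    schema="id long, created_on string, value long, running_sum long",
).show()
```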
0 votes
0 answers
107 views

I want to step through a table consecutively, using the value calculated in the previous row when computing the current row. It seems a window function could do this. from pyspark.sql import SparkSession ...
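Worth noting: window functions can only read input columns of earlier rows (via lag), not values computed earlier in the same pass, so a truly recursive calculation usually needs per-group iteration instead. A minimal sketch with applyInPandas, using an invented recurrence (prev * 0.5 + value):

```python
# A minimal sketch; the recurrence and all names are invented for illustration.
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 1, 10.0), (1, 2, 20.0), (1, 3, 30.0)], ["grp", "step", "value"]
)

def recursive_calc(pdf: pd.DataFrame) -> pd.DataFrame:
    pdf = pdf.sort_values("step")
    prev, out = 0.0, []
    for v in pdf["value"]:
        prev = prev * 0.5 + v  # the current row uses the previous row's result
        out.append(prev)
    pdf["result"] = out
    return pdf

df.groupBy("grp").applyInPandas(
    recursive_calc, schema="grp long, step long, value double, result double"
).show()
```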
0 votes
1 answer
75 views

Consider 2 dataframes, a holiday df and an everyday df, each with 3 columns as below.

Holiday df (5 records):
Country_code | currency_code | date
Gb           | gbp           | 2022-04-15
Gb           | gbp           | ...
0 votes
1 answer
98 views

I want to create an ntile(3) within an ntile(3). I have the following table:

Customer | Total_amt | Digital_amt
1        | 100       | 45
2        | 200       | 150
3        | 150       | 23
4        | 300       | 100
5        | 350       | 350
6        | 112       | 10
7        | 312       | 15
8        | 260       | 160
9        | 232       | 150
10       | 190       | ...
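A minimal sketch of one way to nest them, assuming the outer ntile(3) is taken over Total_amt and the inner ntile(3) over Digital_amt within each outer bucket:

```python
# A minimal sketch; which column drives which tier is an assumption.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 100, 45), (2, 200, 150), (3, 150, 23), (4, 300, 100), (5, 350, 350),
     (6, 112, 10), (7, 312, 15), (8, 260, 160), (9, 232, 150)],
    ["Customer", "Total_amt", "Digital_amt"],
)

# Outer ntile(3) buckets customers by Total_amt.
df = df.withColumn("total_tile", F.ntile(3).over(Window.orderBy("Total_amt")))

# Inner ntile(3) re-ranks by Digital_amt *within* each outer bucket.
df = df.withColumn(
    "digital_tile",
    F.ntile(3).over(Window.partitionBy("total_tile").orderBy("Digital_amt")),
)
df.show()
```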
