
Desired outcome:

+---------+----------------------------+
| ID PR   | Related Repeating Event(s) |
+---------+----------------------------+
| 1658503 | 1615764;1639329            |
+---------+----------------------------+

Is there a way to write this query in SQL on Databricks without using user-defined aggregate functions (UDAFs)? I've tried concat(), GROUP_CONCAT(), and LISTAGG, but none of these work or they are not supported in Databricks ("This function is neither a registered temporary function nor a permanent function registered in the database 'default'.").

I found this description of user-defined aggregate functions (UDAFs) in the Databricks documentation, but I don't know how to implement it: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-functions-udf-aggregate.html#user-defined-aggregate-functions-udafs&language-sql

Would anybody have a hint for me or a link?

What I have is this basic query:

%sql
SELECT
 pr_id,
 data_field_nm,
 field_value
FROM
 gms_us_mart.txn_pr_addtl_data_detail_trkw_glbl --(18)
WHERE
 pr_id = 1658503
 AND data_field_nm = 'Related Repeating Deviation(s)'

Which gives as output:

+---------+--------------------------------+-------------+
| pr_id   | data_field_nm                  | field_value |
+---------+--------------------------------+-------------+
| 1658503 | Related Repeating Deviation(s) | 1615764     |
| 1658503 | Related Repeating Deviation(s) | 1639329     |
+---------+--------------------------------+-------------+

The correct answer (thanks to @Alex Ott) is:

%sql
SELECT
 pr_id AS IDPR,
 concat_ws(';', collect_list(field_value)) AS RelatedRepeatingDeviations
FROM
 gms_us_mart.txn_pr_addtl_data_detail_trkw_glbl
WHERE
 data_field_nm = 'Related Repeating Deviation(s)'
 AND pr_id = 1658503
GROUP BY
 pr_id,
 data_field_nm;

Which gives the desired outcome:

+---------+----------------------------+
| IDPR    | RelatedRepeatingDeviations |
+---------+----------------------------+
| 1658503 | 1615764;1639329            |
+---------+----------------------------+
asked Feb 9, 2021 at 16:05
  • not a udaf aspect – Commented Feb 9, 2021 at 16:08
  • can u show code around it all pls – Commented Feb 9, 2021 at 16:09
  • added basic query for more context – Commented Feb 9, 2021 at 16:17
  • so you have n rows and want to make one row from them. pls show input and expected output – Commented Feb 9, 2021 at 16:20
  • I've updated the question with all info I have, see query and its output. – Commented Feb 9, 2021 at 16:42

1 Answer

Just use GROUP BY with collect_list and concat_ws, like this:

  • Get the data:

from pyspark.sql import Row
df = spark.createDataFrame([
    Row(pr_id=1658503, data_field_nm='related', field_value=1615764),
    Row(pr_id=1658503, data_field_nm='related', field_value=1639329),
])
df.createOrReplaceTempView("abc")
  • And run the query:

%sql
SELECT pr_id,
  data_field_nm,
  concat_ws(';', collect_list(field_value)) AS combined
FROM abc
GROUP BY pr_id, data_field_nm

Note that this gives the result column a fixed name (combined); add an alias with AS if you need a different name.
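To make the semantics of the query concrete, here is the same group-and-join logic sketched in plain Python, with no Spark required. The sample rows mirror the question's table; GROUP BY corresponds to bucketing by key, collect_list to accumulating the per-group values, and concat_ws(';', ...) to joining them with a semicolon:

```python
from collections import defaultdict

# Sample rows mirroring the question: (pr_id, data_field_nm, field_value)
rows = [
    (1658503, 'Related Repeating Deviation(s)', 1615764),
    (1658503, 'Related Repeating Deviation(s)', 1639329),
]

# GROUP BY pr_id, data_field_nm: bucket the field values per group key
groups = defaultdict(list)
for pr_id, field_nm, value in rows:
    groups[(pr_id, field_nm)].append(value)   # collect_list(field_value)

# concat_ws(';', collect_list(...)): join each group's values with ';'
result = {key: ';'.join(str(v) for v in values)
          for key, values in groups.items()}

print(result[(1658503, 'Related Repeating Deviation(s)')])  # → 1615764;1639329
```

One caveat in real Spark: collect_list does not guarantee a deterministic element order across partitions, so if the order of the joined values matters, wrap it as concat_ws(';', sort_array(collect_list(field_value))).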

answered Feb 9, 2021 at 18:05

4 Comments

thanks @alex, I've updated/edited my question with the adapted answer and got the desired outcome.
Yes, bad manuals for a noob like me. All you get as info on the Databricks doc site is: "collect_list(expr) - Collects and returns a list of non-unique elements." A tiny example would have helped me a lot and saved hours of searching and trying.
Yeah, that’s a problem - sql docs are mostly coming from Spark docs themselves. In these cases, sites like sparkbyexamples.com could be helpful
thanks for the spark site - that's exactly what I was looking for :)
