2

While aggregating an array, I need to remove empty strings and then combine all adjacent identical values. E.g.:

["","product","product","","product","","product","product","","product","product","","product","","","collection","product","","","product","product","","collection","order","checkout",""]

Should become:

["product","collection","product","collection","order","checkout"]

I have a working query with 4 nested selects:

SELECT array_agg( page_type_unique_pre) FILTER (WHERE page_type_unique_pre != '')
 OVER (ORDER BY event_time) AS page_type_journey_unique
FROM (
 SELECT CASE WHEN lag(last_page_type) OVER (ORDER BY event_time) LIKE '%' || page_type || '%' THEN ''
 ELSE page_type END AS page_type_unique_pre
 , page_type
 , event_time
 FROM (
 SELECT string_agg(page_type, ',') OVER (ORDER BY event_time) AS page_type_journey
 , first_value(page_type) OVER (PARTITION BY last_page_type_partition ORDER BY event_time) AS last_page_type
 , page_type
 , event_time
 FROM (
 SELECT
 sum(CASE WHEN page_type IS NULL OR page_type = '' THEN 0 ELSE 1 END) OVER (ORDER BY event_time) AS last_page_type_partition,
 page_type,
 event_time
 FROM (
 SELECT * FROM tes
 ) a
 ) b
 ) c
 ) d;

See test case in this fiddle.

I'm sure there is a better way to achieve this?

Erwin Brandstetter
186k28 gold badges463 silver badges636 bronze badges
asked Sep 8, 2022 at 13:03
2
  • 1
    Your fiddle starts without arrays. Do you have to use arrays, or can you use normal aggregation of your rows? Commented Sep 8, 2022 at 16:37
  • I can use without arrays. but I would like to check if it's correct, by seeing the array. Commented Sep 11, 2022 at 4:50

2 Answers 2

1

A single subquery should do:

SELECT ARRAY (
 SELECT page_type
 FROM (
 SELECT event_time, page_type
 , lag(page_type, 1, '') OVER (ORDER BY event_time) AS last_page_type
 FROM tes
 WHERE page_type <> ''
 ) sub
 WHERE page_type <> last_page_type
 ORDER BY event_time
 )
 AS page_type_journey_unique;

fiddle

Eliminate null and empty string with WHERE page_type <> '' right away. See:

Then get the previous page_type with the window function lag(), placing '' as default. This way last_page_type can never be null (and the empty string '' does not collide with an existing value, after having just been eliminated). See:

Hence, we can use a plain <> (not the more expensive IS DISTINCT FROM) in the outer SELECT to identify rows with a new page type.

Feed the resulting set to an ARRAY constructor. Simplest and cheapest. See:

answered Sep 10, 2022 at 2:14
1
  • Thank you! You are a life-saver! Commented Sep 11, 2022 at 6:21
0

As mentioned, it seems you can just use a LAG window function, combined with array aggregation.

If you want to see both before and after combining duplicates, you can use the ARRAY_AGG aggregate with and without a FILTER

SELECT
 ARRAY_AGG(tes.page_type ORDER BY tes.event_time)
 FILTER (WHERE tes.page_type <> tes.last_page_type) AS consecutives_combined,
 ARRAY_AGG(tes.page_type ORDER BY tes.event_time) AS all_values
FROM (
 SELECT
 *,
 LAG(page_type, 1, '') OVER (ORDER BY tes.event_time) AS last_page_type
 FROM tes
 WHERE tes.page_type <> ''
) tes;

db<>fiddle

answered Sep 11, 2022 at 5:04

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.