While aggregating an array, I need to remove empty strings and then combine all adjacent identical values. E.g.:
["","product","product","","product","","product","product","","product","product","","product","","","collection","product","","","product","product","","collection","order","checkout",""]
Should become:
["product","collection","product","collection","order","checkout"]
I have a working query with 4 nested selects:
SELECT array_agg( page_type_unique_pre) FILTER (WHERE page_type_unique_pre != '')
OVER (ORDER BY event_time) AS page_type_journey_unique
FROM (
SELECT CASE WHEN lag(last_page_type) OVER (ORDER BY event_time) LIKE '%' || page_type || '%' THEN ''
ELSE page_type END AS page_type_unique_pre
, page_type
, event_time
FROM (
SELECT string_agg(page_type, ',') OVER (ORDER BY event_time) AS page_type_journey
, first_value(page_type) OVER (PARTITION BY last_page_type_partition ORDER BY event_time) AS last_page_type
, page_type
, event_time
FROM (
SELECT
sum(CASE WHEN page_type IS NULL OR page_type = '' THEN 0 ELSE 1 END) OVER (ORDER BY event_time) AS last_page_type_partition,
page_type,
event_time
FROM (
SELECT * FROM tes
) a
) b
) c
) d;
See test case in this fiddle.
I'm sure there is a better way to achieve this?
-
1Your fiddle starts without arrays. Do you have to use arrays, or can you use normal aggregation of your rows?Charlieface– Charlieface2022年09月08日 16:37:35 +00:00Commented Sep 8, 2022 at 16:37
-
I can use without arrays. but I would like to check if it's correct, by seeing the array.AJ AJ– AJ AJ2022年09月11日 04:50:55 +00:00Commented Sep 11, 2022 at 4:50
2 Answers 2
A single subquery should do:
SELECT ARRAY (
SELECT page_type
FROM (
SELECT event_time, page_type
, lag(page_type, 1, '') OVER (ORDER BY event_time) AS last_page_type
FROM tes
WHERE page_type <> ''
) sub
WHERE page_type <> last_page_type
ORDER BY event_time
)
AS page_type_journey_unique;
Eliminate null and empty string with WHERE page_type <> ''
right away. See:
Then get the previous page_type
with the window function lag()
, placing ''
as default. This way last_page_type
can never be null
(and the empty string ''
does not collide with an existing value, after having just been eliminated). See:
Hence, we can use a plain <>
(not the more expensive IS DISTINCT FROM
) in the outer SELECT
to identify rows with a new page type.
Feed the resulting set to an ARRAY constructor. Simplest and cheapest. See:
-
Thank you! You are a life-saver!AJ AJ– AJ AJ2022年09月11日 06:21:25 +00:00Commented Sep 11, 2022 at 6:21
As mentioned, it seems you can just use a LAG
window function, combined with array aggregation.
If you want to see both before and after combining duplicates, you can use the ARRAY_AGG
aggregate with and without a FILTER
SELECT
ARRAY_AGG(tes.page_type ORDER BY tes.event_time)
FILTER (WHERE tes.page_type <> tes.last_page_type) AS consecutives_combined,
ARRAY_AGG(tes.page_type ORDER BY tes.event_time) AS all_values
FROM (
SELECT
*,
LAG(page_type, 1, '') OVER (ORDER BY tes.event_time) AS last_page_type
FROM tes
WHERE tes.page_type <> ''
) tes;
Explore related questions
See similar questions with these tags.