Unique array values for this string_to_array

Question 1

This is a follow-up to:

Best way to map different JSON keys to same target columns

Based on these sample tables:

data_providers:

id | field_map
--------------
1 | {"segments": "SEGMENT IDS", "full_name": "FULL NAME"}

leads:

id | data_provider_id | email | data
------------------------------------
1 | 201 | hi@hi | {"SEGMENT IDS": "id1,id1,id1,id2,id3", "FULL NAME": "John Doe"}
2 | 201 | xx@xx | {"FULL NAME": "Billy Bob"}

desired output:

data_provider_id | email | full_name | segment
----------------------------------------------
201 | hi@hi | John Doe | id1
201 | hi@hi | John Doe | id2
201 | hi@hi | John Doe | id3
201 | xx@xx | Billy Bob | NULL

I have the following query:

SELECT
 leads.data_provider_id,
 leads.email,
 leads.data->>(p.field_map->>'full_name') AS full_name,
 segment
FROM leads
LEFT OUTER JOIN data_providers p ON p.id = leads.data_provider_id
LEFT JOIN LATERAL unnest(string_to_array(leads.data->>(p.field_map->>'segments'), ',')) AS segment ON true

This query does two particular things:

1. It joins the data_providers table to get the field_map column, which contains a JSONB mapping of CSV column headers, something like {"segments": "SEGMENT IDS", "full_name": "FULL NAME"}.
2. Within the data JSONB column of leads, there is a key (which I discover through the field map above) that contains a comma-separated string of segment IDs (the data comes from a CSV, and they chose to put multiple values within one row). I want to split it so that each segment ID gets its own row, with all other columns repeated unchanged on each row.

I have two goals:

1. If the value is an empty string or the key doesn't exist within the map, I want to return the row, just with NULL for the segment. I already got this working by changing CROSS JOIN to LEFT JOIN.
2. I want to remove duplicate segment IDs, so if someone enters 'id1,id1' it should produce only one row. I need this because there is a unique index on that column in the materialized view.

I'm currently stuck on #2.

Question 2

I think you will have a better chance of getting answers if you provide a fiddle at dbfiddle.uk/?rdbms=postgres_14. Most people will just skip to the next question as soon as they realize they would have to write the CREATE TABLE and INSERT statements themselves.

score 2 · Accepted Answer · 2022-03-15 02:27:06Z

Make it a subquery and throw in DISTINCT:

SELECT l.data_provider_id
 , l.email
 , l.data->>(p.field_map->>'full_name') AS full_name
 , s.segment
FROM leads l
LEFT JOIN data_providers p ON p.id = l.data_provider_id
LEFT JOIN LATERAL (
 SELECT DISTINCT segment
 FROM unnest(string_to_array(l.data->>(p.field_map->>'segment'), ',')) AS segment
 ) s ON true

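The key must match what your field_map actually holds: I use 'segment' here, while your query uses 'segments'. A key that is missing from field_map does not raise an error. The lookup yields NULL, and the lead comes back with a NULL segment.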
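
To try this without creating tables, here is a self-contained sketch with the sample rows from the question inlined as CTEs. Two assumptions that are not in the question: the provider row gets id 201 so that the join actually matches the leads, and field_map uses the key 'segment'. Adjust both to your real data.

```sql
-- Sample rows inlined as CTEs so the query runs on its own.
-- Assumptions: provider id 201 (to match leads.data_provider_id)
-- and the field_map key 'segment'.
WITH data_providers (id, field_map) AS (
   VALUES (201, '{"segment": "SEGMENT IDS", "full_name": "FULL NAME"}'::jsonb)
   )
, leads (id, data_provider_id, email, data) AS (
   VALUES
     (1, 201, 'hi@hi', '{"SEGMENT IDS": "id1,id1,id1,id2,id3", "FULL NAME": "John Doe"}'::jsonb)
   , (2, 201, 'xx@xx', '{"FULL NAME": "Billy Bob"}'::jsonb)
   )
SELECT l.data_provider_id
     , l.email
     , l.data->>(p.field_map->>'full_name') AS full_name
     , s.segment
FROM   leads l
LEFT   JOIN data_providers p ON p.id = l.data_provider_id
LEFT   JOIN LATERAL (
   -- DISTINCT folds repeated segment IDs within one lead
   SELECT DISTINCT segment
   FROM   unnest(string_to_array(l.data->>(p.field_map->>'segment'), ',')) AS segment
   ) s ON true
ORDER  BY l.id, s.segment;
```

With these rows it should return one row each for id1, id2 and id3 for hi@hi, and a single row with a NULL segment for xx@xx: unnest() produces no rows for a NULL array, and the LEFT JOIN keeps the lead anyway.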

You could even use this short syntax:

...
LEFT JOIN LATERAL (
 SELECT DISTINCT unnest(string_to_array(l.data->>(p.field_map->>'segment'), ','))
 ) s(segment) ON true

(But the last one might make unsuspecting SQL purists cringe.)

The original order of array elements is not preserved. If you need it, see:

How to preserve the original order of elements in an unnested array?

Then use GROUP BY rather than DISTINCT, aggregate the minimum ordinal position for each group of duplicates, and order by that.
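
A minimal sketch of that GROUP BY variant, assuming Postgres 9.4 or later for WITH ORDINALITY. The leads CTE with a plain segments text column is a hypothetical stand-in for the JSONB lookup, and the aliases u, ord and first_ord are my own:

```sql
-- Hypothetical stand-in data: one comma-separated string per lead.
WITH leads (id, segments) AS (
   VALUES (1, 'id3,id1,id3,id2,id1')
        , (2, NULL)
   )
SELECT l.id, s.segment
FROM   leads l
LEFT   JOIN LATERAL (
   -- ord is the 1-based position of each element in the array;
   -- min(ord) is where each distinct value first appears
   SELECT u.segment, min(u.ord) AS first_ord
   FROM   unnest(string_to_array(l.segments, ',')) WITH ORDINALITY AS u(segment, ord)
   GROUP  BY u.segment
   ) s ON true
ORDER  BY l.id, s.first_ord;
```

For lead 1 this should return id3, id1, id2 in that order (first appearance wins), and lead 2 still gets its single row with a NULL segment.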

@Tallboy: I would have tested my solution if you had provided a fiddle ...

Awesome, thank you! I will also provide a fiddle if I can't get this working. You are the Postgres demigod.
