This is a follow-up to:
Based on these sample tables:
data_providers:
id | field_map
--------------
1 | {"segments": "SEGMENT IDS", "full_name": "FULL NAME"}
leads:
id | data_provider_id | email | data
------------------------------------
1 | 201 | hi@hi | {"SEGMENT IDS": "id1,id1,id1,id2,id3", "FULL NAME": "John Doe"}
2 | 201 | xx@xx | {"FULL NAME": "Billy Bob"}
desired output:
data_provider_id | email | full_name | segment
----------------------------------------------
201 | hi@hi | John Doe | id1
201 | hi@hi | John Doe | id2
201 | hi@hi | John Doe | id3
201 | xx@xx | Billy Bob | NULL
I have the following query:
SELECT
leads.data_provider_id,
leads.email,
leads.data->>(p.field_map->>'full_name') AS full_name,
segment
FROM leads
LEFT OUTER JOIN data_providers p ON p.id = leads.data_provider_id
LEFT JOIN LATERAL unnest(string_to_array(leads.data->>(p.field_map->>'segments'), ',')) AS segment ON true
This query is doing 2 particular things:
its joining on
data_providers
table to get thefield_map
column which contains a JSONB mapping if CSV column headers. So something like{"segments": "SEGMENT ID", "full_name": "FULL NAME"}
Within the
data
JSONB column ofleads
, there is a key (which I discover through the field map above) that contains a comma separated string of segment_ids (it comes in a CSV and they chose to put 2 values within 1 row). I want to split it so each segment_id gets its own row (and obviously all other columns remain the same on both rows).
I have 2 goals:
If there is an empty string or the key doesn't exist within the map, I want to return the row but just with NULL for the segment_id. I already got this working by changing
CROSS JOIN
toLEFT JOIN
.I'm trying to remove duplicates in segment ids, so if someone enters 'id1,id1' it should only produce 1 row. I do this because there is a unique index on that column for the materialized view.
I'm currently stuck on #2.
-
2I think you will have a better chance of getting answer(s) if you provide a dbfiddle.uk/?rdbms=postgres_14. Most people will just skip to the next question as soon as they realize they will have to create table- and insert- statement'sLennart - Slava Ukraini– Lennart - Slava Ukraini2022年03月14日 19:45:20 +00:00Commented Mar 14, 2022 at 19:45
1 Answer 1
Make it a subquery and throw in DISTINCT
:
SELECT l.data_provider_id
, l.email
, l.data->>(p.field_map->>'full_name') AS full_name
, s.segment
FROM leads l
LEFT JOIN data_providers p ON p.id = l.data_provider_id
LEFT JOIN LATERAL (
SELECT DISTINCT segment
FROM unnest(string_to_array(l.data->>(p.field_map->>'segment'), ',')) AS segment
) s ON true
Your field_map
holds the key 'segment', not 'segments', btw.
You could even use this short syntax:
...
LEFT JOIN LATERAL (
SELECT DISTINCT unnest(string_to_array(l.data->>(p.field_map->>'segment'), ','))
) s(segment) ON true
(But the last one might make unsuspecting SQL purists cringe.)
Original order of array elements is not preserved. If you need that, see:
And use GROUP BY
rather than DISTINCT
and also aggregate the minimum ordinal position for each group of duplicates.
-
1@Tallboy: I would have tested my solution if you had provided a fiddle ...Erwin Brandstetter– Erwin Brandstetter2022年03月15日 02:39:48 +00:00Commented Mar 15, 2022 at 2:39
-
Awesome thank you! I will also provide fiddle if I can't get this working. You are the Postgres demigodTallboy– Tallboy2022年03月15日 18:44:58 +00:00Commented Mar 15, 2022 at 18:44