This is a follow-up to:

Based on these sample tables:

data_providers:
id | field_map
--------------
1 | {"segments": "SEGMENT IDS", "full_name": "FULL NAME"}
leads:
id | data_provider_id | email | data
------------------------------------
1 | 201 | hi@hi | {"SEGMENT IDS": "id1,id1,id1,id2,id3", "FULL NAME": "John Doe"}
2 | 201 | xx@xx | {"FULL NAME": "Billy Bob"}
desired output:
data_provider_id | email | full_name | segment
----------------------------------------------
201 | hi@hi | John Doe | id1
201 | hi@hi | John Doe | id2
201 | hi@hi | John Doe | id3
201 | xx@xx | Billy Bob | NULL
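
For anyone who wants to reproduce this, the sample tables above could be set up roughly as follows. This is only a sketch: the column types and constraints are assumptions, and the provider id is set to 201 (not 1 as printed above) on the assumption that it is meant to match leads.data_provider_id, since the join would otherwise find no provider.

```sql
-- Hypothetical setup script; types and constraints are assumptions.
CREATE TABLE data_providers (
   id        int PRIMARY KEY
 , field_map jsonb NOT NULL
);

CREATE TABLE leads (
   id               int PRIMARY KEY
 , data_provider_id int REFERENCES data_providers
 , email            text
 , data             jsonb
);

-- Assumption: id 201 so that it matches leads.data_provider_id.
INSERT INTO data_providers (id, field_map) VALUES
   (201, '{"segments": "SEGMENT IDS", "full_name": "FULL NAME"}');

INSERT INTO leads (id, data_provider_id, email, data) VALUES
   (1, 201, 'hi@hi', '{"SEGMENT IDS": "id1,id1,id1,id2,id3", "FULL NAME": "John Doe"}')
 , (2, 201, 'xx@xx', '{"FULL NAME": "Billy Bob"}');
```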

I have the following query:

SELECT
 leads.data_provider_id,
 leads.email,
 leads.data->>(p.field_map->>'full_name') AS full_name,
 segment
FROM leads
LEFT OUTER JOIN data_providers p ON p.id = leads.data_provider_id
LEFT JOIN LATERAL unnest(string_to_array(leads.data->>(p.field_map->>'segments'), ',')) AS segment ON true

This query is doing 2 particular things:

  1. It's joining on the data_providers table to get the field_map column, which contains a JSONB mapping of CSV column headers, something like {"segments": "SEGMENT IDS", "full_name": "FULL NAME"}.

  2. Within the data JSONB column of leads, there is a key (which I discover through the field map above) that contains a comma-separated string of segment IDs (it comes from a CSV, and they chose to put multiple values in one row). I want to split it so each segment ID gets its own row (with all other columns repeated on every row).

I have 2 goals:

  1. If there is an empty string or the key doesn't exist within the map, I want to return the row but just with NULL for the segment_id. I already got this working by changing CROSS JOIN to LEFT JOIN.

  2. I'm trying to remove duplicates in segment ids, so if someone enters 'id1,id1' it should only produce 1 row. I do this because there is a unique index on that column for the materialized view.

I'm currently stuck on #2.
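
To illustrate why goal #1 works with LEFT JOIN: as far as I can tell, string_to_array() returns an empty array for an empty string and NULL for a NULL input (which is what a missing key yields), and unnest() produces no rows for either. A stand-alone sketch with inline values instead of the real tables (the names t, raw and s are made up for the illustration):

```sql
-- Neither of these inputs yields any rows once passed to unnest():
SELECT string_to_array('', ',');          -- empty array
SELECT string_to_array(NULL::text, ',');  -- NULL

-- Inline stand-in for leads: one normal value, one empty string, one NULL.
SELECT t.raw, s.segment
FROM  (VALUES ('id1,id2'), (''), (NULL)) AS t(raw)
LEFT   JOIN LATERAL unnest(string_to_array(t.raw, ',')) AS s(segment) ON true;
-- LEFT JOIN keeps the '' and NULL rows with segment = NULL;
-- CROSS JOIN LATERAL would drop them because unnest() returns no rows.
```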

asked Mar 14, 2022 at 18:17
  • I think you will have a better chance of getting answers if you provide a fiddle at dbfiddle.uk/?rdbms=postgres_14. Most people will just skip to the next question as soon as they realize they would have to write the CREATE TABLE and INSERT statements themselves. Commented Mar 14, 2022 at 19:45

1 Answer

Make it a subquery and throw in DISTINCT:

SELECT l.data_provider_id
 , l.email
 , l.data->>(p.field_map->>'full_name') AS full_name
 , s.segment
FROM leads l
LEFT JOIN data_providers p ON p.id = l.data_provider_id
LEFT JOIN LATERAL (
 SELECT DISTINCT segment
 FROM unnest(string_to_array(l.data->>(p.field_map->>'segment'), ',')) AS segment
 ) s ON true

Make sure the key you look up matches the key in field_map exactly: the queries here use 'segment', while the query in the question uses 'segments'.

You could even use this short syntax:

...
LEFT JOIN LATERAL (
 SELECT DISTINCT unnest(string_to_array(l.data->>(p.field_map->>'segment'), ','))
 ) s(segment) ON true

(But the last one might make unsuspecting SQL purists cringe.)

Original order of array elements is not preserved. If you need that, see:

And use GROUP BY rather than DISTINCT and also aggregate the minimum ordinal position for each group of duplicates.
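
A minimal, untested sketch of that idea, assuming the same tables and the same 'segment' key as the queries above (adjust the key to whatever field_map actually holds). The aliases u, ord and first_ord are made up for the illustration, and leads.id is only used to give the outer ORDER BY something stable:

```sql
SELECT l.data_provider_id
     , l.email
     , l.data->>(p.field_map->>'full_name') AS full_name
     , s.segment
FROM   leads l
LEFT   JOIN data_providers p ON p.id = l.data_provider_id
LEFT   JOIN LATERAL (
   -- WITH ORDINALITY numbers the array elements in their original order
   SELECT u.segment, min(u.ord) AS first_ord  -- first position of each distinct value
   FROM   unnest(string_to_array(l.data->>(p.field_map->>'segment'), ','))
          WITH ORDINALITY AS u(segment, ord)
   GROUP  BY u.segment
   ) s ON true
ORDER  BY l.id, s.first_ord;  -- segments appear in order of first occurrence
```

Without the outer ORDER BY the result order is still not guaranteed; first_ord only makes it possible to sort by original position.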

answered Mar 15, 2022 at 2:27
  • @Tallboy: I would have tested my solution if you had provided a fiddle ... Commented Mar 15, 2022 at 2:39
  • Awesome, thank you! I will also provide a fiddle if I can't get this working. You are the Postgres demigod. Commented Mar 15, 2022 at 18:44
