I have a bunch of sloppy geographic text in a field.

However, the country/state names or codes within it are relatively clean.

It'll say "Poland 23" or "Illinois Remote" or "Frogballs, Germany" or "TX, AL, AK".

I have a finite list of country names/codes, plus the 50 US state names and codes.

I'm trying to figure out the best way to convert the "trash STATENAME trash" into a clean state name or country name.

One option is the array route: STRTOK_TO_ARRAY(location_field) converts the string into an array of word items. But I'm not sure which function is best for extracting the matching item from the array. ARRAY_CONTAINS() merely returns true/false, not "Poland".
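To make the array idea concrete, here is a minimal sketch in Python rather than Snowflake SQL, purely as an illustration of the logic: split the string into tokens, then return the first token that appears in the lookup list instead of a true/false flag. The `KNOWN_PLACES` set and the `first_known_token` helper are hypothetical names, and the sample values are taken from the examples above. In Snowflake itself, ARRAY_INTERSECTION over the tokenised array and the lookup list may play a similar role, though that is not verified here.

```python
import re
from typing import Optional

# Hypothetical lookup list; the real one would come from the dictionary table.
KNOWN_PLACES = {"poland", "germany", "illinois", "tx", "al", "ak"}


def first_known_token(location: str) -> Optional[str]:
    """Return the first token of `location` found in KNOWN_PLACES, else None."""
    # Split on anything that is not a letter, mirroring a tokenise-to-array step.
    tokens = [t for t in re.split(r"[^A-Za-z]+", location) if t]
    for token in tokens:
        # Compare case-insensitively, but return the token as it was written.
        if token.lower() in KNOWN_PLACES:
            return token
    return None
```

One limitation of tokenising: a multi-word name such as "New York" is split into two tokens and would never match a single list entry, which is one reason a regex over the whole string can be the simpler route.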

Maybe regex is better for this purpose? Something like regexp_like(location_field, 'country1|country2|...', 'i'). The only issue is that I only want to match countries/states that appear as a whole word (with a preceding or trailing space), not "AL" for Alabama when it's part of "portugAL", for instance.
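The whole-word requirement can be illustrated with the `\b` word-boundary anchor. The sketch below uses Python's `re` module as a stand-in for the database regex engine, with a hypothetical two-code list; `find_code` is an illustrative helper, not part of any library.

```python
import re

# Hypothetical two-code list, joined with | and wrapped in \b...\b so that
# a code only matches when it stands alone as a word.
PATTERN = re.compile(r"\b(AL|TX)\b", re.IGNORECASE)


def find_code(text):
    """Return the first whole-word code found in `text`, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None
```

With this pattern, `find_code("Remote AL")` finds "AL", while `find_code("portugAL")` finds nothing, because the "AL" there is preceded by a letter and so has no word boundary in front of it.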

asked Jun 5, 2023 at 21:32

1 Answer

OK, slightly odd question, but I figured it out using regex:

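select
  regexp_substr(
    ph.location,
    -- Build a single pattern of the form \b(code1|code2|...)\b from the dictionary table.
    (
      select concat('\\b(', listagg(country_code3, '|'), ')\\b')
      from dictionary.dim_country
    ),
    1, 1, 'i'  -- start at position 1, first occurrence, case-insensitive
  ) as country3
from fact_table ph;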

I'm not sure this is the cleanest approach, but it works with plain regex. The subquery uses listagg to join every dictionary entry with the | (or) operator, then wraps the result in parentheses with \b on either side so that only whole words match. regexp_substr then returns whichever dictionary entry it finds in the string, and the 'i' parameter makes the match case-insensitive.
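For readers who want to try the pattern-building step outside the database, here is a rough Python equivalent. It is a sketch under assumptions, not the query above: the `COUNTRIES` mapping, `build_pattern`, and `clean_country` are hypothetical names, and the sample entries are invented for illustration. It adds two things the SQL version does not do: escaping regex metacharacters in the dictionary entries, and mapping the matched text back to one canonical name.

```python
import re
from typing import Optional

# Hypothetical dictionary rows: lower-cased spelling -> canonical name.
COUNTRIES = {
    "poland": "Poland",
    "pol": "Poland",
    "germany": "Germany",
    "deu": "Germany",
}


def build_pattern(words):
    """Join the entries with | and wrap them in \\b(...)\\b, like the listagg step."""
    # Longest entries first so a longer name is tried before a shorter one;
    # re.escape guards against entries containing regex metacharacters.
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    return re.compile(rf"\b({alternatives})\b", re.IGNORECASE)


PATTERN = build_pattern(COUNTRIES)


def clean_country(location: str) -> Optional[str]:
    """Return the canonical country name found in `location`, or None."""
    match = PATTERN.search(location)
    return COUNTRIES[match.group(1).lower()] if match else None
```

Because the lookup goes through the mapping, both "Frogballs, Germany" and a string containing the code "DEU" resolve to the same clean value.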

answered Jun 5, 2023 at 21:52
