I have a bunch of sloppy geo text in a field.
However, the country/state names or codes are relatively clean.
It'll say "Poland 23", "Illinois Remote", "Frogballs, Germany", or "TX, AL, AK".
I have a finite list of country names/codes, plus the 50 US state names and codes.
I'm trying to figure out the best way to convert "trash STATENAME trash" into a clean state or country name.
I'm thinking of going the array route with STRTOK_TO_ARRAY(location_field), which converts the string into an array of 'word items'. But I'm not sure which function is best for extracting the matching item from the array: ARRAY_CONTAINS() merely returns true/false, not "Poland".
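The array route boils down to two steps: split the string into word items, then return the first item that appears in the lookup list. The Python below is a minimal sketch of that logic, not Snowflake code. It assumes a hypothetical in-memory lookup set (LOOKUP) and splits on spaces and commas, roughly like STRTOK_TO_ARRAY(location_field, ' ,'). Inside Snowflake, ARRAY_INTERSECTION(), which returns the elements two arrays have in common, may be a closer fit than ARRAY_CONTAINS(). Its exact comparison rules (case, data types) are worth checking.

```python
from __future__ import annotations

import re

# Hypothetical lookup list; in practice this would come from the dictionary table.
LOOKUP = {"POLAND", "GERMANY", "ILLINOIS", "TX", "AL", "AK"}


def first_match_from_tokens(location: str, lookup: set[str]) -> str | None:
    """Array route: split into word items, return the first item found in the lookup."""
    # Split on runs of spaces and/or commas and drop empty pieces.
    tokens = [t for t in re.split(r"[ ,]+", location) if t]
    for token in tokens:
        # Compare case-insensitively by upper-casing the token.
        if token.upper() in lookup:
            return token.upper()
    return None


print(first_match_from_tokens("Frogballs, Germany", LOOKUP))  # GERMANY
print(first_match_from_tokens("Lisbon, Portugal", LOOKUP))    # None
```

This approach has one clear limitation. Multi-word names such as "New York" are split into separate items, so they never match a single token.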
Maybe regex is better for this purpose? Something like regexp_like(location_field, 'country_list|country_list', 'i'). The only issue is that I only want to match countries/states that are a whole "word" (preceded or followed by a space), not "AL" for Alabama when it's part of "PortugAL", for instance.
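The "whole word" problem can be illustrated with Python's re module. A plain alternation matches "AL" inside "Portugal", but wrapping the alternation in \b word boundaries rejects partial words. This is a sketch with a hypothetical subset of codes. \b treats commas and the start or end of the string as boundaries too, not just spaces, which suits inputs like "TX, AL, AK".

```python
import re

codes = ["AL", "AK", "TX"]  # hypothetical subset of state codes

# Plain alternation: matches anywhere, including inside longer words.
plain = re.compile("|".join(codes), re.IGNORECASE)
# Same alternation wrapped in parentheses with \b on both sides: whole words only.
bounded = re.compile(r"\b(" + "|".join(codes) + r")\b", re.IGNORECASE)

print(bool(plain.search("Lisbon, Portugal")))    # True: "al" inside "Portugal"
print(bool(bounded.search("Lisbon, Portugal")))  # False: no word boundary before "al"
print(bounded.search("TX, AL, AK").group(0))     # TX
```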
Answer
Ah, okay, weird question, but I figured it out using regex:
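select
    regexp_substr(
        ph.location,
        -- build '\b(CODE1|CODE2|...)\b' from the dictionary table
        (
            select concat('\\b(', listagg(country_code3, '|'), ')\\b')
            from dictionary.dim_country
        ),
        1, 1, 'i'  -- start at position 1, return the 1st occurrence, case-insensitive
    ) as country3
from fact_table ph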
Not sure if this is the cleanest approach, but it works. It leverages regex and returns whatever it finds from the dictionary. LISTAGG joins the dictionary words with the | (OR) operator, and the parentheses plus a \b on either side force each alternative to match as a whole word.
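The query's logic can also be mirrored in Python to check the pattern locally. This is a sketch, not the author's code. DICTIONARY is a hypothetical stand-in for dictionary.dim_country, build_pattern plays the role of concat('\\b(', listagg(...), ')\\b'), and extract_first approximates regexp_substr(..., 1, 1, 'i').

The sketch adds two things the original query does not have:
- Entries are escaped with re.escape, so characters like '.' are treated literally.
- Longer entries are listed first, so the longer entry wins when two entries match at the same position.

```python
from __future__ import annotations

import re

# Hypothetical dictionary rows standing in for the country/state lookup tables.
DICTIONARY = ["Poland", "Germany", "Portugal", "Illinois", "New York", "TX", "AL", "AK"]


def build_pattern(words: list[str]) -> re.Pattern[str]:
    # Escape each entry, put longer entries first, join with '|',
    # then wrap in \b( ... )\b so only whole words match.
    alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def extract_first(location: str, pattern: re.Pattern[str]) -> str | None:
    # First (leftmost) match, returned as it appears in the input; None if nothing matches.
    m = pattern.search(location)
    return m.group(1) if m else None


pattern = build_pattern(DICTIONARY)
for text in ["Poland 23", "Frogballs, Germany", "Lisbon, Portugal", "TX, AL, AK"]:
    print(text, "->", extract_first(text, pattern))
```

Case-insensitive matching has a catch. Two-letter state codes such as IN (Indiana), OR (Oregon), and ME (Maine) also match ordinary words like "in" or "or". One option is to match codes case-sensitively and full names case-insensitively. In Snowflake, the longest-first ordering could likely be expressed with LISTAGG(...) WITHIN GROUP (ORDER BY LENGTH(...) DESC).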