I have a table with a column that contains URLs. I want to extract the value of a particular URL parameter from each record. The parameter can occur at any position in the URL, the URL can contain hashbangs, and the parameter value can contain special characters such as -, _ and |.
Data table column:
url
http://www.url.com?like=hobby&name=tom-_green
http://www.url.com?name=bob|ghost&like=hobby
and I want the query results to be
name
srini
tom-_green
bob|ghost
I tried a query like
Select regexp_extract(url, '(?<=name=)[^&?]*(?:|$&)',2) as name From table_name
I see Java exceptions when I run this query. The exceptions are pretty vague, so I am checking whether someone can help.
-
Possible duplicate of "Extract parameter value from url using regular expressions" – Prune, Oct 16, 2015 at 17:24
-
See similar questions stackoverflow.com/questions/1280557/… and stackoverflow.com/questions/25586792/…; I think they cover most -- if not all -- of what you need. – Prune, Oct 16, 2015 at 17:25
-
Hi @Prune, I was looking for a query for Hadoop, not JavaScript :) I found the answer, but thanks for the help! – sriiniivas, Oct 16, 2015 at 23:01
-
Right -- but regexp is very similar from one language to another, variations on the UNIX original. I'm glad you got what you needed. – Prune, Oct 16, 2015 at 23:09
1 Answer
I found another Hive function that handles URLs specifically:
Select parse_url(url, 'QUERY', 'name') as name From table_name
This worked :)
ref: parse_url(string urlString, string partToExtract [, string keyToExtract])
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
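For completeness, the regexp_extract attempt from the question can probably be made to work as well. The pattern there contains only a lookbehind and a non-capturing group, so there is no capture group 2 to return, and that is a likely cause of the Java exception. Below is a minimal sketch, assuming the same table_name and url column as in the question and assuming the value ends at the next & or #; the output column names are made up for illustration.

```sql
-- Sketch only: table_name and url come from the question;
-- name_via_parse_url and name_via_regexp are illustrative aliases.
SELECT
  -- Built-in URL parser, as in the answer above.
  parse_url(url, 'QUERY', 'name') AS name_via_parse_url,
  -- One capturing group for the value, so the group index is 1.
  -- [?&] anchors the key at the start of a parameter;
  -- [^&#]* stops at the next parameter or at a fragment/hashbang.
  regexp_extract(url, '[?&]name=([^&#]*)', 1) AS name_via_regexp
FROM table_name;
```

The character class [^&#]* lets -, _ and | through unchanged, so values like tom-_green and bob|ghost should come back whole. parse_url is still the simpler choice when the parameter lives in the regular query string.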