How can I parse strings composed of n parameters in random order, such as:
{ UserID : 36875; tabName : QuickAndEasy}
{ RecipeID : 1150; UserID : 36716}
{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}
{ UserID : 36716; tabName : QuickAndEasy}
Ultimately I'm looking to output the parameters in separate columns of a table.
- How far have you got? What problems have you encountered? – khelwood, Dec 4, 2014 at 7:53
- That should be trivial with regular expressions, if you can provide more concrete rules that the regex will need to implement. For example, what kinds of characters are allowed as keys/values? Can there be whitespace in a value? If so, will the value be quoted? If so, can there be escaped quotes in such a value? Etc. – Tim Pietzcker, Dec 4, 2014 at 7:54
- Thanks for the reply. I did not get far, as I could only get one chosen parameter, excluding the others. Keys and values are strings of any characters, up to 15 characters for values. No other rules. – mmarboeuf, Dec 4, 2014 at 8:09
- Is it a valid JSON format? Can you use json.loads()? – Baruch Oxman, Dec 4, 2014 at 8:21
2 Answers
The regex ([^{}\s:]+)\s*:\s*([^{}\s;]+) works on your examples. You need to be aware, though, that all the matches will be strings, so if you want to store 36875 as a number, you'll need to do some additional processing.
import re
regex = re.compile(
    r"""(            # Match and capture in group 1:
     [^{}\s:]+       # One or more characters except braces, whitespace or :
    )                # End of group 1
    \s*:\s*          # Match a colon, optionally surrounded by whitespace
    (                # Match and capture in group 2:
     [^{}\s;]+       # One or more characters except braces, whitespace or ;
    )                # End of group 2""",
    re.VERBOSE)
You can then do
>>> dict(regex.findall("{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}"))
{'UserID': '36716', 'isFromLabel': '0', 'searchWord': 'soup', 'type': 'recipe'}
Test it live on regex101.com.
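If the numeric values should end up as numbers, the additional processing could look something like the sketch below (Python 3). The helper name `parse_params` and the rule "a value made up only of decimal digits becomes an `int`" are assumptions made for illustration; the question does not say which values are meant to be numeric.

```python
import re

# Same pattern as above, written on one line.
PAIR_RE = re.compile(r"([^{}\s:]+)\s*:\s*([^{}\s;]+)")


def parse_params(text):
    """Return the key/value pairs found in text as a dict.

    Assumption: values consisting only of decimal digits are converted
    to int; every other value stays a string.
    """
    result = {}
    for key, value in PAIR_RE.findall(text):
        result[key] = int(value) if value.isdecimal() else value
    return result


print(parse_params("{ RecipeID : 1150; UserID : 36716}"))
```

Negative numbers or floats would need a different check; this only covers plain non-negative integers like the ones in the examples.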
answered Dec 4, 2014 at 8:39
Tim Pietzcker
1 Comment
mmarboeuf
Thanks! I really need to get up and running with regex. It was a great exercise to figure out your code.
lines = "{ UserID : 36875; tabName : QuickAndEasy } ", \
        "{ RecipeID : 1150; UserID : 36716}", \
        "{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}" , \
        "{ UserID : 36716; tabName : QuickAndEasy}"
counter = 0
mappedLines = {}
for line in lines:
    counter = counter + 1
    lineDict = {}
    line = line.replace("{","")
    line = line.replace("}","")
    line = line.strip()
    fieldPairs = line.split(";")
    for pair in fieldPairs:
        fields = pair.split(":")
        key = fields[0].strip()
        value = fields[1].strip()
        lineDict[key] = value
    mappedLines[counter] = lineDict

def printField(key, lineSets, comma_desired = True):
    if key in lineSets:
        print(lineSets[key],end="")
    if comma_desired:
        print(",",end="")
    else:
        print()

for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printField("UserID",lineSets)
    printField("tabName",lineSets)
    printField("RecipeID",lineSets)
    printField("type",lineSets)
    printField("searchWord",lineSets)
    printField("isFromLabel",lineSets,False)
CSV output:
36875,QuickAndEasy,,,,
36716,,1150,,,
36716,,,recipe,soup,0
36716,QuickAndEasy,,,,
The code above is written for Python 3.4. You can get similar output with 2.7 by replacing the function and the last for loop with this:
def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line = output_line + lineSets[key] + ","
        else:
            output_line += ","
    print output_line[0:len(output_line) - 1]

fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]
for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printFields(fields,lineSets)
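As a possible alternative to building the comma-separated lines by hand, the standard library's `csv.DictWriter` can fill in the missing columns itself. The following is only a sketch for Python 3: the helper names `parse_line` and `to_csv` are made up for illustration, and the column order is simply the one used above.

```python
import csv
import io

lines = [
    "{ UserID : 36875; tabName : QuickAndEasy } ",
    "{ RecipeID : 1150; UserID : 36716}",
    "{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}",
    "{ UserID : 36716; tabName : QuickAndEasy}",
]

# Column order taken from the answer above.
fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]


def parse_line(line):
    """Turn '{ key : value; key2 : value2 }' into a dict of strings."""
    record = {}
    for pair in line.strip().strip("{}").split(";"):
        if not pair.strip():
            continue  # skip empty pieces, e.g. after a trailing ';'
        key, _, value = pair.partition(":")
        record[key.strip()] = value.strip()
    return record


def to_csv(lines, fields):
    """Return the parsed lines as CSV text with a header row."""
    buffer = io.StringIO()
    # restval="" writes an empty cell for every key a line does not have.
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="",
                            lineterminator="\n")
    writer.writeheader()
    for line in lines:
        writer.writerow(parse_line(line))
    return buffer.getvalue()


print(to_csv(lines, fields), end="")
```

Apart from the extra header row, this should produce the same rows as the CSV output shown above. Note that `DictWriter` raises a `ValueError` by default if a line contains a key that is not listed in `fields`; passing `extrasaction="ignore"` would silently drop such keys instead.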
answered Dec 4, 2014 at 8:40
Scooter