How to parse a string in Python

Question 1

How do I parse a string composed of n parameters in random order, such as:

{ UserID : 36875; tabName : QuickAndEasy}
{ RecipeID : 1150; UserID : 36716}
{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}
{ UserID : 36716; tabName : QuickAndEasy}

Ultimately I'm looking to output the parameters in separate columns of a table.

Question 2

How far have you got? What problems have you encountered?

Question 3

That should be trivial with regular expressions, if you can provide more concrete rules that the regex will need to implement. For example, what kinds of characters are allowed as keys/values? Can there be whitespace in a value? If so, will the value be quoted? If so, can there be escaped quotes in such a value? Etc...

Question 4

Thanks for the reply. I did not get far, as I could only extract one chosen parameter, excluding the others. Keys and values are strings of any characters, with values up to 15 characters long. There are no other rules.

Question 5

Is it valid JSON? Can you use json.loads()?
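
For what it's worth, the sample lines do not look like valid JSON: keys and values are unquoted and the pairs are separated by semicolons, so json.loads() should reject them. A minimal sketch of that check follows; is_valid_json is a hypothetical helper name, not something from the thread.

```python
import json

sample = "{ UserID : 36875; tabName : QuickAndEasy}"


def is_valid_json(text):
    """Return True if json.loads() accepts the text, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(is_valid_json(sample))
print(is_valid_json('{"UserID": 36875, "tabName": "QuickAndEasy"}'))
```

The first call reports the sample as invalid, while the second shows what a JSON version of the same record would have to look like.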

Question 6

The regex ([^{}\s:]+)\s*:\s*([^{}\s;]+) works on your examples. You need to be aware, though, that all the matches will be strings, so if you want to store 36875 as a number, you'll need to do some additional processing.

import re
regex = re.compile(
 r"""( # Match and capture in group 1:
 [^{}\s:]+ # One or more characters except braces, whitespace or :
 ) # End of group 1
 \s*:\s* # Match a colon, optionally surrounded by whitespace
 ( # Match and capture in group 2:
 [^{}\s;]+ # One or more characters except braces, whitespace or ;
 ) # End of group 2""", 
 re.VERBOSE)

You can then do

>>> dict(regex.findall("{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}"))
{'UserID': '36716', 'isFromLabel': '0', 'searchWord': 'soup', 'type': 'recipe'}
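
The "additional processing" for numbers could look something like the sketch below. It assumes that any value made up only of decimal digits should become an int (so a value like 007 would lose its leading zeros); parse_record and pair_regex are hypothetical names, and the pattern is the same one as above written on one line.

```python
import re

# Same pattern as the verbose regex above, in compact form.
pair_regex = re.compile(r"([^{}\s:]+)\s*:\s*([^{}\s;]+)")


def parse_record(text):
    """Parse one '{ key : value; ... }' record into a dict.

    Assumption: values consisting only of decimal digits are integers;
    everything else is kept as a string.
    """
    record = {}
    for key, value in pair_regex.findall(text):
        record[key] = int(value) if value.isdecimal() else value
    return record


record = parse_record(
    "{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}"
)
print(record)
```

If some numeric-looking fields must stay strings (IDs with leading zeros, for example), the conversion would need to be restricted to specific keys instead.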

Test it live on regex101.com.

Question 7

Thanks! I really need to get up and running with regex. It was a great exercise to figure out your code.

Question 8

lines = "{ UserID : 36875; tabName : QuickAndEasy } ", \
    "{ RecipeID : 1150; UserID : 36716}", \
    "{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}" , \
    "{ UserID : 36716; tabName : QuickAndEasy}"
counter = 0
mappedLines = {}
for line in lines:
    counter = counter + 1
    lineDict = {}
    line = line.replace("{","")
    line = line.replace("}","")
    line = line.strip()
    fieldPairs = line.split(";")
    for pair in fieldPairs:
        fields = pair.split(":")
        key = fields[0].strip()
        value = fields[1].strip()
        lineDict[key] = value
    mappedLines[counter] = lineDict
def printField(key, lineSets, comma_desired = True):
    if key in lineSets:
        print(lineSets[key],end="")
    if comma_desired:
        print(",",end="")
    else:
        print()
for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printField("UserID",lineSets)
    printField("tabName",lineSets)
    printField("RecipeID",lineSets)
    printField("type",lineSets)
    printField("searchWord",lineSets)
    printField("isFromLabel",lineSets,False)

CSV output:

36875,QuickAndEasy,,,,
36716,,1150,,,
36716,,,recipe,soup,0
36716,QuickAndEasy,,,,

The code above was written for Python 3.4. You can get similar output with 2.7 by replacing the function and the last for loop with this:

def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line = output_line + lineSets[key] + ","
        else:
            output_line += ","
    print output_line[0:len(output_line) - 1]
fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]
for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printFields(fields,lineSets)
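
As an alternative to assembling the commas by hand, the standard csv module can fill in the missing columns. The following is only a sketch for Python 3: it pairs the regex from the other answer with csv.DictWriter, and records_to_csv, pair_regex and columns are hypothetical names. It also adds a header row, which the output above does not have.

```python
import csv
import io
import re

# Key/value pattern taken from the regex answer, in compact form.
pair_regex = re.compile(r"([^{}\s:]+)\s*:\s*([^{}\s;]+)")

lines = [
    "{ UserID : 36875; tabName : QuickAndEasy}",
    "{ RecipeID : 1150; UserID : 36716}",
    "{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}",
    "{ UserID : 36716; tabName : QuickAndEasy}",
]
# Assumed column order, matching the CSV output shown above.
columns = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]


def records_to_csv(lines, columns):
    """Return CSV text with one row per input line and one column per key."""
    buffer = io.StringIO()
    # restval="" leaves a cell empty when a record lacks that key.
    # A key that is not listed in columns raises ValueError (the default).
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for line in lines:
        writer.writerow(dict(pair_regex.findall(line)))
    return buffer.getvalue()


print(records_to_csv(lines, columns), end="")
```

To write to a file instead of a string, pass a file opened with newline="" to csv.DictWriter in place of the io.StringIO buffer.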

Question 9

Hi, Many thanks for your help. I can't get my head around this though and feel hopeless.

Question 10

Does the code not work for you? If not, what is the error or incorrect output?

The regex answer above (by Tim Pietzcker, 2014-12-04 08:39:23Z) is the accepted answer.

