Let's say I have a string in the following form:
myString={"name", "age", "address", "contacts", "Email"}
I need to get all the items of myString into a list using Python. Here's what I did:
r = re.search("myString=\{\"(.+)\", $\}", line)
if r:
    items.append(r.group(1))
print(items)
Here line is the variable that holds the content of my text file.
What change do I have to make to my regex to get all the items in myString?
3 Answers
Looks like valid set notation so you could use the ast module to parse it instead:
import ast
mystr = 'myString={"name", "age", "address", "contacts", "Email"}'
tree = ast.parse(mystr)
name = tree.body[0].targets[0].id
values = [val.value for val in tree.body[0].value.elts]
print(name, values)
# prints: myString ['name', 'age', 'address', 'contacts', 'Email']
EDIT: In light of the actual format of the input file, I would use a regex to pull out the block and then either parse the block as above or, as below, just strip off the quotes:
import re
block_re = re.compile(r'v_dims=\{(.*?)\}', re.S)
with open(r"C:\XXXX\nemo\Test.mrk") as f:  # raw string so the backslashes are not treated as escapes
    doc = f.read()
block = block_re.search(doc)
[s.strip().strip('"') for s in block.group(1).split(',')]
But probably the best way is to combine the two:
import ast
import re
with open(r"C:\XXXX\nemo\Test.mrk") as f:  # raw string so the backslashes are not treated as escapes
    doc = f.read()
block_re = re.compile(r'v_dims=\{.*?\}', re.S)
tree = ast.parse(block_re.search(doc).group(0))
print([val.value for val in tree.body[0].value.elts])
# ['name', 'age', 'address', 'contacts', 'Email']
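As a self-contained sketch of the combined approach, the snippet below runs the same regex-plus-ast steps on an in-memory string instead of a file. The helper name extract_items and the sample text are assumptions made for illustration, since the real file contents are not shown here. It also assumes the block holds at least one item and that no item contains a closing brace.

```python
import ast
import re


def extract_items(doc, name):
    """Return the string items of a `name={...}` block found in doc."""
    # Non-greedy match up to the first closing brace; re.S lets the block span lines.
    block_re = re.compile(re.escape(name) + r'=\{.*?\}', re.S)
    match = block_re.search(doc)
    if match is None:
        return []
    # The matched text is a valid Python assignment whose value is a set literal.
    tree = ast.parse(match.group(0))
    return [elt.value for elt in tree.body[0].value.elts]


# Hypothetical file contents, used only for this example.
sample = 'header line\nv_dims={"name", "age",\n  "address", "contacts", "Email"}\nfooter'
print(extract_items(sample, "v_dims"))
```

Reading the elements from the parsed tree, rather than evaluating the set, keeps the items in the order they appear in the text.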
Comments
You can use ast.dump(tree) to help visualize the structure.
Extract the block for myString and then use this code to parse out the values. The nice thing about doing it this way is that it still works if there are escaped quotes or commas inside the quoted strings.

mystr = """myString={"name", "age", "address", "contacts", "Email"}"""
print(mystr.split('"')[1::2])
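The same idea can be written with a regular expression, which is closer to what the question asked for. This is a minimal sketch, assuming the items never contain escaped double quotes; the variable names block and items are illustrative.

```python
import re

mystr = 'myString={"name", "age", "address", "contacts", "Email"}'

# First isolate the text between the braces, then collect every quoted run.
block = re.search(r'myString=\{(.*?)\}', mystr, re.S)
items = re.findall(r'"([^"]*)"', block.group(1)) if block else []
print(items)
```

A single re.search with one capture group returns only one match, which is why a second pass with re.findall is used to collect the individual items.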
Is the string guaranteed to have that structure? If so, you can do:
>>> s = 'myString={"name", "age", "address", "contacts", "Email"}'
>>> data = s[s.find('{') + 1:s.rfind('}')]
>>> data
'"name", "age", "address", "contacts", "Email"'
>>> result = [t.strip() for t in data.split(',')]
>>> result
['"name"', '"age"', '"address"', '"contacts"', '"Email"']
As you can see, we perform the following steps:
- Find the string between the { and } characters.
- Split the string by the comma. This gives a list of strings.
- We then strip any spaces from these strings to get the items.
If you don't want the quotation marks, you can remove the first and last characters from each of the strings in the resulting list above.
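Removing the quotation marks can be sketched as follows, assuming every item is wrapped in exactly one pair of double quotes. The name quoted is introduced here only for illustration.

```python
s = 'myString={"name", "age", "address", "contacts", "Email"}'
data = s[s.find('{') + 1:s.rfind('}')]
quoted = [t.strip() for t in data.split(',')]

# Slice off the first and last character (the surrounding quotes) of each item.
result = [t[1:-1] for t in quoted]
print(result)
```

Slicing removes one character from each end regardless of what it is, so t.strip('"') may be a safer choice if some items could be unquoted.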
Why is there a $ before the closing brace in your regular expression?