Parsing regex in Python

Question 1

Let's say I have a string in the following form:
myString={"name", "age", "address", "contacts", "Email"}

I need to get all the items of myString into a list using Python. Here's what I did:

r = re.search("myString=\{\"(.+)\", $\}", line)
if r:
 items.append(r.group(1)) 
print(items)

Here line is the variable that holds the content of my text file.

What change do I have to make to my regex to get all the items in myString?

Question 2

Why do you have a $ before the closing brace in your regular expression?
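To make the point about the $ concrete, here is a minimal sketch, assuming the line looks exactly like the sample in the question. The two-step approach with re.findall is only one possible fix, not necessarily what the asker ended up using.

```python
import re

line = 'myString={"name", "age", "address", "contacts", "Email"}'

# "$" asserts end-of-string, so a literal "}" required after it can never match.
broken = re.search(r'myString=\{"(.+)", $\}', line)
print(broken)

# Capture everything between the braces, then pull out each quoted item.
block = re.search(r'myString=\{(.*?)\}', line)
items = re.findall(r'"([^"]*)"', block.group(1)) if block else []
print(items)
```

The first search returns None; the second step collects every quoted item from the captured block.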

Question 3

@PasteBT, could you please elaborate with an example? It would definitely be helpful to me. TIA.

Question 4

@Mark Byers, here's what I changed. m = re.search("v_dims=\{\"(.+)\",\}$", line) but no results. Any help?

Question 5

@Nemo Which part don't you get? It just splits the string on ", then takes every second item.
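A small sketch of what that split produces, assuming the sample string from the question and no escaped quotes inside the values:

```python
mystr = 'myString={"name", "age", "address", "contacts", "Email"}'

parts = mystr.split('"')
# parts alternates between text outside the quotes (even indexes)
# and text inside the quotes (odd indexes).
print(parts)

# Every second element, starting at index 1, is a quoted value.
items = parts[1::2]
print(items)
```

The odd-indexed elements are exactly the quoted values, which is why the [1::2] slice works.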

Question 6

possible duplicate of Python regex to parse text file, get the items in list and count the list

Question 7

Looks like valid set notation so you could use the ast module to parse it instead:

import ast
mystr = 'myString={"name", "age", "address", "contacts", "Email"}'
tree = ast.parse(mystr)
# tree.body[0] is the assignment; its value is a set of string constants
name = tree.body[0].targets[0].id
values = [val.value for val in tree.body[0].value.elts]
print(name, values)
# prints: myString ['name', 'age', 'address', 'contacts', 'Email']

EDIT: In light of the actual format of the input file, I would use a regex to parse out the block and then parse the block as above, or as below to just strip off the quotes:

import re
block_re = re.compile(r'v_dims=\{(.*?)\}', re.S)
# raw string, so the "\n" in the path is not read as a newline
with open(r"C:\XXXX\nemo\Test.mrk") as f:
    doc = f.read()
block = block_re.search(doc)
items = [s.strip().strip('"') for s in block.group(1).split(',')]
print(items)

But probably the best way is to combine the two:

import ast
import re
# raw string, so the "\n" in the path is not read as a newline
with open(r"C:\XXXX\nemo\Test.mrk") as f:
    doc = f.read()
block_re = re.compile(r'v_dims=\{.*?\}', re.S)
tree = ast.parse(block_re.search(doc).group(0))
print([val.value for val in tree.body[0].value.elts])
# ['name', 'age', 'address', 'contacts', 'Email']

Question 8

+1: python can parse python. You could use ast.dump(tree) to help visualize the structure.
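For anyone who wants to try that, a short sketch using the sample string from the question. The variable name dumped is just for illustration.

```python
import ast

tree = ast.parse('myString={"name", "age", "address", "contacts", "Email"}')

# ast.dump returns a string describing the nested node structure.
# On Python 3.9+ you can pass indent=2 for a multi-line layout.
dumped = ast.dump(tree)
print(dumped)
```

The output shows an Assign node whose value is a Set node, which is where tree.body[0].value.elts comes from.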

Question 9

It would be nice to do it that way. My problem is that I need to look for the keyword mystr in my text file and then get the count of its values. I even posted this earlier at stackoverflow.com/questions/11246888/... but got no help. Any idea will be greatly appreciated. TIA

Question 10

@Nemo, you can use a regular expression to find the line that contains myString and then use this code to parse out the values. The nice thing about doing it this way is that it will always work, even if there are escaped quotes inside the quoted strings, or if there are commas inside the quoted strings.
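A rough sketch of that idea, assuming the assignment sits on a single line. The contents of doc are made up for illustration, since the real file format is not shown in this thread.

```python
import ast
import re

# Hypothetical file contents (an assumption, not the asker's real file).
doc = '''header line
myString={"name", "Smith, J.", "say \\"hi\\""}
footer line
'''

values = []
for line in doc.splitlines():
    # Find the line that assigns to myString.
    if re.match(r'\s*myString\s*=', line):
        node = ast.parse(line.strip()).body[0].value  # the Set node
        values = [elt.value for elt in node.elts]
        break

print(values, len(values))
```

Because Python's own parser does the tokenizing, the comma inside "Smith, J." and the escaped quotes do not split or truncate the values.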

Question 11

Being a novice Python user, I'm facing challenges in using regex. How can I find the line that contains the keyword myString in my text file? I would love to see it work. Please help me out. TIA.

Question 12

@Trevor, I updated my code right below yours. I would appreciate your time in helping me out. TIA.

Question 13

mystr = """myString={"name", "age", "address", "contacts", "Email"}"""
print(mystr.split('"')[1::2])

Question 14

I don't understand why people are giving parsing based answers when this straight-forward solution exists?

Question 15

Is the string guaranteed to have that structure? If so, you can do:

>>> s = 'myString={"name", "age", "address", "contacts", "Email"}'
>>> data = s[s.find('{') + 1:s.rfind('}')]
>>> data
'"name", "age", "address", "contacts", "Email"'
>>> result = [t.strip() for t in data.split(',')]
>>> result
['"name"', '"age"', '"address"', '"contacts"', '"Email"']

As you can see, we perform the following steps:

Find the string between the { and } characters.
Split the string by the comma. This gives a list of strings.
We then strip any spaces from these strings to get the items.

If you don't want the quotation marks, you can remove the first and last characters from each of the strings in the resulting list above.
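Putting the steps together, including the quote removal, might look like the sketch below. It assumes the structure shown above and that no value contains a comma.

```python
s = 'myString={"name", "age", "address", "contacts", "Email"}'

# 1. Text between the first "{" and the last "}".
data = s[s.find('{') + 1:s.rfind('}')]
# 2. Split on commas, then 3. strip the surrounding spaces.
quoted = [t.strip() for t in data.split(',')]
# 4. Drop the first and last character (the quotation marks).
items = [t[1:-1] for t in quoted]

print(items)
print(len(items))
```

This gives the bare names and a count of 5 for the sample string.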

Question 16

Actually it's not guaranteed to have that structure. I'm trying to look for the keyword myString in my text file, which has some values within double quotes. I need to be able to look for the keyword myString={ } and put all its values into a list so that I can count the items. I even posted this earlier at stackoverflow.com/questions/11246888/… but got no help. Any idea will be greatly appreciated. TIA.

Accepted answer (the ast-based answer under Question 7) by Trevor · 2012-06-28 21:13:54Z

