string conversion

Question 1

I've got a long string which is formatted like this:

myString = "[name = john, family = candy, age = 72],[ name = jeff, family = Thomson, age = 24]"

Of course the real string is longer than this. I also have 3 lists for the related values:

Names = []
Families = []
Ages = []

I want to read that string character by character, take the data and append it to the appropriate lists. Can anyone help me with how to separate the string into variables? What I need is something like this:

Names = ["john", "jeff", ...]
Families = ["candy", "Thomson", ...]
Ages = [72, 24, ...]

Question 2

So is it OK to have the whole string in memory at once?

Question 3

Yes, that is no problem.

Question 4

This is most easily done with a regex. Basically, construct a regex that captures the name, family and age of each record, then pull the relevant fields out of the tuples that findall returns to build your lists.

import re

if __name__ == '__main__':
    myString = "[name = john adams, family = candy, age = 72],[ name = jeff, family = Thomson, age = 24]"
    # Each match is a (name, family, age) tuple of strings.
    answers = re.findall(r"\[\s*name = ([^,]+), family = (\w+), age = (\d+)\]", myString)
    names = [x[0] for x in answers]
    families = [x[1] for x in answers]
    ages = [int(x[2]) for x in answers]
    print("names:", names)
    print("families:", families)
    print("ages:", ages)

Question 5

Thanks for your answer, but what if a name contains a space, for example name = "Antoni Red"?

Question 6

@user435245: Updated my regex to allow all characters in names except ','. I am still assuming family will not contain spaces, but you can also change that by using the same regex for family.
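A minimal sketch of the change suggested in that comment, assuming a family name may contain spaces but never a comma. The sample values "Antoni Red" and "Van Dyke" are made up for illustration.

```python
import re

# Hypothetical sample data: both the name and the family contain a space.
my_string = "[name = Antoni Red, family = Van Dyke, age = 72],[ name = jeff, family = Thomson, age = 24]"

# [^,]+ matches any run of characters up to the next comma,
# so the same sub-pattern is used for both name and family.
pattern = re.compile(r"\[\s*name = ([^,]+), family = ([^,]+), age = (\d+)\]")

answers = pattern.findall(my_string)
names = [x[0] for x in answers]
families = [x[1] for x in answers]
ages = [int(x[2]) for x in answers]

print("names:", names)
print("families:", families)
print("ages:", ages)
```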

Question 7

import re

Names = []
Families = []
Ages = []
myString = "[name = john, family = candy, age = 72],[ name = jeff, family = Thomson, age = 24]"
# Named groups make each captured field accessible by name.
myregex = re.compile(r"name = (?P<name>.*?), family = (?P<family>.*?), age = (?P<age>\d+)")
for list_ in myString.split(']'):
    match = myregex.search(list_)
    if match is None:
        # Splitting on ']' leaves an empty piece after the last record.
        continue
    found = match.groupdict()
    Names.append(found['name'])
    Families.append(found['family'])
    Ages.append(int(found['age']))

Question 8

+1 for using named groups instead of relying on the format of the data not changing. In a one-off script it may not seem worth the trouble, but I seem to spend a lot of time maintaining what were supposed to have been one-off scripts.
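To illustrate the named-group idea without splitting on ']' first, here is a sketch that swaps the split for re.finditer. This is a substitution for illustration, not the answer's original code, and it assumes field values contain neither ',' nor ']'.

```python
import re

my_string = "[name = john, family = candy, age = 72],[ name = jeff, family = Thomson, age = 24]"

# Named groups document which field each part of the pattern captures.
record = re.compile(
    r"name = (?P<name>[^,\]]+), family = (?P<family>[^,\]]+), age = (?P<age>\d+)"
)

names, families, ages = [], [], []
# finditer scans the whole string, yielding one match object per record.
for match in record.finditer(my_string):
    names.append(match.group("name"))
    families.append(match.group("family"))
    ages.append(int(match.group("age")))

print(names, families, ages)
```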

Question 9

Break the problem down:

Parse the string into lists
Load the lists into your other lists.

You'll have a problem, because the entities between commas aren't nice dictionaries.
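The two steps could be sketched roughly as follows. The function names parse_records and load_records are hypothetical, and the sketch assumes every record lists name, family and age in that order.

```python
def parse_records(text):
    """Step 1: turn the raw string into a list of [name, family, age] lists."""
    records = []
    for chunk in text.split("]"):
        # Remove the separating comma, spaces and opening bracket.
        chunk = chunk.strip(", [")
        if not chunk:
            continue
        # Each field looks like "key = value"; keep only the value.
        records.append([field.split("=", 1)[1].strip() for field in chunk.split(",")])
    return records


def load_records(records):
    """Step 2: distribute the parsed values over three parallel lists."""
    names, families, ages = [], [], []
    for name, family, age in records:
        names.append(name)
        families.append(family)
        ages.append(int(age))
    return names, families, ages


my_string = "[name = john, family = candy, age = 72],[ name = jeff, family = Thomson, age = 24]"
names, families, ages = load_records(parse_records(my_string))
print(names, families, ages)
```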

Question 10

You should parse that into a list of dictionaries, not three different lists that are related only by the order of their items. For example: data = [ {"name": "John", "family": "Candy", "age": 72 }, ...]

One possibility, if you can't change the data source, is to do some naive parsing with string methods like split:

myString = "[name = john, family = candy, age = 72],[ name = jeff, family = Thomson, age = 24]"
data = []
for block in myString.split("]"):
    # The split leaves an empty string after the final "]".
    if not block:
        break
    block = block.split("[")[1]
    entry_dict = {}
    for part in block.split(","):
        key, value = part.split("=")
        key = key.strip()
        value = value.strip()
        if key == "age":
            value = int(value)
        entry_dict[key] = value
    data.append(entry_dict)

Or, if you are on Python 2.7 (or 3.1) and want shorter code, you can use a dict comprehension (in other versions you can get the same result by building a generator of (key, value) tuples and passing it to dict()):

myString = "[name = john, family = candy, age = 72],[ name = jeff, family = Thomson, age = 24]"

data = []
for block in myString.split("]"):
    if not block:
        break
    block = block.split("[")[1]
    # Split each "key = value" part once, then strip both halves.
    pairs = (part.split("=") for part in block.split(","))
    data.append({key.strip(): value.strip() for key, value in pairs})

(This version does not convert "age" to a number, though.)
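If the age should still end up as a number in a compact version, one option is a small helper applied to each value. This is only a sketch: convert is a hypothetical name, and it assumes "age" is the only numeric field.

```python
my_string = "[name = john, family = candy, age = 72],[ name = jeff, family = Thomson, age = 24]"


def convert(key, value):
    # Assumption: only the "age" field is numeric; the rest stay strings.
    return int(value) if key == "age" else value


data = []
for block in my_string.split("]"):
    if not block:
        break
    block = block.split("[")[1]
    # Generator of [key, value] pairs, one per "key = value" part.
    pairs = (part.split("=") for part in block.split(","))
    data.append({k.strip(): convert(k.strip(), v.strip()) for k, v in pairs})

print(data)
```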

Accepted answer: the regex answer by MAK (2010-10-31 12:24:34Z)

