Splitting one line into multiple ones given a separator

Question 1

I have a file containing one long line:

"name surname" <[email protected]>, 'name surname' <[email protected]>, name surname <[email protected]>, "'name surname'" <[email protected]>, surname, <[email protected]>, name <[email protected]>

Note that there are 6 different forms.

I am splitting each email address into its own line, and saving the results into another file:

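import sys
ifile = sys.argv[1]
ofile = sys.argv[2]
with open(ifile) as ifile, open(ofile, "w") as ofile:
    # Splitting on ">," drops the closing ">" from every address except the last.
    addresses = ifile.readline().split(">,")
    for n, address in enumerate(addresses):
        # Remove all single and double quotes.
        address = address.replace("'", "")
        address = address.replace('"', "")
        # split("<") also removes the "<", so put it back.
        name, address = address.split("<")
        address = "<" + address
        # Re-quote the name (if there is one) with double quotes.
        if len(name) > 1:
            name = name.strip()
            name = '"{}" '.format(name)
            address = "".join(name + address)
        # Restore the ">" that split(">,") removed; the last address still has it.
        if n < len(addresses) - 1:
            ofile.write(address.strip() + ">\n")
        else:
            ofile.write(address.strip() + "\n")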

This feels like hackery to me, so I'm looking for a better solution.
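One alternative the thread does not mention: Python's standard library already ships an RFC 2822 address-list parser, email.utils.getaddresses(), which keeps double-quoted names that contain commas together. The sketch below assumes reasonably well-formed input; single quotes are not special in RFC 2822, so they are stripped by hand, and newer Python releases are stricter about malformed lists (such as a bare surname, entry), so treat this as a starting point rather than a drop-in replacement. split_addresses is a hypothetical helper name.

```python
from email.utils import getaddresses


def split_addresses(line):
    """Return one '"name" <email>' string per address in a comma-separated list.

    Hypothetical helper: the parsing itself is done by email.utils.getaddresses,
    so double-quoted names containing commas stay in one piece.
    """
    result = []
    for name, addr in getaddresses([line]):
        # getaddresses already removes surrounding double quotes; single quotes
        # are ordinary characters to it, so strip them (and spaces) here.
        name = name.strip("' ")
        result.append('"{}" <{}>'.format(name, addr) if name else "<{}>".format(addr))
    return result


line = '"name surname" <a@example.com>, "surname, name" <b@example.com>, name <c@example.com>'
for address in split_addresses(line):
    print(address)
```

The email addresses above are made-up sample values.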

Answer 1

Why are you first removing the quotes and then putting them back?

And why are you removing the brackets and then putting them back?

This does the same thing, except that it doesn't change ' to ". It also doesn't handle commas in names, so if you have those it won't work. In that case I'd probably use a regexp.
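The regexp route mentioned above could look roughly like this. It is a minimal sketch, assuming every address ends with an email in angle brackets and that names never contain < or >; under that assumption commas inside names do no harm, because the pattern anchors on the brackets instead of the commas. ADDRESS_RE and parse_addresses are hypothetical names.

```python
import re

# Hypothetical pattern: an optional leftover separator, then a name (anything up
# to the next '<', matched lazily), then the email inside angle brackets.
ADDRESS_RE = re.compile(r",?\s*([^<]*?)\s*<([^>]*)>")


def parse_addresses(line):
    """Return (name, email) pairs; assumes names never contain '<' or '>'."""
    pairs = []
    for name, email in ADDRESS_RE.findall(line):
        # Drop surrounding quotes, spaces and a dangling comma ("surname," -> "surname").
        pairs.append((name.strip("'\" ,"), email.strip()))
    return pairs


line = '"surname, name" <a@example.com>, \'name surname\' <b@example.com>, name <c@example.com>'
for name, email in parse_addresses(line):
    print('"{}" <{}>'.format(name, email))
```

Because findall() resumes right after each closing >, the separator comma is consumed by the ,?\s* prefix of the next match rather than being split on.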

import sys
ifile = sys.argv[1]
ofile = sys.argv[2]
with open(ifile) as ifile, open(ofile, "w") as ofile:
    for address in ifile.readline().split(","):
        ofile.write(address.strip() + '\n')

Update:

"surname, name <[email protected]>" sucks: it means your format is inconsistent and can't be parsed without horrid hacks. In that case your code seems OK, although I'd probably do it differently. I would most likely use a regexp to replace every comma that is NOT preceded by > and followed by a space with something else, say chr(128). I'd then parse the result with my code above, extract the email from within the brackets, strip all quotes and brackets from the remainder, and replace chr(128) with commas again.

And then, lastly, write that to the output file.

The difference is that I don't try to handle a horrid format; I first fix the problems. It makes for cleaner code, IMO.
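The "fix the format first" plan can be sketched as follows. This is not the answer's own code: it simplifies the regexp to "a comma not immediately preceded by >" using a negative lookbehind, uses the chr(128) placeholder the answer suggests (assuming that character never occurs in the input), and uses str.partition() to separate the name from the bracketed email. split_protected is a hypothetical name.

```python
import re

# chr(128) as suggested in the answer; assumed never to appear in real input.
PLACEHOLDER = chr(128)


def split_protected(line):
    """Hypothetical helper following the 'fix the format first' plan."""
    # 1. Hide commas that are not directly after '>' (i.e. commas inside names).
    protected = re.sub(r"(?<!>),", PLACEHOLDER, line)
    result = []
    # 2. Every remaining comma is now a real separator.
    for part in protected.split(","):
        name, _, rest = part.partition("<")
        # 3. Extract the email from within the brackets.
        email = rest.strip().rstrip(">")
        # 4. Strip quotes and spaces from the name, then restore its commas.
        name = name.strip("'\" ").replace(PLACEHOLDER, ",")
        result.append('"{}" <{}>'.format(name, email))
    return result


sample = '"name surname" <a@example.com>, surname, name <b@example.com>, \'name\' <c@example.com>\n'
for address in split_protected(sample):
    print(address)
```

The weak point is the placeholder: if the input can ever contain it, the restored commas will be wrong, so a regexp that matches whole addresses may be the safer design.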

Update 2:

I instead replaced the commas that should be split on, making it simpler, like so:

import sys
ifile = sys.argv[1]
ofile = sys.argv[2]
with open(ifile) as ifile, open(ofile, "w") as ofile:
    data = ifile.read()
    data = data.replace('>,', '>\xF0')
    for line in data.split('\xF0'):
        name, email = line.split('<')
        email = email.replace('>', '').strip()
        name = name.replace('"', '').replace("'", "").strip()
        ofile.write('"%s" <%s>\n' % (name, email))

and then I realized I could simplify it even more:

import sys
ifile = sys.argv[1]
ofile = sys.argv[2]
with open(ifile) as ifile, open(ofile, "w") as ofile:
    data = ifile.read()
    for line in data.split('>,'):
        name, email = line.split('<')
        # split('>,') leaves the closing '>' on the last address only, so drop it.
        email = email.strip().rstrip('>')
        name = name.replace('"', '').replace("'", "").strip()
        ofile.write('"%s" <%s>\n' % (name, email))

And at this point I'm basically doing what you are doing, but much simplified.

Comment 1

I'm sorry for forgetting to include the other things that the code must handle. See my updated question.

Comment 2

I put back the single opening bracket because str.split() removes it from the list elements.
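The behaviour described in this comment is easy to see in isolation. The sketch below contrasts str.split(), which discards the separator, with str.partition(), which returns it as the middle element of a 3-tuple, so nothing has to be put back. The sample address is made up.

```python
address = '"name surname" <a@example.com>'

# str.split() discards the separator, so "<" must be re-added by hand.
parts = address.split("<")
print(parts)  # ['"name surname" ', 'a@example.com>']
name, email = parts
rebuilt = name + "<" + email

# str.partition() returns (before, separator, after) and keeps the "<".
print(address.partition("<"))  # ('"name surname" ', '<', 'a@example.com>')

# With no "<" at all, partition() still returns three items, whereas
# unpacking split("<") into two names would raise ValueError.
print("no brackets here".partition("<"))  # ('no brackets here', '', '')
```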

Comment 3

@Tshepang: Updated the answer.

Comment 4

I don't see where he indicates that he needs to parse that terrible version.

Comment 5

Instead of name.replace('"', '').replace("'", "").strip(), you should be able to use name.strip("'\" ") for the same effect, since here the quotes only appear at the ends of the name; see: docs.python.org/library/….
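A quick way to check that suggestion: the two hypothetical helpers below wrap each expression and compare them on the quoted forms from the question (made-up sample values). They agree whenever quotes sit only at the ends, and differ when a quote is part of the name itself.

```python
def via_replace(name):
    # Removes every quote character, wherever it appears, then trims spaces.
    return name.replace('"', "").replace("'", "").strip()


def via_strip(name):
    # Removes any run of quotes and spaces, but only at the two ends.
    return name.strip("'\" ")


# Name parts of the question's quoted forms: both approaches agree.
for raw in ['"name surname" ', "'name surname' ", "\"'name surname'\" ", " name surname "]:
    assert via_replace(raw) == via_strip(raw) == "name surname"

# They differ when a quote sits inside the name.
print(via_replace("O'Brien "))  # OBrien
print(via_strip("O'Brien "))    # O'Brien
```

For names like the last one, the strip() version is arguably the more correct of the two.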

Lennart Regebro · Answer 1 · 2011-06-20 10:53:10Z
