I have a file containing one long line:
"name surname" <[email protected]>, 'name surname' <[email protected]>, name surname <[email protected]>, "'name surname'" <[email protected]>, surname, <[email protected]>, name <[email protected]>
Note that that's 6 different forms.
I am splitting each email address into its own line, and saving the results into another file:
import sys
ifile = sys.argv[1]
ofile = sys.argv[2]
with open(ifile) as ifile, open(ofile, "w") as ofile:
    addresses = ifile.readline().split(">,")
    for n, address in enumerate(addresses):
        address = address.replace("'", "")
        address = address.replace('"', "")
        name, address = address.split("<")
        address = "<" + address
        if len(name) > 1:
            name = name.strip()
            name = '"{}" '.format(name)
        address = "".join(name + address)
        if n < len(addresses) - 1:
            ofile.write(address.strip() + ">\n")
        else:
            ofile.write(address.strip() + "\n")
This feels like hackery to me, so I am looking for a better solution.
1 Answer
Why are you first removing the quotes and then putting them back?
And why are you removing the brackets and then putting them back?
This does the same thing, except that it doesn't change ' to ". It also doesn't handle commas in names, so if you have those it won't work. In that case I'd probably use a regexp.
import sys
ifile = sys.argv[1]
ofile = sys.argv[2]
with open(ifile) as ifile, open(ofile, "w") as ofile:
    for address in ifile.readline().split(","):
        ofile.write(address.strip() + '\n')
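For the commas-in-names case, the regexp route could look roughly like the sketch below. It is only an illustration: the sample addresses and the names SAMPLE, PAIR and parse_addresses are made up, and it assumes that names never contain < or >.

```python
import re

# Hypothetical sample; the addresses are made up for illustration.
SAMPLE = ('"name surname" <a@example.com>, \'name surname\' <b@example.com>, '
          'surname, name <c@example.com>, name <d@example.com>')

# Everything before "<" is the name part, everything before ">" is the email.
PAIR = re.compile(r'([^<>]*)<([^<>]*)>')


def parse_addresses(data):
    """Return (name, email) tuples; commas inside names are kept."""
    pairs = []
    for raw_name, email in PAIR.findall(data):
        # Drop the separator comma left over from the previous entry,
        # then the surrounding whitespace and quotes.
        name = raw_name.strip().lstrip(',').strip().strip('\'"').strip()
        pairs.append((name, email.strip()))
    return pairs


for name, email in parse_addresses(SAMPLE):
    print('"%s" <%s>' % (name, email))
```

Because the pattern anchors on the brackets rather than on the commas, a name such as surname, name survives intact.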
Update:
"surname, name <[email protected]>"
sucks, and that means your format is inconsistent and not parseable without horrid hacks. In that case your code seems OK, although I'd probably do it differently. I would most likely use a regexp to find all commas that are NOT preceded by > and are followed by a space, and change them to something else, say chr(128) or something like that. I'd then parse the data with my code above, extract the email from within the brackets, strip all quotes and brackets from the remainder, and replace chr(128) back with commas.
And then lastly write that to the outfile.
The difference there is that I don't try to handle a horrid format, I first try to fix the problems. It makes for cleaner code, IMO.
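A minimal sketch of that placeholder idea, assuming chr(128) never occurs in the data and that every separating comma comes directly after a >. The sample addresses and the names SAMPLE, PLACEHOLDER and split_addresses are made up for illustration:

```python
import re

# Hypothetical sample; the addresses are made up for illustration.
SAMPLE = 'surname, name <a@example.com>, "name surname" <b@example.com>'

PLACEHOLDER = chr(128)  # assumed not to occur anywhere in the data


def split_addresses(data):
    """Return one '"name" <email>' string per address."""
    # Protect every comma that is not directly preceded by ">".
    protected = re.sub(r'(?<!>),', PLACEHOLDER, data)
    lines = []
    for entry in protected.split(','):
        name, email = entry.split('<')
        email = email.replace('>', '').strip()
        name = name.replace('"', '').replace("'", '').strip()
        # Put the protected commas back into the name.
        name = name.replace(PLACEHOLDER, ',')
        lines.append('"%s" <%s>' % (name, email))
    return lines


for line in split_addresses(SAMPLE):
    print(line)
```

The only commas left for split() to see are the real separators, so the parsing loop itself stays as simple as before.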
Update 2:
I instead replaced the commas that should be split on, making it simpler, like so:
import sys
ifile = sys.argv[1]
ofile = sys.argv[2]
with open(ifile) as ifile, open(ofile, "w") as ofile:
    data = ifile.read()
    data = data.replace('>,', '>\xF0')
    for line in data.split('\xF0'):
        name, email = line.split('<')
        email = email.replace('>', '').strip()
        name = name.replace('"', '').replace("'", "").strip()
        ofile.write('"%s" <%s>\n' % (name, email))
and then I realized I could simplify it even more:
import sys
ifile = sys.argv[1]
ofile = sys.argv[2]
with open(ifile) as ifile, open(ofile, "w") as ofile:
    data = ifile.read()
    for line in data.split('>,'):
        name, email = line.split('<')
        # The last entry still carries its closing bracket, so drop it here.
        email = email.strip().rstrip('>')
        name = name.replace('"', '').replace("'", "").strip()
        ofile.write('"%s" <%s>\n' % (name, email))
And at this point I'm basically doing what you are doing, but much simplified.
-
I'm sorry for forgetting to include the other things that the code must handle. See my updated question. – tshepang, Jun 20, 2011 at 10:58
-
I put back the single opening bracket because str.split() removes it from the list elements. – tshepang, Jun 20, 2011 at 11:28
-
@Tshepang: Updated the answer. – Lennart Regebro, Jun 20, 2011 at 11:34
-
I'm not seeing where he indicates that he needs to parse that terrible version. – Winston Ewert, Jun 20, 2011 at 17:23
-
Instead of doing:
name.replace('"', '').replace("'", "").strip()
you should be able to do name.strip("'\" ")
for the same effect, see: docs.python.org/library/…. – Kit Sunde, Jun 23, 2011 at 17:45
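A small sketch comparing the two spellings. The helper names clean_replace and clean_strip and the inputs are made up for illustration. For the quoting styles in the question they agree, but str.strip() only touches the ends of the string, so a quote character inside the name is kept:

```python
def clean_replace(name):
    # Removes every quote character, wherever it appears.
    return name.replace('"', '').replace("'", '').strip()


def clean_strip(name):
    # Removes quotes and spaces from both ends only.
    return name.strip("'\" ")


quoted = ' "\'name surname\'" '
print(clean_replace(quoted))      # name surname
print(clean_strip(quoted))        # name surname
print(clean_replace("O'Brien "))  # OBrien
print(clean_strip("O'Brien "))    # O'Brien
```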