import pandas as pd
df = pd.read_csv("email_addresses_of_ALL_purchasers.csv")
all_emails = df["Email"]
real_emails = []
test_domains = ['yahoo.com', 'gmail.com', 'facebook.com', 'hotmail.com']
for email in all_emails:
    email_separated = email.split("@")
    if email_separated[1] not in test_domains:
        real_emails.append(email)
print(real_emails)
I'm trying to filter out certain email account types. Why does the code above raise this error?
IndexError: list index out of range
asked Dec 6, 2013 at 1:33
Comment (Max Noel, Dec 6, 2013 at 1:38): "List index out of range" is a pretty self-explanatory error message IMO. Take a look at the actual data that's causing the error.
3 Answers
Apparently one of your emails does not contain an @. Put print(email) as the first statement of the loop; then you can see which email doesn't fit.
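A minimal sketch of why the error appears and how to locate the offending entries. The sample list and the helper find_emails_without_at are hypothetical stand-ins for the CSV column: when a string has no "@", split("@") returns a one-element list, so index 1 does not exist.

```python
# Hypothetical sample data standing in for the CSV's "Email" column.
all_emails = ["alice@example.org", "bob-no-at-sign", "carol@gmail.com"]

# With no "@", split() returns a single-element list, so [1] raises IndexError.
print("bob-no-at-sign".split("@"))  # ['bob-no-at-sign']

def find_emails_without_at(emails):
    """Return (position, value) pairs for entries that split into fewer than 2 parts."""
    bad = []
    for position, email in enumerate(emails):
        if len(email.split("@")) < 2:
            bad.append((position, email))
    return bad

print(find_emails_without_at(all_emails))  # [(1, 'bob-no-at-sign')]
```

Reporting the position as well as the value makes it easier to find the row in the original CSV.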
answered Dec 6, 2013 at 1:37
Comment (SethMMorton): +1. Before anything else, when debugging you should print the variables to make sure that what you expect to be in them is actually in them.

Try this:
import pandas as pd
df = pd.read_csv("email_addresses_of_ALL_purchasers.csv")
all_emails = df["Email"]
real_emails = []
test_domains = ['yahoo.com', 'gmail.com', 'facebook.com', 'hotmail.com']
for email in all_emails:
    email_separated = email.split("@")
    try:
        if email_separated[1] not in test_domains:
            real_emails.append(email)
    except IndexError:
        print('Mail {} does not contain a @ sign'.format(email))
print(real_emails)
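One case the try/except above does not cover, assuming the CSV has blank cells in the Email column: pandas.read_csv stores those as NaN, which is a float, so email.split raises AttributeError rather than IndexError. A sketch that skips non-string values as well; the function name filter_real_emails and the sample list are hypothetical.

```python
test_domains = ['yahoo.com', 'gmail.com', 'facebook.com', 'hotmail.com']

# Hypothetical column values; float("nan") mimics an empty cell read by pandas.
all_emails = ["alice@example.org", "bob-no-at-sign", float("nan"), "carol@gmail.com"]

def filter_real_emails(emails, excluded_domains):
    real_emails = []
    skipped = []
    for email in emails:
        # NaN has no .split(); calling it would raise AttributeError,
        # which `except IndexError` would not catch.
        if not isinstance(email, str):
            skipped.append(email)
            continue
        email_separated = email.split("@")
        try:
            if email_separated[1] not in excluded_domains:
                real_emails.append(email)
        except IndexError:
            # No "@" in this entry.
            skipped.append(email)
    return real_emails, skipped

real, skipped = filter_real_emails(all_emails, test_domains)
print(real)  # ['alice@example.org']
```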
answered Dec 6, 2013 at 1:39
Comment (DSM): We could use pandas directly too: print(all_emails[~all_emails.str.contains("@")])
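Building on the pandas suggestion, the whole filter can be written without a Python loop. This is a sketch assuming a small hypothetical Series in place of df["Email"]; na=False makes missing values count as "no @" instead of leaving NaN in the boolean mask, and regex=False treats "@" as a literal.

```python
import pandas as pd

test_domains = ['yahoo.com', 'gmail.com', 'facebook.com', 'hotmail.com']
# Hypothetical stand-in for df["Email"]; None plays the role of an empty cell.
all_emails = pd.Series(["alice@example.org", "bob-no-at-sign", None, "carol@gmail.com"])

# Boolean mask: True where the entry contains "@"; missing values become False.
has_at = all_emails.str.contains("@", regex=False, na=False)
print(all_emails[~has_at])  # the entries that would break the original loop

# Domain = text after the last "@"; entries without "@" get NaN here.
domains = all_emails.str.rsplit("@", n=1).str[1]
real_emails = all_emails[has_at & ~domains.isin(test_domains)]
print(real_emails.tolist())  # ['alice@example.org']
```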
It's more robust to use partition here. If the @ is missing, domain will simply be the empty string:
for email in all_emails:
    name, delim, domain = email.partition("@")
    # domain is '' when the email has no @, so such entries are skipped
    if domain and domain not in test_domains:
        real_emails.append(email)
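A self-contained sketch of the partition approach. The function name real_emails_from and the sample addresses are hypothetical. str.partition always returns a 3-tuple, so unpacking never fails; when the separator is absent, the last two items are empty strings.

```python
test_domains = ['yahoo.com', 'gmail.com', 'facebook.com', 'hotmail.com']

print("bob-no-at-sign".partition("@"))     # ('bob-no-at-sign', '', '')
print("alice@example.org".partition("@"))  # ('alice', '@', 'example.org')

def real_emails_from(emails, excluded_domains):
    real_emails = []
    for email in emails:
        name, delim, domain = email.partition("@")
        # An empty domain means there was no "@": skip the entry.
        if domain and domain not in excluded_domains:
            real_emails.append(email)
    return real_emails

sample = ["alice@example.org", "bob-no-at-sign", "carol@gmail.com"]
print(real_emails_from(sample, test_domains))  # ['alice@example.org']
```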
Also, Wikipedia has a list of unusual but valid email address examples that may surprise you.
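One way such addresses can trip up a domain check: a quoted local part may itself contain an @. The address below is an illustrative example of that form. partition splits at the first @, while rpartition splits at the last one, which for this address yields the domain.

```python
address = '"very.unusual.@.unusual.com"@example.com'

# partition splits at the FIRST "@", which sits inside the quoted local part.
print(address.partition("@")[2])   # .unusual.com"@example.com
# rpartition splits at the LAST "@", giving the actual domain here.
print(address.rpartition("@")[2])  # example.com
```

Swapping partition for rpartition in the loop above would be a small change, assuming ordinary domains never contain an @.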
answered Dec 6, 2013 at 1:45