The main objective of this bot to make fast and reliable automatic subscriptions to email newsletter for people, who filled the website form(submissions go to google sheet)
I had too many requests to google docs.(rate limit error) -> so i added sleep 5 after every subscription. Also added break into email-check loop.(to avoid subscribing too many in 1 loop)
# pylint: disable=missing-module-docstring
import quickemailverification
import json
import sentry_sdk
import re
import requests
import traceback
from bs4 import BeautifulSoup
from mod import *
import gspread
import urllib3
from os import system
import time
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
client = quickemailverification.Client(
'eed14e5a') # key blurred
sentry_sdk.init("https://322.ingest.sentry.io/334", # key blurred
traces_sample_rate=1.0)
def regex(regex, text):
'''If find this text-return True'''
return bool(re.search(regex, text))
def checkemail(email):
'''Api for email deliverability True-means many metrics are fine'''
quickemailverification = client.quickemailverification()
response = quickemailverification.verify(email)
if response.body["safe_to_send"] == "true":
return True
else:
return False
# pylint: disable=missing-module-docstring
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# For clarity in opened windows cmd(i just run .py file and they all ahve same title)
system("Ben EMail")
r = requests
def subscribe(email, name="", proxy="", debug=False):
if debug:
proxy = "http://127.0.0.1:8080"
proxies = {
"https": proxy
}
burp0_url = "https://api.benchmarkemail.com"
burp0_cookies = {
"Key": "M1oqxE5kauGh55V12W2J" # key blurred
}
burp0_headers = {}
burp0_data = {
"doubleoptin": "1",
"fldEmail": email,
"fldfirstname": name,
"fldlastname": ''
}
r = requests.post(burp0_url,
headers=burp0_headers,
cookies=burp0_cookies,
data=burp0_data,
verify=False,
proxies=proxies,
timeout=10)
if regex("Check your inbox", r.text):
# check response html text
print("Subscribed", name, email, r.status_code)
success = True
else:
print("Error", name, email, "status code", r.status_code)
success = False
return success, r.text
credentials = {
"type": "service_account",
# key blurred
}
gc = gspread.service_account_from_dict(credentials)
sh = gc.open("email_form")
ws = sh.get_worksheet(0)
def parse_bmresponse(response):
soup = BeautifulSoup(response, 'html.parser')
result = soup.find("div", {"class": "content-section"})
# apart from parsing response for "success phrase" i save api responses for further investigations
return result.get_text()
def dotable(): # go through the whole table and find who is not subscribed yet
all_values = ws.get_all_values()
rownum = 1
for row in all_values[1:]:
rownum += 1
if row[2] == "subscribed" or row[2] == "error":
pass
elif row[2] == "":
name = row[0]
email = row[1]
print(fr"Empty status {name},{email} on {rownum} ")
success, response = subscribe(email, name)
if success:
# Change status if subbed
ws.update("C" + str(rownum), 'moded_ok')
# response from email delivery api
check_result = "Can send "+str(checkemail(email))
# raw result from subsription func
ws.update("F" + str(rownum), check_result)
# response from email delivery api
ws.update("G" + str(rownum), parse_bmresponse(response))
break
else:
ws.update("C" + str(rownum), 'moded_noluck')
ws.update("F" + str(rownum), check_result)
ws.update("G" + str(rownum), parse_bmresponse(response))
break
else:
pass
while 1 > 0:
try:
dotable()
time.sleep(5)
except requests.exceptions.ReadTimeout:
print("Timeout server, i sleep 15")
time.sleep(15)
except Exception:
traceback.print_exc()
sentry_sdk.capture_exception()
1 Answer 1
Exceptions
Using except Exception
to ignore exceptions without knowing the exact causes is often not a great idea. It makes it hard for someone unfamiliar with the code to tell what kinds of exceptions are expected, and also means that when things go wrong in unexpected ways the program doesn't stop doing things but instead keeps doing the wrong thing, which is often harmless, but can get pretty nasty at times
It also ends up capturing KeyboardInterrupt
, which is probably not very important, but still a bit annoying
Credit for using an exception handling framework and taking care to log errors, though
Imports
The import json
and from mod import *
lines don't seem necessary - as far as I can tell those never get used
Globals
Global variables tend to make program state a bit harder to reason about, and make the functions that use them a bit less flexible. Consider passing values like ws
and client
as parameters to the functions that use them instead
regex
Helper functions are nice, but this one in particular seems redundant. It's only used in one place, and if re.search("Check your inbox", r.text)
should work fine there. Actually, since that's just a plain string and not a regex at all, if "Check your inbox" in r.text
should work just as well and probably perform slightly better
Naming
Conventionally, words are split using underscores like_so
. Names like checkemail
, parse_bmresponse
and dotable
would usually be written like check_email
, parse_bm_response
and do_table
That said, those names aren't as descriptive as I think they could be, though:
- Checking an email does describe what
check_email
does - but it might be clearer to say what it checks. A name likecan_send_to
, oris_reachable_email
seems like a good fit - "Do something to a table" feels like a less clear summary of what that function does than "update the subscriptions", so a name like
add_subscriptions
might seem a bit more appropriate. It might not be the best name, sure, but I do prefer it overdo_table
- The name
parse_bmresponse
could be clearer if it stated how it parsed it and what data it extracts. Something likeget_response_text
orget_text_content
seems like a good summary to me
And speaking of clarity, I appreciate the docstrings that are present, but it'd be nice if the rest of the functions also had some
checkemail
I did mention passing client
as a parameter, but do we need to create a new Quickemailverification
each time, or might it make more sense to pass that as a parameter instead? I haven't used the quickemailverification
library before, and their docs website seems to be down right now, so I can't check
Also, when there's an if
that only serves to determine whether to return True
or return False
, you usually don't need the if
- the if
/return
s can be simplified to return response.body["safe_to_send"] == "true"
, since that expression is already a bool
All in all, you might be able to get something close to:
def can_send_to(email, verifier):
'''Api for email deliverability True-means many metrics are fine'''
response = verifier.verify(email)
return response.body["safe_to_send"] == "true"
dotable
One issue here is that each call to Workspace.update
sends a separate request. We can avoid that by using Workspace.batch_update
instead
We might want to consider limiting ourselves to only fetching part of the worksheet by passing a range (like "A1:G"
, or even a limited number of rows like "A1:G500"
) to Workspace.get_values
When iterating over a list, if you find yourself wanting both the content and the position, the right tool for the job is the builtin enumerate
, as in for index, row in enumerate(rows):
. There is no need to keep track of an index manually
Why does the program check separately for the statuses subscribed
and error
? It doesn't do anything special in those cases
On a related note, else: pass
is never necessary. That if
/elif
/else
branch can be trimmed down to only an if row[2] == "":
block
Where would the values subscribed
or error
come from, anyway? This program uses the values moded_ok
and moded_noluck
for that column, doesn't it? Are there multiple programs adding subscriptions?
If subscribing fails, we try to update column F with the value of the variable check_result
- which hasn't been defined at that point, causing the entire function to fail with a NameError
. That's not ideal
Since we break
after attempting to subscribe once, we only ever handle one subscription each time we fetch the sheet. This feels inefficient - is there a reason not to do several at once? That could let us fetch the spreadsheet less often, and would reduce the risk of falling behind if there's a lot of new subscribers, right?
All in all, I might've done something closer to:
def add_subscriptions(worksheet, email_verifier):
"""
Attempt to add subscriptions for the people in the worksheet that haven't previously been handled
"""
sheet_range = "A1:G"
values = worksheet.get_values(sheet_range)
for index, row in enumerate(values):
if row[2] == "":
name = row[0]
email = row[1]
print(fr"Empty status {name},{email} on {index + 1} ")
success, response = subscribe(email, name)
if success:
row[2] = 'moded_ok'
row[5] = "Can send " + str(can_send_to(email, email_verifier))
else:
row[2] = 'moded_noluck'
row[5] = "Subscription failed"
row[6] = get_response_text(response)
worksheet.batch_update([{
"range": sheet_range,
"values": values
}])
Alternatively, I might try to use the Cell
class with Workspace.range
and Workspace.update_cells
, which might look more like this:
def add_subscriptions(worksheet, email_verifier):
"""
Attempt to add subscriptions for the people in the worksheet that haven't previously been handled
"""
row_count = worksheet.row_count
all_cells = worksheet.range(f"A1:G{row_count}")
rows = (tuple(v) for _, v in itertools.groupby(all_cells, Cell.row.fget))
updated_cells = []
for row_number, row in enumerate(rows):
if row[2].value == "":
name = row[0].value
email = row[1].value
print(fr"Empty status {name},{email} on {row_number} ")
success, response = subscribe(email, name)
if success:
row[2].value = 'moded_ok'
row[5].value = "Can send " + str(can_send_to(email, email_verifier))
else:
row[2].value = 'moded_noluck'
row[5].value = "Subscription failed"
row[6].value = get_response_text(response)
updated_cells += (row[2], row[5], row[6])
if updated_cells:
worksheet.update_cells(updated_cells)
Top-level code
To allow a module to be import
ed, it's usually a good idea to put top-level code in an if __name__ == "__main__"
block. It's not always important, but it's a good habit to get into, and it also means the code is bundled into a single block instead of possibly being scattered all throughout the file
On a related note, InsecureRequestWarning
s get disabled in two different places. Once should be enough, since they don't get re-enabled inbetween
Side note, those warnings are there for a reason - verifying certificates is usually a good idea. Is there a reason you don't want to do that?
I don't think system("Ben EMail")
will set the window title - is it meant to be system("title Ben EMail")
?
The r = requests
line doesn't seem necessary - that value never ends up used since you spell out requests
when actually using it, and the name r
does get used for other variables in other places, so this just winds up confusing
while 1 > 0
works, but is a bit unusual. while True
is the usual way to write an infinite loop
All in all, I'd suggest gathering the rest of the non-function non-import code into something a bit like:
if __name__ == "__main__":
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
client = quickemailverification.Client(
'eed14e5a') # key blurred
qev = client.quickemailverification()
#pylint: disable=E0110
sentry_sdk.init("https://322.ingest.sentry.io/334", # key blurred
traces_sample_rate=1.0)
# Set console window title
system("title Ben EMail")
credentials = {
"type": "service_account",
# key blurred
}
gc = gspread.service_account_from_dict(credentials)
sh = gc.open("email_form")
ws = sh.get_worksheet(0)
while True:
try:
add_subscriptions(ws, qev)
time.sleep(5)
except requests.exceptions.ReadTimeout:
print("Timeout server, i sleep 15")
time.sleep(15)
except Exception:
traceback.print_exc()
sentry_sdk.capture_exception()
raise
-
\$\begingroup\$ Firstly Sara, thank you so much for taking time and writing such a detailed reply for me! I really appreciate that and hope to be good enough one day to help people as much as you helped me to understand ways to improve. Question. 1) Could you please explain why you do (i mean is there a more intuitive way?) row = all_values[row_number - 1] It took me a while to understand the basic conflict, that google doc starts with 1, and not with 0 row. For me the easiest version turned out to me my own row counter. I get puzzled every time when different providers start with 0 or 1). \$\endgroup\$Bogdan Mind– Bogdan Mind2021年10月08日 13:33:40 +00:00Commented Oct 8, 2021 at 13:33
-
\$\begingroup\$ 2) The reason for break after every successful subscription is to avoid rate limiting by google sheets (which i encountered several times). I couldn't thing of any other way to get around it, because every update in a table is done per-request basis. (i don't know a way to dl local copy of gspreadsheet, update it, and without errors replace a cloud copy of it) 3)InsecureRequestWarnings was disabled for successfull debugging with local proxy) Once again. Amazing response and ton to learn! \$\endgroup\$Bogdan Mind– Bogdan Mind2021年10月08日 13:34:12 +00:00Commented Oct 8, 2021 at 13:34
-
1\$\begingroup\$ @BogdanMind That makes sense, I was only thinking about the reading, not the writing. And, yes, in hindsight I picked a weird way to deal with the row counter. I'll try to update the review with those things in mind \$\endgroup\$Sara J– Sara J2021年10月08日 17:17:06 +00:00Commented Oct 8, 2021 at 17:17