Check Google Sheet for emails, verify delivery by api and Subscribe via api Benchmark

Question 1

The main objective of this bot to make fast and reliable automatic subscriptions to email newsletter for people, who filled the website form(submissions go to google sheet)

I had too many requests to google docs.(rate limit error) -> so i added sleep 5 after every subscription. Also added break into email-check loop.(to avoid subscribing too many in 1 loop)

# pylint: disable=missing-module-docstring
import quickemailverification
import json
import sentry_sdk
import re
import requests
import traceback
from bs4 import BeautifulSoup
from mod import *
import gspread
import urllib3
from os import system
import time
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
client = quickemailverification.Client(
 'eed14e5a') # key blurred
sentry_sdk.init("https://322.ingest.sentry.io/334", # key blurred
 traces_sample_rate=1.0)
def regex(regex, text):
 '''If find this text-return True'''
 return bool(re.search(regex, text))
def checkemail(email):
 '''Api for email deliverability True-means many metrics are fine'''
 quickemailverification = client.quickemailverification()
 response = quickemailverification.verify(email)
 if response.body["safe_to_send"] == "true":
 return True
 else:
 return False
# pylint: disable=missing-module-docstring
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# For clarity in opened windows cmd(i just run .py file and they all ahve same title)
system("Ben EMail")
r = requests
def subscribe(email, name="", proxy="", debug=False):
 if debug:
 proxy = "http://127.0.0.1:8080"
 proxies = {
 "https": proxy
 }
 burp0_url = "https://api.benchmarkemail.com"
 burp0_cookies = {
 "Key": "M1oqxE5kauGh55V12W2J" # key blurred
 }
 burp0_headers = {}
 burp0_data = {
 "doubleoptin": "1",
 "fldEmail": email,
 "fldfirstname": name,
 "fldlastname": ''
 }
 r = requests.post(burp0_url,
 headers=burp0_headers,
 cookies=burp0_cookies,
 data=burp0_data,
 verify=False,
 proxies=proxies,
 timeout=10)
 if regex("Check your inbox", r.text):
 # check response html text
 print("Subscribed", name, email, r.status_code)
 success = True
 else:
 print("Error", name, email, "status code", r.status_code)
 success = False
 return success, r.text
credentials = {
 "type": "service_account",
 # key blurred
}
gc = gspread.service_account_from_dict(credentials)
sh = gc.open("email_form")
ws = sh.get_worksheet(0)
def parse_bmresponse(response):
 soup = BeautifulSoup(response, 'html.parser')
 result = soup.find("div", {"class": "content-section"})
 # apart from parsing response for "success phrase" i save api responses for further investigations
 return result.get_text()
def dotable(): # go through the whole table and find who is not subscribed yet
 all_values = ws.get_all_values()
 rownum = 1
 for row in all_values[1:]:
 rownum += 1
 if row[2] == "subscribed" or row[2] == "error":
 pass
 elif row[2] == "":
 name = row[0]
 email = row[1]
 print(fr"Empty status {name},{email} on {rownum} ")
 success, response = subscribe(email, name)
 if success:
 # Change status if subbed
 ws.update("C" + str(rownum), 'moded_ok')
 # response from email delivery api
 check_result = "Can send "+str(checkemail(email))
 # raw result from subsription func
 ws.update("F" + str(rownum), check_result)
 # response from email delivery api
 ws.update("G" + str(rownum), parse_bmresponse(response))
 break
 else:
 ws.update("C" + str(rownum), 'moded_noluck')
 ws.update("F" + str(rownum), check_result)
 ws.update("G" + str(rownum), parse_bmresponse(response))
 break
 else:
 pass
while 1 > 0:
 try:
 dotable()
 time.sleep(5)
 except requests.exceptions.ReadTimeout:
 print("Timeout server, i sleep 15")
 time.sleep(15)
 except Exception:
 traceback.print_exc()
 sentry_sdk.capture_exception()

Question 2

Exceptions

Using except Exception to ignore exceptions without knowing the exact causes is often not a great idea. It makes it hard for someone unfamiliar with the code to tell what kinds of exceptions are expected, and also means that when things go wrong in unexpected ways the program doesn't stop doing things but instead keeps doing the wrong thing, which is often harmless, but can get pretty nasty at times

It also ends up capturing KeyboardInterrupt, which is probably not very important, but still a bit annoying

Credit for using an exception handling framework and taking care to log errors, though

Imports

The import json and from mod import * lines don't seem necessary - as far as I can tell those never get used

Globals

Global variables tend to make program state a bit harder to reason about, and make the functions that use them a bit less flexible. Consider passing values like ws and client as parameters to the functions that use them instead

`regex`

Helper functions are nice, but this one in particular seems redundant. It's only used in one place, and if re.search("Check your inbox", r.text) should work fine there. Actually, since that's just a plain string and not a regex at all, if "Check your inbox" in r.text should work just as well and probably perform slightly better

Naming

Conventionally, words are split using underscores like_so. Names like checkemail, parse_bmresponse and dotable would usually be written like check_email, parse_bm_response and do_table

That said, those names aren't as descriptive as I think they could be, though:

Checking an email does describe what check_email does - but it might be clearer to say what it checks. A name like can_send_to, or is_reachable_email seems like a good fit
"Do something to a table" feels like a less clear summary of what that function does than "update the subscriptions", so a name like add_subscriptions might seem a bit more appropriate. It might not be the best name, sure, but I do prefer it over do_table
The name parse_bmresponse could be clearer if it stated how it parsed it and what data it extracts. Something like get_response_text or get_text_content seems like a good summary to me

And speaking of clarity, I appreciate the docstrings that are present, but it'd be nice if the rest of the functions also had some

`checkemail`

I did mention passing client as a parameter, but do we need to create a new Quickemailverification each time, or might it make more sense to pass that as a parameter instead? I haven't used the quickemailverification library before, and their docs website seems to be down right now, so I can't check

Also, when there's an if that only serves to determine whether to return True or return False, you usually don't need the if - the if/returns can be simplified to return response.body["safe_to_send"] == "true", since that expression is already a bool

All in all, you might be able to get something close to:

def can_send_to(email, verifier):
 '''Api for email deliverability True-means many metrics are fine'''
 response = verifier.verify(email)
 return response.body["safe_to_send"] == "true"

`dotable`

One issue here is that each call to Workspace.update sends a separate request. We can avoid that by using Workspace.batch_update instead

We might want to consider limiting ourselves to only fetching part of the worksheet by passing a range (like "A1:G", or even a limited number of rows like "A1:G500") to Workspace.get_values

When iterating over a list, if you find yourself wanting both the content and the position, the right tool for the job is the builtin enumerate, as in for index, row in enumerate(rows):. There is no need to keep track of an index manually

Why does the program check separately for the statuses subscribed and error? It doesn't do anything special in those cases
On a related note, else: pass is never necessary. That if/elif/else branch can be trimmed down to only an if row[2] == "": block

Where would the values subscribed or error come from, anyway? This program uses the values moded_ok and moded_noluck for that column, doesn't it? Are there multiple programs adding subscriptions?

If subscribing fails, we try to update column F with the value of the variable check_result - which hasn't been defined at that point, causing the entire function to fail with a NameError. That's not ideal

Since we break after attempting to subscribe once, we only ever handle one subscription each time we fetch the sheet. This feels inefficient - is there a reason not to do several at once? That could let us fetch the spreadsheet less often, and would reduce the risk of falling behind if there's a lot of new subscribers, right?

All in all, I might've done something closer to:

def add_subscriptions(worksheet, email_verifier):
 """
 Attempt to add subscriptions for the people in the worksheet that haven't previously been handled
 """
 sheet_range = "A1:G"
 values = worksheet.get_values(sheet_range)
 for index, row in enumerate(values):
 if row[2] == "":
 name = row[0]
 email = row[1]
 print(fr"Empty status {name},{email} on {index + 1} ")
 success, response = subscribe(email, name)
 if success:
 row[2] = 'moded_ok'
 row[5] = "Can send " + str(can_send_to(email, email_verifier))
 else:
 row[2] = 'moded_noluck'
 row[5] = "Subscription failed"
 
 row[6] = get_response_text(response)
 
 worksheet.batch_update([{
 "range": sheet_range,
 "values": values
 }])

Alternatively, I might try to use the Cell class with Workspace.range and Workspace.update_cells, which might look more like this:

def add_subscriptions(worksheet, email_verifier):
 """
 Attempt to add subscriptions for the people in the worksheet that haven't previously been handled
 """
 row_count = worksheet.row_count
 
 all_cells = worksheet.range(f"A1:G{row_count}")
 rows = (tuple(v) for _, v in itertools.groupby(all_cells, Cell.row.fget))
 updated_cells = []
 for row_number, row in enumerate(rows):
 if row[2].value == "":
 name = row[0].value
 email = row[1].value
 print(fr"Empty status {name},{email} on {row_number} ")
 success, response = subscribe(email, name)
 if success:
 row[2].value = 'moded_ok'
 row[5].value = "Can send " + str(can_send_to(email, email_verifier))
 else:
 row[2].value = 'moded_noluck'
 row[5].value = "Subscription failed"
 
 row[6].value = get_response_text(response)
 
 updated_cells += (row[2], row[5], row[6])
 
 if updated_cells:
 worksheet.update_cells(updated_cells)

Top-level code

To allow a module to be imported, it's usually a good idea to put top-level code in an if __name__ == "__main__" block. It's not always important, but it's a good habit to get into, and it also means the code is bundled into a single block instead of possibly being scattered all throughout the file

On a related note, InsecureRequestWarnings get disabled in two different places. Once should be enough, since they don't get re-enabled inbetween
Side note, those warnings are there for a reason - verifying certificates is usually a good idea. Is there a reason you don't want to do that?

I don't think system("Ben EMail") will set the window title - is it meant to be system("title Ben EMail")?

The r = requests line doesn't seem necessary - that value never ends up used since you spell out requests when actually using it, and the name r does get used for other variables in other places, so this just winds up confusing

while 1 > 0 works, but is a bit unusual. while True is the usual way to write an infinite loop

All in all, I'd suggest gathering the rest of the non-function non-import code into something a bit like:

if __name__ == "__main__":
 urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
 client = quickemailverification.Client(
 'eed14e5a') # key blurred
 qev = client.quickemailverification()
 #pylint: disable=E0110
 sentry_sdk.init("https://322.ingest.sentry.io/334", # key blurred
 traces_sample_rate=1.0)
 # Set console window title
 system("title Ben EMail")
 credentials = {
 "type": "service_account",
 # key blurred
 }
 gc = gspread.service_account_from_dict(credentials)
 sh = gc.open("email_form")
 ws = sh.get_worksheet(0)
 while True:
 try:
 add_subscriptions(ws, qev)
 time.sleep(5)
 except requests.exceptions.ReadTimeout:
 print("Timeout server, i sleep 15")
 time.sleep(15)
 except Exception:
 traceback.print_exc()
 sentry_sdk.capture_exception()
 raise

Question 3

Firstly Sara, thank you so much for taking time and writing such a detailed reply for me! I really appreciate that and hope to be good enough one day to help people as much as you helped me to understand ways to improve. Question. 1) Could you please explain why you do (i mean is there a more intuitive way?) row = all_values[row_number - 1] It took me a while to understand the basic conflict, that google doc starts with 1, and not with 0 row. For me the easiest version turned out to me my own row counter. I get puzzled every time when different providers start with 0 or 1).

Question 4

2) The reason for break after every successful subscription is to avoid rate limiting by google sheets (which i encountered several times). I couldn't thing of any other way to get around it, because every update in a table is done per-request basis. (i don't know a way to dl local copy of gspreadsheet, update it, and without errors replace a cloud copy of it) 3)InsecureRequestWarnings was disabled for successfull debugging with local proxy) Once again. Amazing response and ton to learn!

Question 5

@BogdanMind That makes sense, I was only thinking about the reading, not the writing. And, yes, in hindsight I picked a weird way to deal with the row counter. I'll try to update the review with those things in mind

Sara J Sara J 4,15612 silver badges37 bronze badges · Accepted Answer · 2021-10-05 19:30:29Z

Exceptions

Using except Exception to ignore exceptions without knowing the exact causes is often not a great idea. It makes it hard for someone unfamiliar with the code to tell what kinds of exceptions are expected, and also means that when things go wrong in unexpected ways the program doesn't stop doing things but instead keeps doing the wrong thing, which is often harmless, but can get pretty nasty at times

It also ends up capturing KeyboardInterrupt, which is probably not very important, but still a bit annoying

Credit for using an exception handling framework and taking care to log errors, though

Imports

The import json and from mod import * lines don't seem necessary - as far as I can tell those never get used

Globals

Global variables tend to make program state a bit harder to reason about, and make the functions that use them a bit less flexible. Consider passing values like ws and client as parameters to the functions that use them instead

`regex`

Helper functions are nice, but this one in particular seems redundant. It's only used in one place, and if re.search("Check your inbox", r.text) should work fine there. Actually, since that's just a plain string and not a regex at all, if "Check your inbox" in r.text should work just as well and probably perform slightly better

Naming

Conventionally, words are split using underscores like_so. Names like checkemail, parse_bmresponse and dotable would usually be written like check_email, parse_bm_response and do_table

That said, those names aren't as descriptive as I think they could be, though:

Checking an email does describe what check_email does - but it might be clearer to say what it checks. A name like can_send_to, or is_reachable_email seems like a good fit
"Do something to a table" feels like a less clear summary of what that function does than "update the subscriptions", so a name like add_subscriptions might seem a bit more appropriate. It might not be the best name, sure, but I do prefer it over do_table
The name parse_bmresponse could be clearer if it stated how it parsed it and what data it extracts. Something like get_response_text or get_text_content seems like a good summary to me

And speaking of clarity, I appreciate the docstrings that are present, but it'd be nice if the rest of the functions also had some

`checkemail`

I did mention passing client as a parameter, but do we need to create a new Quickemailverification each time, or might it make more sense to pass that as a parameter instead? I haven't used the quickemailverification library before, and their docs website seems to be down right now, so I can't check

Also, when there's an if that only serves to determine whether to return True or return False, you usually don't need the if - the if/returns can be simplified to return response.body["safe_to_send"] == "true", since that expression is already a bool

All in all, you might be able to get something close to:

def can_send_to(email, verifier):
 '''Api for email deliverability True-means many metrics are fine'''
 response = verifier.verify(email)
 return response.body["safe_to_send"] == "true"

`dotable`

One issue here is that each call to Workspace.update sends a separate request. We can avoid that by using Workspace.batch_update instead

We might want to consider limiting ourselves to only fetching part of the worksheet by passing a range (like "A1:G", or even a limited number of rows like "A1:G500") to Workspace.get_values

When iterating over a list, if you find yourself wanting both the content and the position, the right tool for the job is the builtin enumerate, as in for index, row in enumerate(rows):. There is no need to keep track of an index manually

Why does the program check separately for the statuses subscribed and error? It doesn't do anything special in those cases
On a related note, else: pass is never necessary. That if/elif/else branch can be trimmed down to only an if row[2] == "": block

Where would the values subscribed or error come from, anyway? This program uses the values moded_ok and moded_noluck for that column, doesn't it? Are there multiple programs adding subscriptions?

If subscribing fails, we try to update column F with the value of the variable check_result - which hasn't been defined at that point, causing the entire function to fail with a NameError. That's not ideal

Since we break after attempting to subscribe once, we only ever handle one subscription each time we fetch the sheet. This feels inefficient - is there a reason not to do several at once? That could let us fetch the spreadsheet less often, and would reduce the risk of falling behind if there's a lot of new subscribers, right?

All in all, I might've done something closer to:

def add_subscriptions(worksheet, email_verifier):
 """
 Attempt to add subscriptions for the people in the worksheet that haven't previously been handled
 """
 sheet_range = "A1:G"
 values = worksheet.get_values(sheet_range)
 for index, row in enumerate(values):
 if row[2] == "":
 name = row[0]
 email = row[1]
 print(fr"Empty status {name},{email} on {index + 1} ")
 success, response = subscribe(email, name)
 if success:
 row[2] = 'moded_ok'
 row[5] = "Can send " + str(can_send_to(email, email_verifier))
 else:
 row[2] = 'moded_noluck'
 row[5] = "Subscription failed"
 
 row[6] = get_response_text(response)
 
 worksheet.batch_update([{
 "range": sheet_range,
 "values": values
 }])

Alternatively, I might try to use the Cell class with Workspace.range and Workspace.update_cells, which might look more like this:

def add_subscriptions(worksheet, email_verifier):
 """
 Attempt to add subscriptions for the people in the worksheet that haven't previously been handled
 """
 row_count = worksheet.row_count
 
 all_cells = worksheet.range(f"A1:G{row_count}")
 rows = (tuple(v) for _, v in itertools.groupby(all_cells, Cell.row.fget))
 updated_cells = []
 for row_number, row in enumerate(rows):
 if row[2].value == "":
 name = row[0].value
 email = row[1].value
 print(fr"Empty status {name},{email} on {row_number} ")
 success, response = subscribe(email, name)
 if success:
 row[2].value = 'moded_ok'
 row[5].value = "Can send " + str(can_send_to(email, email_verifier))
 else:
 row[2].value = 'moded_noluck'
 row[5].value = "Subscription failed"
 
 row[6].value = get_response_text(response)
 
 updated_cells += (row[2], row[5], row[6])
 
 if updated_cells:
 worksheet.update_cells(updated_cells)

Top-level code

To allow a module to be imported, it's usually a good idea to put top-level code in an if __name__ == "__main__" block. It's not always important, but it's a good habit to get into, and it also means the code is bundled into a single block instead of possibly being scattered all throughout the file

On a related note, InsecureRequestWarnings get disabled in two different places. Once should be enough, since they don't get re-enabled inbetween
Side note, those warnings are there for a reason - verifying certificates is usually a good idea. Is there a reason you don't want to do that?

I don't think system("Ben EMail") will set the window title - is it meant to be system("title Ben EMail")?

The r = requests line doesn't seem necessary - that value never ends up used since you spell out requests when actually using it, and the name r does get used for other variables in other places, so this just winds up confusing

while 1 > 0 works, but is a bit unusual. while True is the usual way to write an infinite loop

All in all, I'd suggest gathering the rest of the non-function non-import code into something a bit like:

if __name__ == "__main__":
 urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
 client = quickemailverification.Client(
 'eed14e5a') # key blurred
 qev = client.quickemailverification()
 #pylint: disable=E0110
 sentry_sdk.init("https://322.ingest.sentry.io/334", # key blurred
 traces_sample_rate=1.0)
 # Set console window title
 system("title Ben EMail")
 credentials = {
 "type": "service_account",
 # key blurred
 }
 gc = gspread.service_account_from_dict(credentials)
 sh = gc.open("email_form")
 ws = sh.get_worksheet(0)
 while True:
 try:
 add_subscriptions(ws, qev)
 time.sleep(5)
 except requests.exceptions.ReadTimeout:
 print("Timeout server, i sleep 15")
 time.sleep(15)
 except Exception:
 traceback.print_exc()
 sentry_sdk.capture_exception()
 raise

Firstly Sara, thank you so much for taking time and writing such a detailed reply for me! I really appreciate that and hope to be good enough one day to help people as much as you helped me to understand ways to improve. Question. 1) Could you please explain why you do (i mean is there a more intuitive way?) row = all_values[row_number - 1] It took me a while to understand the basic conflict, that google doc starts with 1, and not with 0 row. For me the easiest version turned out to me my own row counter. I get puzzled every time when different providers start with 0 or 1).
2) The reason for break after every successful subscription is to avoid rate limiting by google sheets (which i encountered several times). I couldn't thing of any other way to get around it, because every update in a table is done per-request basis. (i don't know a way to dl local copy of gspreadsheet, update it, and without errors replace a cloud copy of it) 3)InsecureRequestWarnings was disabled for successfull debugging with local proxy) Once again. Amazing response and ton to learn!
@BogdanMind That makes sense, I was only thinking about the reading, not the writing. And, yes, in hindsight I picked a weird way to deal with the row counter. I'll try to update the review with those things in mind

Stack Exchange Network

Check Google Sheet for emails, verify delivery by api and Subscribe via api Benchmark

1 Answer 1

Exceptions

Imports

Globals

`regex`

Naming

`checkemail`

`dotable`

Top-level code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Check Google Sheet for emails, verify delivery by api and Subscribe via api Benchmark

1 Answer 1

Exceptions

Imports

Globals

regex

Naming

checkemail

dotable

Top-level code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`regex`

`checkemail`

`dotable`