Retrying a web request in Python

Question 1

Is it fine to retry a web service call like this in Python? Would there be any scenarios the exception might not work the way we expect it? I expect it to continue only if it's fine, otherwise just retry, and if that doesn't work, just throw the error out.

def getblob(html):
 num_of_retries = 2 
 time_interval = 2 
 while num_of_retries:
 try: 
 request_id = str(uuid.uuid4())
 params = [('requestId', request_id)]
 url = 'service.com?' 
 url = url + urllib.urlencode(params)
 request = urllib2.Request(url, data=html)
 request.add_header('Accept', 'application/pdf')
 request.add_header('Content-Type', 'text/html')
 handle = urllib2.urlopen(request)
 pdf_blob = handle.read() 
 break
 except Exception:
 typ, val, tb = sys.exc_info()
 logger.error(traceback.format_exception(typ, val, tb))
 time.sleep(time_interval)
 num_of_retries -= 1
 # If there aren't any retries - propogate
 if not num_of_retries: 
 raise
 return pdb_blob

I wnated to review this because recently I bumped into an issue where I hit an Internal Server Exception on the server side and the retry gave me an HTTP 200 response with incomplete file (it was a PDF without EOF Marker).

Would this snippet cause anything like this? It does work and I get PDF back as well. However from time to time I hit an HTTP 500 (some unknown server issue) and then HTTP 200 with incomplete blob. The incomplete blob is what makes me investigate if this loop can cause that problem. This HTTP 500 HTTP 200 (Incomplete Blob) is very much a pattern. Before I added this retry, I used to get HTTP 500 but never an incomplete blob.

Question 2

I could see a down vote, is this not a question for Code Review? Should this be asked elsewhere on the SO network?

Question 3

I downvoted as this question is very modified code of what you actually use. There have been two minor, but crucial edits to make it work, and currently it doesn't actually return anything either. You should take care before posting, that the code is actually working in the presented form.

Question 4

Sorry about that. Will try to make sure I post the real code. I added the return. I couldn't paste the code because it was having some properitary links and also it wasn't the cleanest code. Hence I decide to format it for the purpose of SO but unfortunately missed some crucial points.

Question 5

Instead of using while to count a specific number of times and manually handle the number, you should use for _ in range(num_of_retries). It works the same except that you don't need to manually change num_of_retries. And this way num_of_retries wont be modified from the initial value, in case you need to access it again for any reason.

There is one small change to make this work. You need to use the for ... else syntax to raise if the loop is not broken. An else block after a for block will be executed if the for loop never reaches a break statement. This fits your case perfectly as break is called on success, but if there's never success then it will run through all the iterations and reach the else block where you can raise:

for _ in range(num_of_retries):
 try: 
 request_id = str(uuid.uuid4())
 params = [('requestId', request_id)]
 url = 'service.com' 
 url = url + urllib.urlencode(params)
 request = urllib2.Request(url, data=html)
 request.add_header('Accept', 'application/pdf')
 request.add_header('Content-Type', 'text/html')
 handle = urllib2.urlopen(request)
 pdf_blob = handle.read() 
 break
 except Exception:
 typ, val, tb = sys.exc_info()
 logger.error(traceback.format_exception(typ, val, tb))
 time.sleep(time_interval)
else:
 raise

Without any explicit argument, raise will just use the last exception that has been raised and handled. See the SO answer here about it. Basically when you try except an exception, its details are being retained by Python and are passed to raise in the absence of any explicit exception passed to raise.

Question 6

Any idea on what would be the cleanest way to avoid time.sleep(time_interval) on the last retry?

Question 7

You don't need to be in the exception block to raise? Does that mean it will raise the last known exception?

Question 8

@Nishant Correct, I added a brief explanation that links to a post on SO that explains it well.

Question 9

@MathiasEttinger Hm. I guess I'd just use for i in range so I could then call if i == num_of_retries - 1. It wouldn't be entirely clear because of that - 1. Anything else I can think of involves adding more to the code than I feel should be necessary.

Question 10

You should try to reduce the amount of code protected by the try .. except construct. To me, the only line that should be protected as such is urllib2.urlopen(request), the rest being before the try or in an else clause.

You should also specialize a bit more your exception handling: trying again and again to reach an invalid url is a bad idea. But except Exception will permit that, since it catches ValueErrors raised by urllib2.urlopen when the url entered is invalid or blank. The only exceptions you should care about in your case are urllib2.URLError (and possibly urllib2.HTTPError its subclass). You will thus fail sooner if the cause of failure is not network-related.

The body of the loop would look like:

request_id = str(uuid.uuid4())
params = [('requestId', request_id)]
url = 'service.com' 
url = url + urllib.urlencode(params)
request = urllib2.Request(url, data=html)
request.add_header('Accept', 'application/pdf')
request.add_header('Content-Type', 'text/html')
try:
 handle = urllib2.urlopen(request)
except urllib2.URLError:
 typ, val, tb = sys.exc_info()
 logger.error(traceback.format_exception(typ, val, tb))
 num_of_retries -= 1
 # If there aren't any retries - propogate
 if not num_of_retries: 
 raise
 time.sleep(time_interval)
else:
 pdf_blob = handle.read()
 break

Note that I also moved the call to time.sleep(time_interval) to avoid waiting before re-raising the last exception.

Question 11

Its request not req - sorry I made a typo. Don't I need a new request object each time? Also I need a new requestId each time to track it however.

Question 12

The code would be clearer if you used a function decorator for the retry logic. One such decorator is retrying, but there are other similar libraries, and it wouldn't be hard to write one yourself.

@retry(stop_max_attempt_number=2, wait_fixed=2000)
def getblob(html):
 request_id = str(uuid.uuid4())
 ...

Question 13

That is much more cleaner.

score 5 · Answer 1 · 2015-12-04 15:43:31Z

Instead of using while to count a specific number of times and manually handle the number, you should use for _ in range(num_of_retries). It works the same except that you don't need to manually change num_of_retries. And this way num_of_retries wont be modified from the initial value, in case you need to access it again for any reason.

There is one small change to make this work. You need to use the for ... else syntax to raise if the loop is not broken. An else block after a for block will be executed if the for loop never reaches a break statement. This fits your case perfectly as break is called on success, but if there's never success then it will run through all the iterations and reach the else block where you can raise:

for _ in range(num_of_retries):
 try: 
 request_id = str(uuid.uuid4())
 params = [('requestId', request_id)]
 url = 'service.com' 
 url = url + urllib.urlencode(params)
 request = urllib2.Request(url, data=html)
 request.add_header('Accept', 'application/pdf')
 request.add_header('Content-Type', 'text/html')
 handle = urllib2.urlopen(request)
 pdf_blob = handle.read() 
 break
 except Exception:
 typ, val, tb = sys.exc_info()
 logger.error(traceback.format_exception(typ, val, tb))
 time.sleep(time_interval)
else:
 raise

Without any explicit argument, raise will just use the last exception that has been raised and handled. See the SO answer here about it. Basically when you try except an exception, its details are being retained by Python and are passed to raise in the absence of any explicit exception passed to raise.

Any idea on what would be the cleanest way to avoid time.sleep(time_interval) on the last retry?
You don't need to be in the exception block to raise? Does that mean it will raise the last known exception?
@Nishant Correct, I added a brief explanation that links to a post on SO that explains it well.
@MathiasEttinger Hm. I guess I'd just use for i in range so I could then call if i == num_of_retries - 1. It wouldn't be entirely clear because of that - 1. Anything else I can think of involves adding more to the code than I feel should be necessary.

score 4 · Answer 2 · 2015-12-04 16:12:18Z

You should try to reduce the amount of code protected by the try .. except construct. To me, the only line that should be protected as such is urllib2.urlopen(request), the rest being before the try or in an else clause.

You should also specialize a bit more your exception handling: trying again and again to reach an invalid url is a bad idea. But except Exception will permit that, since it catches ValueErrors raised by urllib2.urlopen when the url entered is invalid or blank. The only exceptions you should care about in your case are urllib2.URLError (and possibly urllib2.HTTPError its subclass). You will thus fail sooner if the cause of failure is not network-related.

The body of the loop would look like:

request_id = str(uuid.uuid4())
params = [('requestId', request_id)]
url = 'service.com' 
url = url + urllib.urlencode(params)
request = urllib2.Request(url, data=html)
request.add_header('Accept', 'application/pdf')
request.add_header('Content-Type', 'text/html')
try:
 handle = urllib2.urlopen(request)
except urllib2.URLError:
 typ, val, tb = sys.exc_info()
 logger.error(traceback.format_exception(typ, val, tb))
 num_of_retries -= 1
 # If there aren't any retries - propogate
 if not num_of_retries: 
 raise
 time.sleep(time_interval)
else:
 pdf_blob = handle.read()
 break

Note that I also moved the call to time.sleep(time_interval) to avoid waiting before re-raising the last exception.

Its request not req - sorry I made a typo. Don't I need a new request object each time? Also I need a new requestId each time to track it however.

200_success 200_success 146k22 gold badges190 silver badges479 bronze badges · Answer 3 · 2015-12-04 21:25:25Z

The code would be clearer if you used a function decorator for the retry logic. One such decorator is retrying, but there are other similar libraries, and it wouldn't be hard to write one yourself.

@retry(stop_max_attempt_number=2, wait_fixed=2000)
def getblob(html):
 request_id = str(uuid.uuid4())
 ...

\$\begingroup\$ That is much more cleaner. \$\endgroup\$

Nishant
– Nishant

2015年12月07日 14:47:07 +00:00
Commented Dec 7, 2015 at 14:47

Stack Exchange Network

Retrying a web request in Python

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Retrying a web request in Python

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions