I'm trying to understand how to work with `aiohttp` and `asyncio`. The code below retrieves all websites in `urls` and prints out the "size" of each response.

- Is the error handling within the `fetch` method correct?
- Is it possible to remove the result of a specific URL from `results` in case of an exception, making `return (url, '')` unnecessary?
- Is there a better way than `ssl=False` to deal with a potential `ssl.SSLCertVerificationError`? (One possible alternative is sketched after the code below.)
- Any additional advice on how I can improve my code quality is highly appreciated.
```python
import asyncio
import aiohttp


async def fetch(session, url):
    try:
        async with session.get(url, ssl=False) as response:
            return url, await response.text()
    except aiohttp.client_exceptions.ClientConnectorError as e:
        print(e)
        return (url, '')


async def main():
    tasks = []
    urls = [
        'http://www.python.org',
        'http://www.jython.org',
        'http://www.pypy.org'
    ]
    async with aiohttp.ClientSession() as session:
        while urls:
            tasks.append(fetch(session, urls.pop()))
        results = await asyncio.gather(*tasks)
        [print(f'{url}: {len(result)}') for url, result in results]


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
```
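On the `ssl=False` question: one possible alternative (a sketch, assuming the verification errors come from a missing or outdated CA bundle; `certifi` is a third-party package) is to keep verification enabled but build an `ssl.SSLContext` from certifi's certificate store:

```python
import ssl

import aiohttp
import certifi

# Build a context that verifies certificates against certifi's CA bundle
# instead of disabling verification entirely with ssl=False.
ssl_context = ssl.create_default_context(cafile=certifi.where())


async def fetch(session, url):
    try:
        # Passing an SSLContext keeps certificate verification enabled.
        async with session.get(url, ssl=ssl_context) as response:
            return url, await response.text()
    except aiohttp.ClientConnectorError as e:
        print(e)
        return url, ''
```

This way a genuinely invalid certificate still fails, rather than being silently accepted.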
Update

- Is there a way to add tasks to the list from within the "loop"? E.g. add new URLs while scraping a website and finding new subdomains to scrape. (One possible pattern is sketched below.)
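One possible pattern (a sketch, not from the answer below; `extract_links` is a hypothetical placeholder): a fixed pool of workers pulls URLs from an `asyncio.Queue`, and any worker can enqueue new URLs it discovers while the crawl is running.

```python
import asyncio

import aiohttp


async def worker(session, queue, results):
    while True:
        url = await queue.get()
        try:
            async with session.get(url, ssl=False) as response:
                text = await response.text()
            results[url] = len(text)
            # Hypothetical hook: a real scraper would parse `text` here and
            # enqueue any newly discovered URLs, e.g.:
            # for new_url in extract_links(text):
            #     queue.put_nowait(new_url)
        except aiohttp.ClientError as e:
            print(e)
        finally:
            queue.task_done()


async def main():
    urls = [
        'http://www.python.org',
        'http://www.jython.org',
        'http://www.pypy.org'
    ]
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)

    results = {}
    async with aiohttp.ClientSession() as session:
        workers = [asyncio.ensure_future(worker(session, queue, results))
                   for _ in range(3)]
        await queue.join()  # wait until every queued URL has been processed
        for w in workers:
            w.cancel()      # workers loop forever; stop them explicitly
        await asyncio.gather(*workers, return_exceptions=True)

    for url, size in results.items():
        print(f'{url}: {size}')
```

`queue.join()` only returns once `task_done()` has been called for every item put on the queue, including items enqueued mid-crawl, so the pool drains naturally.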
1 Answer
```python
tasks = []
while urls:
    tasks.append(fetch(session, urls.pop()))
```

can be largely simplified to

```python
tasks = [fetch(session, url) for url in urls]
```
> Is it possible to remove the result of a specific URL from `results` in case of an exception, making `return (url, '')` unnecessary?
Yes, somewhat. `asyncio.gather` accepts a `return_exceptions` parameter. Set it to `True` to avoid a single exception failing the whole `gather` call. You must filter the exceptions out of the results afterwards anyway:
```python
import asyncio
import aiohttp


async def fetch(session, url):
    async with session.get(url, ssl=False) as response:
        return await response.text()


async def main():
    urls = [
        'http://www.python.org',
        'http://www.jython.org',
        'http://www.pypy.org'
    ]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        # return_exceptions=True makes gather() return exceptions as results
        # instead of raising the first one and cancelling the rest.
        results = await asyncio.gather(*tasks, return_exceptions=True)
        for url, result in zip(urls, results):
            if not isinstance(result, Exception):
                print(f'{url}: {len(result)}')
            else:
                print(f'{url} FAILED')


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
```
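As a side note on the closing boilerplate (assuming Python 3.7+), the manual loop management can be replaced with `asyncio.run`:

```python
if __name__ == '__main__':
    # asyncio.run() creates a new event loop, runs main(), and closes the
    # loop when it finishes (available since Python 3.7).
    asyncio.run(main())
```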
- "Not sure if your code example is meant to catch exceptions or if you just wanted to point out how to deal with exceptions in the `results` list. I only manage to have working code if I put `try ... except` within the `fetch` method. If I put `return e` in the `except`, your code works." – RandomDude, Jul 24, 2018 at 19:46