This is one of my first attempts to do something practical with asyncio. The task is simple: given a list of URLs, determine whether the content type is HTML for every URL.

I've used aiohttp, initializing a single "session", ignoring SSL errors, and issuing HEAD requests to avoid downloading the whole endpoint body. Then I simply check whether text/html appears in the Content-Type header string:
import asyncio

import aiohttp


@asyncio.coroutine
def is_html(session, url):
    # HEAD request: only headers are fetched, no response body.
    response = yield from session.head(url, compress=True)
    print(url, "text/html" in response.headers["Content-Type"])


if __name__ == '__main__':
    links = ["https://httpbin.org/html",
             "https://httpbin.org/image/png",
             "https://httpbin.org/image/svg",
             "https://httpbin.org/image"]

    loop = asyncio.get_event_loop()
    # Single connector/session shared by all requests; SSL errors ignored.
    conn = aiohttp.TCPConnector(verify_ssl=False)
    with aiohttp.ClientSession(connector=conn, loop=loop) as session:
        f = asyncio.wait([is_html(session, link) for link in links])
        loop.run_until_complete(f)
The code works; it prints the following (the output order varies between runs, of course):
https://httpbin.org/image/svg False
https://httpbin.org/image False
https://httpbin.org/image/png False
https://httpbin.org/html True
But I'm not sure whether I'm using the asyncio loop, wait, and coroutines, or aiohttp's connector and session objects, appropriately. What would you recommend to improve?
1 Answer
IMO your code should look more like this:
import asyncio

import aiohttp

URLS = [...]

if __name__ == "__main__":
    print(
        asyncio.get_event_loop().run_until_complete(
            asyncio.gather(*(foo(url) for url in URLS))))
where an individual URL is processed with something like:
async def foo(url):
    async with aiohttp.ClientSession() as s:
        async with s.head(...) as r:
            return url, r.headers[...]
Note the separate session for each URL.
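For reference, a complete runnable version of that sketch might look like the following. The HEAD request and the text/html membership check are carried over from the question; allow_redirects=True is my own assumption (aiohttp's head() does not follow redirects by default), as is the .get() fallback for responses that lack a Content-Type header:

import asyncio

import aiohttp

URLS = ["https://httpbin.org/html",
        "https://httpbin.org/image/png"]

async def foo(url):
    # One short-lived session per URL, as suggested above.
    async with aiohttp.ClientSession() as s:
        # HEAD avoids downloading the body; allow_redirects=True is an
        # assumption, since aiohttp's head() doesn't follow them by default.
        async with s.head(url, allow_redirects=True) as r:
            return url, "text/html" in r.headers.get("Content-Type", "")

if __name__ == "__main__":
    print(
        asyncio.get_event_loop().run_until_complete(
            asyncio.gather(*(foo(url) for url in URLS))))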
Additionally, exception handling may be needed, in which case it should be encapsulated inside foo.
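A minimal sketch of that encapsulation, assuming failures should be reported as None rather than raised (that sentinel choice is mine, not the answer's):

async def foo(url):
    try:
        async with aiohttp.ClientSession() as s:
            async with s.head(url) as r:
                return url, "text/html" in r.headers.get("Content-Type", "")
    except aiohttp.ClientError:
        # Swallow client-side errors (DNS failure, refused connection,
        # invalid URL, ...) so one bad URL doesn't abort the whole gather().
        return url, None

With this in place, gather() returns a result for every URL even when some of them fail.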