\$\begingroup\$

This is one of my first attempts to do something practical with asyncio. The task is simple:

Given a list of URLs, determine if the content type is HTML for every URL.

I've used aiohttp with a single "session", ignoring SSL errors and issuing HEAD requests to avoid downloading the whole response body. I then check whether text/html appears in the Content-Type header string:

import asyncio
import aiohttp


@asyncio.coroutine
def is_html(session, url):
    response = yield from session.head(url, compress=True)
    print(url, "text/html" in response.headers["Content-Type"])


if __name__ == '__main__':
    links = ["https://httpbin.org/html",
             "https://httpbin.org/image/png",
             "https://httpbin.org/image/svg",
             "https://httpbin.org/image"]
    loop = asyncio.get_event_loop()
    conn = aiohttp.TCPConnector(verify_ssl=False)
    with aiohttp.ClientSession(connector=conn, loop=loop) as session:
        f = asyncio.wait([is_html(session, link) for link in links])
        loop.run_until_complete(f)

The code works; it prints the following (the output order is inconsistent, of course):

https://httpbin.org/image/svg False
https://httpbin.org/image False
https://httpbin.org/image/png False
https://httpbin.org/html True
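The header check itself can be isolated and tested without any network access. The following is only a sketch: `is_html_content_type` is a hypothetical helper name, and it is stricter than the substring test above because it compares the media type exactly after dropping parameters such as `; charset=utf-8`. It also tolerates a missing header, assuming the caller passes `response.headers.get("Content-Type")`, which yields `None` when the header is absent, whereas `response.headers["Content-Type"]` raises `KeyError`.

```python
def is_html_content_type(header_value):
    """Return True if a Content-Type header value names the text/html media type."""
    if not header_value:
        # Header absent or empty: there is no media type to inspect.
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


print(is_html_content_type("text/html; charset=utf-8"))  # True
print(is_html_content_type("image/png"))                 # False
print(is_html_content_type(None))                        # False
```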

I'm not sure whether I'm using the asyncio loop, wait and coroutines, and aiohttp's connector and session objects, appropriately. What would you recommend to improve?

asked Apr 3, 2017 at 2:48
\$\endgroup\$
  • Please rewrite for Python 3.6 with async def and await... Commented Jul 20, 2017 at 14:22

1 Answer

\$\begingroup\$

IMO your code should look more like this:

import asyncio
import aiohttp

URLS = [...]

if __name__ == "__main__":
    print(
        asyncio.get_event_loop().run_until_complete(
            asyncio.gather(*(foo(url) for url in URLS))))

Each individual URL would then be processed by something like:

async def foo(url):
    async with aiohttp.ClientSession() as s:
        async with s.head(...) as r:
            return url, r.headers[...]

Note the separate session for each URL.
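One practical difference from the `asyncio.wait` call in the question is that `asyncio.gather` returns results in the order the awaitables were passed in, so the inconsistent output order goes away if printing is done from the gathered results. The sketch below illustrates this with the standard library only: `fake_check` is a hypothetical stand-in for the real HEAD request, the delays are invented, and `asyncio.run` assumes Python 3.7 or later.

```python
import asyncio


async def fake_check(url, delay):
    # Stand-in for a real HEAD request: sleep, then return a (url, result) pair.
    await asyncio.sleep(delay)
    return url, url.endswith("/html")


async def main():
    # Hypothetical inputs; the first one deliberately finishes last.
    jobs = [("https://httpbin.org/html", 0.03),
            ("https://httpbin.org/image/png", 0.01),
            ("https://httpbin.org/image/svg", 0.02)]
    # gather returns results in the order the awaitables were passed in,
    # regardless of the order in which they complete.
    return await asyncio.gather(*(fake_check(u, d) for u, d in jobs))


results = asyncio.run(main())
for url, is_html in results:
    print(url, is_html)
```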

Exception handling may also be needed, in which case it should be encapsulated inside foo.
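A minimal sketch of that encapsulation follows, using only the standard library so it runs without network access. `fetch_content_type` is a hypothetical stand-in for the aiohttp HEAD request, the URLs and the 5-second timeout are invented for illustration, and `asyncio.run` assumes Python 3.7 or later. With real aiohttp calls, the `except` clause would presumably also name `aiohttp.ClientError`.

```python
import asyncio


async def fetch_content_type(url):
    # Hypothetical stand-in for the aiohttp HEAD request made inside foo.
    await asyncio.sleep(0)
    if "unreachable" in url:
        raise OSError("connection failed")
    return "text/html; charset=utf-8"


async def foo(url):
    # Failures are handled here, so one bad URL does not make gather raise
    # and discard the results for the other URLs.
    try:
        content_type = await asyncio.wait_for(fetch_content_type(url), timeout=5)
    except (OSError, asyncio.TimeoutError) as exc:
        return url, None, repr(exc)
    return url, content_type, None


async def main(urls):
    return await asyncio.gather(*(foo(url) for url in urls))


results = asyncio.run(main(["https://example.com/",
                            "https://unreachable.invalid/"]))
for url, content_type, error in results:
    print(url, content_type, error)
```

Returning the error alongside the URL keeps every result the same shape, so the caller can decide later whether to retry, log, or skip failed URLs.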

answered Jul 20, 2017 at 14:30
\$\endgroup\$
