This is one of my first attempts to do something practical with asyncio. The task is simple: given a list of URLs, determine whether the content type is HTML for every URL.

I've used aiohttp, initializing a single "session", ignoring SSL errors, and issuing HEAD requests to avoid downloading the whole endpoint body. Then I simply check whether text/html appears in the Content-Type header string:
import asyncio

import aiohttp


@asyncio.coroutine
def is_html(session, url):
    # HEAD request: only headers are fetched, no response body.
    response = yield from session.head(url, compress=True)
    print(url, "text/html" in response.headers["Content-Type"])


if __name__ == '__main__':
    links = ["https://httpbin.org/html",
             "https://httpbin.org/image/png",
             "https://httpbin.org/image/svg",
             "https://httpbin.org/image"]

    loop = asyncio.get_event_loop()
    # Single connector/session shared by all requests; SSL errors ignored.
    conn = aiohttp.TCPConnector(verify_ssl=False)
    with aiohttp.ClientSession(connector=conn, loop=loop) as session:
        f = asyncio.wait([is_html(session, link) for link in links])
        loop.run_until_complete(f)
The code works; it prints the following (the output order varies between runs, of course):
https://httpbin.org/image/svg False
https://httpbin.org/image False
https://httpbin.org/image/png False
https://httpbin.org/html True
But I'm not sure whether I'm using the asyncio loop, wait, and coroutines, or aiohttp's connector and session objects, appropriately. What would you recommend to improve?
1 Answer
IMO your code should look more like this:
import asyncio

import aiohttp

URLS = [...]

if __name__ == "__main__":
    print(
        asyncio.get_event_loop().run_until_complete(
            asyncio.gather(*(foo(url) for url in URLS))))
where an individual URL is processed with something like:
async def foo(url):
    async with aiohttp.ClientSession() as s:
        async with s.head(...) as r:
            return url, r.headers[...]
Note the separate session for each URL.
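For reference, a complete runnable version of that sketch might look like the following. The HEAD request and the text/html membership check are carried over from the question; allow_redirects=True is my own assumption (aiohttp's head() does not follow redirects by default), as is the .get() fallback for responses that lack a Content-Type header:

import asyncio

import aiohttp

URLS = ["https://httpbin.org/html",
        "https://httpbin.org/image/png"]

async def foo(url):
    # One short-lived session per URL, as suggested above.
    async with aiohttp.ClientSession() as s:
        # HEAD avoids downloading the body; allow_redirects=True is an
        # assumption, since aiohttp's head() doesn't follow them by default.
        async with s.head(url, allow_redirects=True) as r:
            return url, "text/html" in r.headers.get("Content-Type", "")

if __name__ == "__main__":
    print(
        asyncio.get_event_loop().run_until_complete(
            asyncio.gather(*(foo(url) for url in URLS))))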
Additionally, exception handling may be needed, in which case it should be encapsulated inside foo.
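A minimal sketch of that encapsulation, assuming failures should be reported as None rather than raised (that sentinel choice is mine, not the answer's):

async def foo(url):
    try:
        async with aiohttp.ClientSession() as s:
            async with s.head(url) as r:
                return url, "text/html" in r.headers.get("Content-Type", "")
    except aiohttp.ClientError:
        # Swallow client-side errors (DNS failure, refused connection,
        # invalid URL, ...) so one bad URL doesn't abort the whole gather().
        return url, None

With this in place, gather() returns a result for every URL even when some of them fail.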