import asyncio

import aiohttp

with open('things.txt') as things:
    urls = [url.strip().lower() for url in things]


async def is_site_404(s, url):
    async with s.head(f"https://example.com/{url}") as r1:
        if r1.status == 404:
            print('hello i am working')


async def create_tasks(urls):
    tasks = []
    async with aiohttp.ClientSession() as s:
        for url in urls:
            if len(url) >= 5 and len(url) < 16 and url.isalnum():
                task = asyncio.create_task(is_site_404(s, url))
                tasks.append(task)
        return await asyncio.gather(*tasks)


while True:
    asyncio.get_event_loop().run_until_complete(create_tasks(urls))
Hi, this is a basic asynchronous URL response checker that I created. It's pretty fast, but I'm wondering if there is any way to get more requests per second using this code as a base. It is designed to run forever and, in this example, print a message whenever a request returns a 404. I'm pretty new to Python and coding in general, and I would like some guidance from anyone who has more experience with this kind of thing. Maybe there is a faster alternative to aiohttp that I should use? Any advice is greatly appreciated.
- Hi, please edit your question so the title states what your code does since reviewers would usually wanna see that first. This is mentioned in How to Ask. I would've done this for you, but since it's your question I think you can choose the most appropriate title. (user228914, Oct 23, 2020 at 6:41)
1 Answer
Overall, your code looks pretty decent. The functions mostly do what they should (although create_tasks shouldn't be running the tasks as well), and the coroutines are gathered after they have all been created.
I'd suggest a few things to make it more readable (and maintainable).
if __name__ block
Put the script's execution code inside an if __name__ == "__main__" block. Read more about why on Stack Overflow.
Variable naming
While you follow the PEP 8 convention on variable naming, the names could still use a rework, e.g. session instead of just s.
URL or path
URL stands for "Uniform Resource Locator", which has the form:
scheme:[//authority]path[?query][#fragment]
You are dealing with only the path here; the scheme and authority sections are fixed as https://example.com/. This is again a matter of naming.
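To make the distinction concrete, here is a small sketch using the standard library's urllib.parse.urlsplit. The query string and fragment in the sample URL are made up for illustration; only the host comes from the question.

```python
from urllib.parse import urlsplit

# Hypothetical sample URL; "?page=2#top" is invented for illustration.
parts = urlsplit("https://example.com/abcde?page=2#top")

print(parts.scheme)    # https
print(parts.netloc)    # example.com (the "authority")
print(parts.path)      # /abcde
print(parts.query)     # page=2
print(parts.fragment)  # top
```

Each line of things.txt only ever supplies the path component, which is why path is probably the more accurate name.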
Gather vs create
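You are both creating and gathering the tasks in the create_tasks function. Going by its name, it should only create them; gathering belongs in a separate step or under a different name.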
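One possible way to split the two responsibilities is sketched below. It is only an illustration: fake_check and run_all are hypothetical names, and fake_check stands in for the real HEAD request so the snippet runs without network access.

```python
import asyncio


async def fake_check(path: str) -> str:
    # Hypothetical stand-in for the real HEAD request (no network needed).
    await asyncio.sleep(0)
    return path.upper()


def create_tasks(paths: list[str]) -> list[asyncio.Task]:
    # Only schedules the coroutines; must be called while a loop is running.
    return [asyncio.create_task(fake_check(path)) for path in paths]


async def run_all(paths: list[str]) -> list[str]:
    tasks = create_tasks(paths)
    # Waiting for the results is a separate step from scheduling them.
    return await asyncio.gather(*tasks)


results = asyncio.run(run_all(["abcde", "fghij"]))
print(results)
```

asyncio.gather returns results in the order the tasks were passed in, so the caller can match them back to the input paths.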
Type hinting
Type hints were added in Python 3.5 (PEP 484). I suggest using them whenever possible.
Rewrite
import asyncio

import aiohttp

HOST = "https://example.com"
THINGS_FILE = "things.txt"


def validate_path(path: str) -> bool:
    # Accept only alphanumeric paths of 5 to 15 characters.
    return 5 <= len(path) < 16 and path.isalnum()


async def check_404(session: aiohttp.ClientSession, path: str):
    async with session.head(f"{HOST}/{path}") as response:
        if response.status == 404:
            print("hello i am working")


async def execute_requests(paths: list[str]):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for path in paths:
            if validate_path(path):
                task = asyncio.create_task(check_404(session, path))
                tasks.append(task)
        # Gather while the session is still open.
        return await asyncio.gather(*tasks)


def main():
    with open(THINGS_FILE) as things:
        paths = [line.strip().lower() for line in things]
    while True:
        # asyncio.run starts a fresh event loop for each pass over the paths.
        asyncio.run(execute_requests(paths))


if __name__ == "__main__":
    main()
This can be further rewritten depending on the data being read from the file, for example by using map/filter to iterate only over the validated paths. The above is mostly a suggestion.
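As a rough sketch of that last idea, the validation could be applied while reading the file, so that only valid paths ever reach the request loop. read_valid_paths is a hypothetical helper, and io.StringIO with made-up sample lines stands in for the opened things.txt.

```python
import io


def validate_path(path: str) -> bool:
    # Same rule as in the rewrite: 5 to 15 alphanumeric characters.
    return 5 <= len(path) < 16 and path.isalnum()


def read_valid_paths(lines):
    # Normalise each line lazily, then keep only those that pass validation.
    cleaned = (line.strip().lower() for line in lines)
    return list(filter(validate_path, cleaned))


# io.StringIO stands in for the opened file; the sample lines are made up.
sample = io.StringIO("Hello1\nabc\nbad path!\nValidName9\n")
paths = read_valid_paths(sample)
print(paths)  # ['hello1', 'validname9']
```

With this in place, the loop that creates the tasks would no longer need its own validate_path check.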
- Thank you for the insight, really appreciate it and will apply this stuff in the future. (humid, Oct 23, 2020 at 20:46)
- @humid Why haven't you updated the title yet? (user228914, Oct 24, 2020 at 8:35)