Hello beautiful people!
I have currently worked on a small script that I would of course continue to work with whenever I get a good feedback from the best code reviewer in here <3 - I have worked on a small template of monitor where I use threadpoolexeuctor to do a GET requests on multiple URL's where each URL will return a dictionary of different values (in my case its the title and repo_count.
import random
import time
from concurrent.futures import as_completed
from concurrent.futures.thread import ThreadPoolExecutor
import requests
from bs4 import BeautifulSoup
URLS = [
'https://github.com/search?q=hello+world',
'https://github.com/search?q=python+3',
'https://github.com/search?q=world',
'https://github.com/search?q=i+love+python',
'https://github.com/search?q=sport+today',
'https://github.com/search?q=how+to+code',
'https://github.com/search?q=banana',
'https://github.com/search?q=android+vs+iphone',
'https://github.com/search?q=please+help+me',
'https://github.com/search?q=batman',
]
def doRequest(url):
response = requests.get(url)
time.sleep(random.randint(10, 30))
return response, url
def doScrape(response):
soup = BeautifulSoup(response.text, 'html.parser')
return {
'title': soup.find("input", {"name": "q"})['value'],
'repo_count': soup.find("span", {"data-search-type": "Repositories"}).text.strip()
}
def checkDifference(old_state, new_state):
for key in new_state:
if key not in old_state:
print(f"New key: {key}")
elif old_state[key] != new_state[key]:
print(f"Difference: {key}")
print(f"Old: {old_state[key]}")
print(f"New: {new_state[key]}")
else:
print(f"No difference: {key}")
def threadPoolLoop():
store_data = {}
with ThreadPoolExecutor(max_workers=5) as executor:
future_tasks = [
executor.submit(
doRequest,
url
) for url in URLS]
for future in as_completed(future_tasks):
response, url = future.result()
if response.status_code == 200:
store_data[url] = doScrape(response)
return store_data
if __name__ == '__main__':
old_state = threadPoolLoop()
while True:
new_state = threadPoolLoop()
checkDifference(old_state, new_state)
old_state = new_state
What the code basically does is that it takes each URL and compares between previous state from itself and whenever we do see a change between those two states, I would like to print out whenever there is a difference and that's pretty much it
Looking forward for improvements! :)
1 Answer 1
Most of this code should be deleted and replaced with API calls. Sign up for a free access token to increase your rate limit. I doubt threading will help.
Beyond that, the algorithm itself is questionable. For instance, searching "world" among repositories yields 1,980,298 results. What do you realistically expect to accomplish with only the first page of these data, particularly when you accept default sortation of "best match" (whatever Github thinks that means)? This might be more practical if all of the terms you've shown are fake and you're looking for something much more targeted.
-
\$\begingroup\$ Hello, there is indeed a API that I wasn't aware of and could be replaced to use that. However the only difference in the code would just be the
doScrape
if we used JSON instead. - Regarding the data, yes. I used alot of different and common data just to be able to see catch the difference faster as we know that there is higher procentage of someone adding "world" into a repo and therefore in my script we could detect the "Difference found". If that make sense? \$\endgroup\$PythonNewbie– PythonNewbie2022年07月22日 06:42:17 +00:00Commented Jul 22, 2022 at 6:42 -
\$\begingroup\$ And of course if there is anything you want to know, please comment and I will answer asap. \$\endgroup\$PythonNewbie– PythonNewbie2022年07月22日 18:37:26 +00:00Commented Jul 22, 2022 at 18:37
Explore related questions
See similar questions with these tags.