\$\begingroup\$

Hello beautiful people!

I have been working on a small script, which I will keep developing once I get some good feedback from the best code reviewers here <3. It is a small monitor template that uses ThreadPoolExecutor to send GET requests to multiple URLs; each URL yields a dictionary of values (in my case, the title and repo_count).

    import random
    import time
    from concurrent.futures import as_completed
    from concurrent.futures.thread import ThreadPoolExecutor

    import requests
    from bs4 import BeautifulSoup

    URLS = [
        'https://github.com/search?q=hello+world',
        'https://github.com/search?q=python+3',
        'https://github.com/search?q=world',
        'https://github.com/search?q=i+love+python',
        'https://github.com/search?q=sport+today',
        'https://github.com/search?q=how+to+code',
        'https://github.com/search?q=banana',
        'https://github.com/search?q=android+vs+iphone',
        'https://github.com/search?q=please+help+me',
        'https://github.com/search?q=batman',
    ]


    def doRequest(url):
        response = requests.get(url)
        time.sleep(random.randint(10, 30))
        return response, url


    def doScrape(response):
        soup = BeautifulSoup(response.text, 'html.parser')
        return {
            'title': soup.find("input", {"name": "q"})['value'],
            'repo_count': soup.find("span", {"data-search-type": "Repositories"}).text.strip()
        }


    def checkDifference(old_state, new_state):
        for key in new_state:
            if key not in old_state:
                print(f"New key: {key}")
            elif old_state[key] != new_state[key]:
                print(f"Difference: {key}")
                print(f"Old: {old_state[key]}")
                print(f"New: {new_state[key]}")
            else:
                print(f"No difference: {key}")


    def threadPoolLoop():
        store_data = {}
        with ThreadPoolExecutor(max_workers=5) as executor:
            future_tasks = [
                executor.submit(
                    doRequest,
                    url
                ) for url in URLS]
            for future in as_completed(future_tasks):
                response, url = future.result()
                if response.status_code == 200:
                    store_data[url] = doScrape(response)
        return store_data


    if __name__ == '__main__':
        old_state = threadPoolLoop()
        while True:
            new_state = threadPoolLoop()
            checkDifference(old_state, new_state)
            old_state = new_state

The script fetches each URL, compares the result with the previous state for that URL, and prints a message whenever the two differ. That's pretty much it.

Looking forward to your suggestions for improvement! :)

asked Jul 21, 2022 at 13:03
\$\endgroup\$

1 Answer

\$\begingroup\$

Most of this code should be deleted and replaced with API calls. Sign up for a free access token to increase your rate limit. I doubt threading will help.
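As a rough illustration of what the API route could look like, here is a minimal sketch against GitHub's repository search endpoint, which reports the number of matches in the `total_count` field of its JSON response. The helper names (`build_request`, `parse_repo_count`, `fetch_repo_count`) and the `opener` parameter are my own, not part of your code, and I have used only the standard library so the sketch has no dependencies; `requests.get` with the same URL and headers should work just as well.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.github.com/search/repositories"


def build_request(query, token=None):
    """Build a Search API request; `token` is an optional personal access token."""
    # per_page=1 keeps the response small: only total_count is needed here.
    params = urllib.parse.urlencode({"q": query, "per_page": 1})
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{API_URL}?{params}", headers=headers)


def parse_repo_count(payload):
    """Return the number of matching repositories from the decoded JSON body."""
    return payload["total_count"]


def fetch_repo_count(query, token=None, opener=urllib.request.urlopen):
    """Query the API and return the match count (`opener` is injectable for testing)."""
    with opener(build_request(query, token)) as response:
        return parse_repo_count(json.load(response))
```

With something like this, `doScrape` and the BeautifulSoup dependency go away, and the state you compare between runs is a plain integer per search term.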

Beyond that, the algorithm itself is questionable. For instance, searching "world" among repositories yields 1,980,298 results. What do you realistically expect to accomplish with only the first page of these data, particularly when you accept the default sort order of "best match" (whatever GitHub thinks that means)? This might be more practical if all of the terms you've shown are placeholders and you're actually looking for something much more targeted.
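If you do want something more targeted, one option is to narrow the query with search qualifiers and to choose the sort order explicitly instead of relying on "best match". The sketch below only builds the URL; `build_search_url` and its defaults are hypothetical choices of mine, while `language:` is a standard GitHub search qualifier and `sort`, `order` and `per_page` are query parameters of the repository search endpoint.

```python
from urllib.parse import urlencode

API_URL = "https://api.github.com/search/repositories"


def build_search_url(terms, language=None, sort="updated", order="desc", per_page=10):
    """Build a repository search URL with an explicit sort order.

    `language` optionally appends a `language:` qualifier to narrow the results.
    """
    query = terms if language is None else f"{terms} language:{language}"
    params = {"q": query, "sort": sort, "order": order, "per_page": per_page}
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Most recently updated Python repositories matching "batman".
    print(build_search_url("batman", language="python"))
```

Sorting by `updated` means the first page is at least well defined: it contains the most recently changed matches, which seems closer to what a change monitor wants.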

answered Jul 21, 2022 at 23:46
\$\endgroup\$
  • \$\begingroup\$ Hello, there is indeed an API that I wasn't aware of, and the script could be changed to use it. However, the only difference in the code would be doScrape, which would read JSON instead. Regarding the data: yes, I used a lot of different, common search terms just to be able to catch a difference faster, since there is a higher probability of someone adding "world" to a repo, in which case my script would print "Difference". Does that make sense? \$\endgroup\$ Commented Jul 22, 2022 at 6:42
  • \$\begingroup\$ And of course if there is anything you want to know, please comment and I will answer asap. \$\endgroup\$ Commented Jul 22, 2022 at 18:37
