Commit 6dc202a

Merge pull request avinashkranjan#2035 from GayathriManeksha/webscraper

Automated scraper

2 parents 32f6a2f + bac010a, commit 6dc202a

3 files changed: 69 additions & 0 deletions

`Automated_scraper.py/readme.md`

Lines changed: 6 additions & 0 deletions

`@@ -0,0 +1,6 @@`
`+The users can run the script`
`+Usage: python script.py [URL] "[CSS selector]" [Interval in minutes]`
`+`
`+Example : python script.py https://www.timeanddate.com/worldclock/ "body > div.main-content-div > section.bg--grey.pdflexi-t--small > div > div:nth-child(2) > div.my-city__clocks > div > div:nth-child(3) > span > span" 1`
`+`
`+If there is a change in content it will be displayed in the command line.`
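
The usage line above takes three positional arguments. As a minimal sketch that is not part of this commit, the same interface could be parsed with the standard library's `argparse`, which rejects a non-numeric or non-positive interval before any polling starts. The names `positive_int` and `build_parser` are hypothetical.

```python
import argparse


def positive_int(value: str) -> int:
    """Convert a command-line string to an int and require it to be > 0."""
    number = int(value)  # a ValueError here is reported by argparse as a usage error
    if number <= 0:
        raise argparse.ArgumentTypeError("interval must be a positive integer")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Mirror the readme usage: script.py [URL] "[CSS selector]" [Interval in minutes]."""
    parser = argparse.ArgumentParser(prog="script.py")
    parser.add_argument("url", help="page to fetch")
    parser.add_argument(
        "selector",
        help="CSS selector, quoted so the shell passes it as a single argument",
    )
    parser.add_argument("interval", type=positive_int, help="minutes between checks")
    return parser


# Example with an explicit argument list; parse_args() with no list reads sys.argv[1:].
args = build_parser().parse_args(["https://example.com", "div > span", "5"])
print(args.url, args.selector, args.interval)
```

Quoting the selector matters because characters such as `>` and spaces would otherwise be interpreted by the shell.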

`Automated_scraper.py/requirements.txt`

318 Bytes

Binary file not shown.

`Automated_scraper.py/script.py`

Lines changed: 63 additions & 0 deletions

`@@ -0,0 +1,63 @@`
`+import sys`
`+import requests`
`+from bs4 import BeautifulSoup`
`+import time`
`+`
`+def display_content(url, selector):`
`+    try:`
`+        # Send a GET request to the URL`
`+        response = requests.get(url)`
`+`
`+        # Check if the request was successful`
`+        if response.status_code == 200:`
`+            # Create a BeautifulSoup object with the page content`
`+            soup = BeautifulSoup(response.text, 'html.parser')`
`+            # Find all elements that match the CSS selector`
`+            elements = soup.select(selector)`
`+            # Return the content of the matched elements`
`+            return [element.text for element in elements]`
`+        else:`
`+            print("Failed to fetch the webpage.")`
`+    except requests.exceptions.RequestException as e:`
`+        print("Error occurred while making the request:", e)`
`+    except Exception as e:`
`+        print("An error occurred:", e)`
`+`
`+if __name__ == "__main__":`
`+    # Check if URL, selector, and interval are provided as arguments`
`+    if len(sys.argv) < 4:`
`+        print("Usage: python script.py [URL] [CSS selector] [Interval in minutes]")`
`+        sys.exit(1)`
`+`
`+    # Get the URL, selector, and interval from command-line arguments`
`+    url = sys.argv[1]`
`+    selector = sys.argv[2]`
`+    interval_minutes = int(sys.argv[3])`
`+`
`+    # Store the initial contents`
`+    initial_contents = display_content(url, selector)`
`+    if initial_contents:`
`+        print("Initial contents:")`
`+        for content in initial_contents:`
`+            print(content)`
`+    else:`
`+        print("No matching elements found.")`
`+`
`+    while True:`
`+        # Wait for the specified interval`
`+        time.sleep(interval_minutes * 60)`
`+`
`+        # Check for content changes`
`+        current_contents = display_content(url, selector)`
`+        if current_contents:`
`+            # Compare with the initial contents`
`+            if current_contents != initial_contents:`
`+                print("Content has changed!")`
`+                for content in current_contents:`
`+                    print(content)`
`+                # Update the initial contents with the current contents`
`+                initial_contents = current_contents`
`+            else:`
`+                print("Content has not changed.")`
`+        else:`
`+            print("No matching elements found.")`
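
In the script above, `display_content` returns `None` when the request fails and a list otherwise, so the loop prints "No matching elements found." both for a failed fetch and for a selector that matched nothing. The following is a minimal sketch, not part of this commit, of one way to keep those cases apart and to exercise the comparison logic without network access. `poll_for_changes`, its parameters, and the status strings are hypothetical names; the fetch function is supplied by the caller.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# A fetch function returns a list of strings (possibly empty) or None on failure.
Fetch = Callable[[], Optional[List[str]]]


def poll_for_changes(
    fetch: Fetch,
    interval_seconds: float,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, Optional[List[str]]]]:
    """Yield (status, contents) for the initial fetch and then once per poll.

    status is "initial", "changed", "unchanged", or "fetch_failed".
    If the initial fetch fails, the first later success is reported as "changed".
    """
    previous = fetch()
    yield ("initial" if previous is not None else "fetch_failed", previous)

    for _ in range(max_polls):
        sleep(interval_seconds)
        current = fetch()
        if current is None:
            # Report the failure and keep the last good snapshot for comparison.
            yield ("fetch_failed", None)
        elif current != previous:
            previous = current
            yield ("changed", current)
        else:
            yield ("unchanged", current)


# Usage with canned snapshots instead of a real page, and a no-op sleep.
snapshots = iter([["10:00"], ["10:00"], None, ["10:01"]])
for status, contents in poll_for_changes(
    lambda: next(snapshots), interval_seconds=0, max_polls=3, sleep=lambda _: None
):
    print(status, contents)
```

With the script's own function, the fetch argument could be `lambda: display_content(url, selector)` and the interval `interval_minutes * 60`. Bounding the loop with `max_polls` is a choice made here for testability; the original loop runs until interrupted.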

0 commit comments
