Commit 6dc202a

Merge pull request avinashkranjan#2035 from GayathriManeksha/webscraper
Automated scraper
2 parents 32f6a2f + bac010a commit 6dc202a

3 files changed: +69 -0 lines changed

Automated_scraper.py/readme.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
Run the script from the command line:

Usage: python script.py [URL] "[CSS selector]" [Interval in minutes]

Example: python script.py https://www.timeanddate.com/worldclock/ "body > div.main-content-div > section.bg--grey.pdflexi-t--small > div > div:nth-child(2) > div.my-city__clocks > div > div:nth-child(3) > span > span" 1

The script prints the initial content of the matched elements, then checks the page again after every interval. If the content has changed, the new content is printed to the command line; otherwise it prints "Content has not changed."
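Before starting a long-running watch, it can help to confirm that the selector actually matches what you expect. The sketch below applies BeautifulSoup's `select` to an inline HTML snippet, which is the same call the script makes after downloading the page. The helper name `preview_selector` and the sample markup are hypothetical, chosen only for illustration; they are not part of this commit.

```python
from bs4 import BeautifulSoup


def preview_selector(html, selector):
    """Return the text of each element matched by `selector` in `html`.

    Hypothetical helper: it mirrors what display_content() does once the
    page has been fetched, but works on a local string instead of a URL.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [element.text for element in soup.select(selector)]


# Hypothetical markup standing in for a downloaded page
sample_html = """
<div class="my-city__clocks">
  <div><span class="time">10:15</span></div>
  <div><span class="time">16:45</span></div>
</div>
"""

# A descendant selector matches both clocks
print(preview_selector(sample_html, "div.my-city__clocks span.time"))
# :nth-child(2) narrows the match to the second clock only
print(preview_selector(sample_html, "div.my-city__clocks > div:nth-child(2) > span"))
```

An empty list means the selector matches nothing, which the script reports as "No matching elements found."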

Automated_scraper.py/requirements.txt

318 Bytes
Binary file not shown.

Automated_scraper.py/script.py

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
```python
import sys
import time

import requests
from bs4 import BeautifulSoup


def display_content(url, selector):
    """Return the text of every element on `url` that matches `selector`.

    Returns a list of strings (empty if nothing matches), or None if the
    page could not be fetched.
    """
    try:
        # Send a GET request to the URL; the timeout keeps an unresponsive
        # server from blocking the monitoring loop indefinitely
        response = requests.get(url, timeout=30)

        # Check if the request was successful
        if response.status_code == 200:
            # Create a BeautifulSoup object with the page content
            soup = BeautifulSoup(response.text, 'html.parser')
            # Find all elements that match the CSS selector
            elements = soup.select(selector)
            # Return the text content of the matched elements
            return [element.text for element in elements]
        print(f"Failed to fetch the webpage (HTTP status {response.status_code}).")
    except requests.exceptions.RequestException as e:
        print("Error occurred while making the request:", e)
    except Exception as e:
        print("An error occurred:", e)
    # Signal a failed fetch explicitly, as opposed to "no matching elements"
    return None


if __name__ == "__main__":
    # Check if URL, selector, and interval are provided as arguments
    if len(sys.argv) < 4:
        print('Usage: python script.py [URL] "[CSS selector]" [Interval in minutes]')
        sys.exit(1)

    # Get the URL, selector, and interval from command-line arguments
    url = sys.argv[1]
    selector = sys.argv[2]
    try:
        interval_minutes = int(sys.argv[3])
        if interval_minutes <= 0:
            raise ValueError
    except ValueError:
        print("The interval must be a positive whole number of minutes.")
        sys.exit(1)

    # Store the initial contents
    initial_contents = display_content(url, selector)
    if initial_contents:
        print("Initial contents:")
        for content in initial_contents:
            print(content)
    elif initial_contents is not None:
        print("No matching elements found.")

    while True:
        # Wait for the specified interval
        time.sleep(interval_minutes * 60)

        # Check for content changes
        current_contents = display_content(url, selector)
        if current_contents is None:
            # The fetch failed (the error was already printed); retry next interval
            continue
        if not current_contents:
            print("No matching elements found.")
        elif current_contents != initial_contents:
            # The matched text differs from the last stored contents
            print("Content has changed!")
            for content in current_contents:
                print(content)
            # Update the stored contents with the current contents
            initial_contents = current_contents
        else:
            print("Content has not changed.")
```
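The comparison step inside the polling loop can be pulled out into a pure function, which makes it testable without network access or waiting between checks. The sketch below is hypothetical: `classify_change` and its status strings are not part of the commit, and it assumes, as a design choice, that `None` means the fetch failed while an empty list means the page loaded but nothing matched the selector.

```python
def classify_change(previous, current):
    """Classify one polling step of the watcher loop (hypothetical helper).

    previous: list of strings from the last stored check, or None
    current:  list of strings from this check, or None if the fetch failed
    Returns "fetch-failed", "no-match", "changed", or "unchanged".
    """
    if current is None:
        return "fetch-failed"
    if not current:
        return "no-match"
    if current != previous:
        return "changed"
    return "unchanged"


# Simulated sequence of check results: no network access, no sleeping
history = [["10:15"], ["10:15"], None, [], ["10:16"]]
previous = history[0]
for current in history[1:]:
    status = classify_change(previous, current)
    print(status)
    # Only a real change replaces the stored contents, as in the script
    if status == "changed":
        previous = current
```

Keeping the decision separate from `requests` and `time.sleep` means each branch of the loop can be exercised with plain lists.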
