Commit acd172c

Merge pull request avinashkranjan#674 from paulamib123/news_scrapper
News scrapper
2 parents 426c726 + f52ab4d commit acd172c

File tree: 2 files changed, +127 −0 lines changed

News_Scrapper/README.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# News Scrapper (India Today)

This script fetches the top 10 headlines from India Today for a category entered by the user and stores them in a CSV file.

## Setup instructions

This script requires the following:

1. python3
2. requests module
3. beautifulsoup4 module

To install the modules, run:

```bash
pip3 install requests
```

```bash
pip3 install beautifulsoup4
```

To run the script, run:

```bash
cd News_Scrapper
```

```bash
python3 scrapper.py
```

## Output

![script.png](https://i.postimg.cc/CKRH9rBD/script.png)

![excel-csv.png](https://i.postimg.cc/ZKmH0L2t/excel-csv.png)

## By [Paulami Bhattacharya](https://github.com/paulamib123)
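
The CSV file written by the script has the columns `Date`, `Link`, and `Headline`. As a minimal sketch of reading it back with Python's standard `csv` module (the sample row below is invented for illustration; real files are named e.g. `topTenworldNews.csv`):

```python
import csv
import io

# Invented sample matching the column layout the script writes:
# Date, Link, Headline.
sample = io.StringIO(
    "Date,Link,Headline\n"
    "2023-06-15,https://www.indiatoday.in/world/story-2023-06-15,Example headline\n"
)

rows = list(csv.reader(sample))
header, data = rows[0], rows[1:]
print(header)     # ['Date', 'Link', 'Headline']
print(len(data))  # 1
```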

News_Scrapper/scrapper.py

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
1+
from bs4 import BeautifulSoup
2+
import requests
3+
import csv
4+
5+
URL = "https://www.indiatoday.in/"
6+
7+
def writeToCSV(topTenNews, category):
8+
with open("topTen" + category + "News.csv", "w") as file:
9+
writer = csv.writer(file)
10+
writer.writerow(["Date", "Link", "Headline"])
11+
for news in topTenNews:
12+
writer.writerow([news[2], "https://www.indiatoday.in/" + news[1], news[0]])
13+
14+
def getTopTenFromDivTag(category):
15+
topTenNews = []
16+
count = 0
17+
category_url = URL + category
18+
19+
page = requests.get(category_url)
20+
soup = BeautifulSoup(page.text, "html.parser")
21+
22+
all_div_tags = soup.find_all(class_="detail")
23+
24+
for div in all_div_tags:
25+
count += 1
26+
if count > 10:
27+
break
28+
headline = div.find("h2").text
29+
link = div.find("a").attrs["href"]
30+
date = div.find("a").attrs["href"][-10:]
31+
topTenNews.append([headline, link, date])
32+
33+
return topTenNews
34+
35+
def getTopTenFromLiTag(category):
36+
topTenNews = []
37+
count = 0
38+
category_url = URL + category
39+
40+
page = requests.get(category_url)
41+
soup = BeautifulSoup(page.text, "html.parser")
42+
43+
ul_tag = soup.find_all(class_="itg-listing")
44+
ul_tag = str(ul_tag)[25:-6]
45+
li_tags =ul_tag.split("</li>")
46+
47+
for li in li_tags:
48+
count += 1
49+
if count > 10:
50+
break
51+
ele = li.split(">")
52+
link = ele[1].split("=")[1][2:-1]
53+
headline = ele[2][:-3]
54+
date = link[-10:]
55+
topTenNews.append([headline, link, date])
56+
57+
return topTenNews
58+
59+
def main():
60+
61+
categories = ["india", "world", "cities", "business", "health", "technology", "sports",
62+
"education", "lifestyle"]
63+
64+
print("Please Choose a Category from the following list")
65+
66+
for index, category in enumerate(categories):
67+
print(str(index + 1) + ". " + category.capitalize())
68+
69+
print("Example: Enter 'world' for top 10 world news")
70+
print()
71+
72+
category = input()
73+
category = category.lower()
74+
75+
if category not in categories:
76+
print("\nPlease choose a valid category!")
77+
exit()
78+
79+
if category in categories[:5]:
80+
topTenNews = getTopTenFromDivTag(category)
81+
else:
82+
topTenNews = getTopTenFromLiTag(category)
83+
84+
writeToCSV(topTenNews, category)
85+
86+
print("Created CSV File Successfully!")
87+
88+
main()
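
The date handling in both scraper functions assumes India Today story URLs end with a date in `YYYY-MM-DD` form, so slicing the last 10 characters of the link yields the publication date. A minimal sketch of that assumption (the URL below is an invented example, not a real story link):

```python
# Invented example URL following the pattern the script assumes:
# the last 10 characters are the publication date (YYYY-MM-DD).
link = "/world/story/example-headline-2023-06-15"
date = link[-10:]
print(date)  # 2023-06-15
```

If the site changes its URL scheme, this slice silently returns the wrong text, so a stricter version could validate the result with a date parse before writing it to the CSV.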
