Commit 5d650e1

news scrapper
1 parent 92b564e commit 5d650e1

3 files changed

+46
-0
lines changed


‎Google News Scraapper/README.md‎

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
### Google News Scrapper

A Python automation script that scrapes Google News articles. Given a search keyword and an article count, it fetches up to that many articles and writes their titles and links to an Excel file.

### Setup
1. Create a virtual environment.
2. Install the required packages with `pip3 install -r requirements.txt`.

### Running the Script
`python3 app.py`
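
The behaviour the README describes (capping the number of articles and pairing each title with its link) can be sketched offline. This is a minimal illustration, not the script itself: `SAMPLE_RSS` is a hypothetical stand-in for a real feed response, and `extract_articles` is a helper name introduced here for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical sample feed, standing in for a real Google News response.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First headline</title><link>https://example.com/1</link></item>
    <item><title>Second headline</title><link>https://example.com/2</link></item>
    <item><title>Third headline</title><link>https://example.com/3</link></item>
  </channel>
</rss>"""


def extract_articles(rss_text, count):
    """Return up to `count` (title, link) pairs from an RSS document."""
    document = parseString(rss_text)
    articles = []
    # Slicing caps the result at `count` items, as the script does.
    for item in document.getElementsByTagName('item')[:count]:
        fields = {'title': '', 'link': ''}
        for node in item.childNodes:
            # Only read elements we care about, and skip empty ones.
            if node.nodeName in fields and node.firstChild is not None:
                fields[node.nodeName] = node.firstChild.data
        articles.append((fields['title'], fields['link']))
    return articles


print(extract_articles(SAMPLE_RSS, 2))
```

Asking for more articles than the feed contains simply returns every item available, since the slice stops at the end of the list.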

‎Google News Scraapper/app.py‎

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
import requests
from xml.dom.minidom import parseString
import pandas as pd


def get_google_news_result(term, count):
    """Return the titles and links of up to `count` Google News RSS items for `term`."""
    # Pass the query via `params` so requests URL-encodes the search term.
    response = requests.get('http://news.google.com/news',
                            params={'q': term, 'output': 'rss'})
    obj = parseString(response.text)
    items = obj.getElementsByTagName('item')
    # Storing the Titles and Links
    titles = list()
    links = list()
    for item in items[:count]:
        title, link = '', ''
        for node in item.childNodes:
            # Guard against empty elements, which have no child text node.
            if node.nodeName == 'title' and node.childNodes:
                title = node.childNodes[0].data
            elif node.nodeName == 'link' and node.childNodes:
                link = node.childNodes[0].data
        titles.append(title)
        links.append(link)

    return titles, links


if __name__ == '__main__':
    titleName = input("Enter the news title keyword: ")
    articleCount = int(input('Enter the number of articles: '))
    titles, links = get_google_news_result(titleName, articleCount)

    news = {'title': titles,
            'links': links
            }
    df = pd.DataFrame(news, columns=['title', 'links'])
    df.to_excel('{}_news_scrapper.xlsx'.format(titleName))
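
A keyword containing spaces or characters such as `&` changes the meaning of the query string if it is interpolated into the URL unescaped. The sketch below shows one way to build the query safely using only the standard library. `build_feed_url` is a hypothetical helper introduced for illustration; the endpoint is the one that appears in `app.py`.

```python
from urllib.parse import urlencode

BASE_URL = 'http://news.google.com/news'  # endpoint taken from app.py


def build_feed_url(term):
    """Build the RSS query URL with the search term percent-encoded."""
    # urlencode escapes each value, so '&' inside the term cannot
    # be mistaken for a parameter separator.
    return BASE_URL + '?' + urlencode({'q': term, 'output': 'rss'})


print(build_feed_url('climate change & energy'))
# → http://news.google.com/news?q=climate+change+%26+energy&output=rss
```

Passing a `params` dictionary to `requests.get` performs equivalent encoding, so this helper is mainly useful for logging or inspecting the final URL.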
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
openpyxl==3.0.5
pandas==1.0.5
requests==2.24.0
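
Because the three packages are pinned to exact versions, it can help to confirm what is actually installed in the virtual environment before running the script. The following is a rough sketch assuming Python 3.8 or later (for `importlib.metadata`); `PINS`, `parse_pins`, and `installed_version` are names made up for this example and are not part of the commit.

```python
from importlib.metadata import PackageNotFoundError, version

# The pins from the requirements file above, inlined for the example.
PINS = """openpyxl==3.0.5
pandas==1.0.5
requests==2.24.0"""


def parse_pins(text):
    """Map each package name to its pinned version ('name==version' lines only)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith('#'):
            name, _, pinned = line.partition('==')
            pins[name] = pinned
    return pins


def installed_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


for name, pinned in parse_pins(PINS).items():
    print(name, 'pinned:', pinned, 'installed:', installed_version(name))
```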
