santhoshse7en/news-fetch


📰 news-fetch

news-fetch is an open-source, easy-to-use news crawler that extracts structured information from almost any news website 🌐. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and archived articles 📚. You only need to provide the root URL of the news website to crawl it completely 🔍. news-fetch combines the power of multiple state-of-the-art libraries and tools, including news-please by Felix Hamborg and Newspaper3K by Lucas (欧阳象) Ou-Yang. This package leverages features from both of these works 🤖.

I built this tool to minimize NaN or empty values when scraping data from various news websites 🚀. It's platform-independent and written in Python 3, making it easy for programmers and developers to access news data for their applications 💻.


🔗 Project Links

Source Link
PyPI: https://pypi.org/project/news-fetch/
Repository: https://santhoshse7en.github.io/news-fetch/
Documentation: https://santhoshse7en.github.io/news-fetch_doc/ (Not Yet Created!)

📦 Dependencies

📝 Extracted Information

news-fetch extracts the following attributes from news articles. You can also check out an example JSON file generated by news-please.

  • 📰 Headline
  • ✍️ Author(s)
  • 📅 Publication date
  • 🗞️ Publication
  • 📂 Category
  • 🌍 Source domain
  • 📑 Article content
  • 📝 Summary
  • 🔑 Keywords
  • 🌐 URL
  • 🌐 Language

🔧 Dependency Installation

Use the package manager pip to install the required dependencies:

pip install -r requirements.txt

🚀 Usage

You can download the source by clicking the green download button on GitHub.

To scrape all the news details from a single article, use the Newspaper class:

from newsfetch.news import Newspaper
news = Newspaper(url='https://www.thehindu.com/news/cities/Madurai/aa-plays-a-pivotal-role-in-helping-people-escape-from-the-grip-of-alcoholism/article67716206.ece')
print(news.headline)
# Output: 'AA plays a pivotal role in helping people escape from the grip of alcoholism'
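When several article URLs need to be processed, a small wrapper can keep one failing page from stopping the whole run. The sketch below is only an illustration: `collect_headlines` and its `fetch` parameter are hypothetical names introduced here, and the only parts taken from the example above are the `Newspaper(url=...)` constructor and the `headline` attribute.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def collect_headlines(
    urls: Iterable[str],
    fetch: Optional[Callable[[str], Any]] = None,
) -> Dict[str, Optional[str]]:
    """Map each URL to its headline, or to None if fetching it raised an error."""
    if fetch is None:
        # Imported lazily so the helper can be exercised with a stub
        # when the newsfetch package is not installed.
        from newsfetch.news import Newspaper

        def fetch(url: str) -> Any:
            return Newspaper(url=url)

    headlines: Dict[str, Optional[str]] = {}
    for url in urls:
        try:
            headlines[url] = fetch(url).headline
        except Exception:
            # Assumption: any failure (network, parsing) is recorded as None.
            headlines[url] = None
    return headlines
```

Passing a custom `fetch` callable makes the helper easy to test offline; leaving it out falls back to the `Newspaper` class shown above.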

To extract article URLs from a targeted website, create a GoogleSearchNewsURLExtractor with the search keyword and the newspaper's domain as arguments:

from newsfetch.google import GoogleSearchNewsURLExtractor
google = GoogleSearchNewsURLExtractor(keyword='Alcoholics Anonymous', news_domain='https://timesofindia.indiatimes.com/')
print(google.urls)
"""
['https://timesofindia.indiatimes.com/city/pune/pune-takes-a-stand-against-alcoholism-experts-collaborate-with-alcoholics-anonymous/articleshow/114438466.cms', 
'https://timesofindia.indiatimes.com/city/mumbai/we-have-lost-jobs-homes-alcoholics-anonymous/articleshow/96824383.cms', 
'https://timesofindia.indiatimes.com/city/gurgaon/gurgaons-alcoholics-open-up-about-their-road-to-recovery/articleshow/45080744.cms', 
'https://timesofindia.indiatimes.com/city/goa/alcoholism-is-illness-not-issue-of-weak-willpower-say-experts/articleshow/105320008.cms', 
'https://timesofindia.indiatimes.com/city/bhopal/alcoholism-is-an-illness-bhopal-aa-silver-jubilee-celebration/articleshow/106849014.cms', 
'https://timesofindia.indiatimes.com/city/ahmedabad/alcoholics-anonymous-switches-to-online-sessions/articleshow/76144639.cms', 
'https://timesofindia.indiatimes.com/city/kochi/keralites-trying-to-kick-alcoholism-alcoholics-anonymous/articleshow/13977818.cms', 
'https://timesofindia.indiatimes.com/city/chandigarh/alcoholics-anonymous-turned-their-lives-around/articleshow/18239.cms', 
'https://timesofindia.indiatimes.com/city/mumbai/like-air-india-flyer-alcoholics-anonymous-members-reap-whirlwind-of-job-loss-broken-homes/articleshow/96820403.cms', 
'https://timesofindia.indiatimes.com/city/nagpur/alcoholics-anonymous-meet-promotes-one-day-at-a-time/articleshow/50538092.cms']
"""
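Before passing a URL list like the one above to a scraper, it can be useful to drop duplicates and anything that does not belong to the target domain. The following is a minimal sketch using only the standard library; `filter_by_domain` is a hypothetical helper and not part of news-fetch, and it assumes the input is a plain list of URL strings.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def filter_by_domain(urls: Iterable[str], news_domain: str) -> List[str]:
    """Keep URLs whose host matches news_domain, preserving order and dropping duplicates."""
    target_host = urlparse(news_domain).netloc.lower()
    seen = set()
    kept: List[str] = []
    for url in urls:
        cleaned = url.strip()  # the sample output above carries stray whitespace
        if urlparse(cleaned).netloc.lower() != target_host:
            continue
        if cleaned in seen:
            continue
        seen.add(cleaned)
        kept.append(cleaned)
    return kept
```

The host comparison is exact, so subdomains other than the one given in `news_domain` are excluded; loosen it if the site serves articles from several hosts.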

🀝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

Make sure to update tests as appropriate.

📄 License

This project is licensed under the MIT License.
