news-fetch

news-fetch is an open-source, easy-to-use news extractor with basic NLP features (cleaning text, keywords, summary) that just works.

  • Version: 0.2.9
  • Registry: PyPI
  • Maintainers: 1

news-fetch is an open-source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and archived articles, so you only need to provide the root URL of a news website to crawl it completely. news-fetch builds on several existing libraries and tools, including news-please by Felix Hamborg and Newspaper3K by Lucas (欧阳象) Ou-Yang, and uses features from both.

I built this tool to minimize NaN or empty values when scraping data from various news websites. It's platform-independent and written in Python 3, making it easy for programmers and developers to access news data for their applications.
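To illustrate how drawing on more than one extractor can reduce empty values, here is a minimal sketch of a fallback merge. It is not news-fetch's actual implementation: the names `is_empty` and `merge_extractions` and the sample dictionaries are hypothetical, and the sketch only assumes that each extractor returns a plain dict of fields.

```python
import math
from typing import Any, Mapping


def is_empty(value: Any) -> bool:
    """Return True for None, NaN, blank strings, and empty containers."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def merge_extractions(*results: Mapping[str, Any]) -> dict:
    """Merge extractor outputs, keeping the first non-empty value per field.

    Earlier mappings take priority; a later mapping only fills a field
    that is still missing or empty.
    """
    merged: dict = {}
    for result in results:
        for name, value in result.items():
            if name not in merged or (is_empty(merged[name]) and not is_empty(value)):
                merged[name] = value
    return merged


# Hypothetical outputs from two different extractors for the same article.
primary = {"headline": "Example headline", "authors": [], "summary": float("nan")}
secondary = {"headline": "", "authors": ["A. Writer"], "summary": "Short summary."}

merged = merge_extractions(primary, secondary)
print(merged)
```

In this sketch the first extractor supplies the headline, while the second fills in the authors and summary that the first left empty.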

Source: Link
PyPI: https://pypi.org/project/news-fetch/
Repository: https://santhoshse7en.github.io/news-fetch/
Documentation: https://santhoshse7en.github.io/news-fetch_doc/ (not yet created)

Dependencies

Extracted Information

news-fetch extracts the following attributes from news articles. You can also check out an example JSON file generated by news-please.

  • Headline
  • Author(s)
  • Publication date
  • Publication
  • Category
  • Source domain
  • Article content
  • Summary
  • Keywords
  • URL
  • Language
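For downstream code it can help to hold these attributes in one typed record. The sketch below is an assumption-laden illustration rather than part of news-fetch: the `ArticleRecord` class and its field names are hypothetical (only `headline` appears as an attribute name in the usage example in this README), and the sample values are made up.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical container mirroring the attributes listed above."""

    headline: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    date_publish: Optional[str] = None  # assumed to be an ISO 8601 string
    publication: Optional[str] = None
    category: Optional[str] = None
    source_domain: Optional[str] = None
    article: Optional[str] = None
    summary: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    url: Optional[str] = None
    language: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are None or empty."""
        return [name for name, value in asdict(self).items() if not value]


# Made-up sample values, for illustration only.
record = ArticleRecord(
    headline="Example headline",
    url="https://example.com/story",
    language="en",
)

# Serialize to JSON; ensure_ascii=False keeps non-English text readable.
as_json = json.dumps(asdict(record), ensure_ascii=False, indent=2)
print(as_json)
print(record.missing_fields())
```

A helper like `missing_fields` makes it easy to count how many attributes came back empty for a given site.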

Dependency Installation

Use the package manager pip to install the required dependencies:

pip install -r requirements.txt

Usage

You can download the source by clicking the green download button on GitHub.

To scrape all the details of a news article, use the Newspaper class:

from newsfetch.news import Newspaper

news = Newspaper(url='https://www.thehindu.com/news/cities/Madurai/aa-plays-a-pivotal-role-in-helping-people-escape-from-the-grip-of-alcoholism/article67716206.ece')
print(news.headline)
# Output: 'AA plays a pivotal role in helping people escape from the grip of alcoholism'

To extract article URLs from a targeted website, create a GoogleSearchNewsURLExtractor with the search keyword and the newspaper's domain as arguments:

from newsfetch.google import GoogleSearchNewsURLExtractor

google = GoogleSearchNewsURLExtractor(keyword='Alcoholics Anonymous', news_domain='https://timesofindia.indiatimes.com/')
print(google.urls)
"""
['https://timesofindia.indiatimes.com/city/pune/pune-takes-a-stand-against-alcoholism-experts-collaborate-with-alcoholics-anonymous/articleshow/114438466.cms', 
'https://timesofindia.indiatimes.com/city/mumbai/we-have-lost-jobs-homes-alcoholics-anonymous/articleshow/96824383.cms', 
'https://timesofindia.indiatimes.com/city/gurgaon/gurgaons-alcoholics-open-up-about-their-road-to-recovery/articleshow/45080744.cms', 
'https://timesofindia.indiatimes.com/city/goa/alcoholism-is-illness-not-issue-of-weak-willpower-say-experts/articleshow/105320008.cms', 
'https://timesofindia.indiatimes.com/city/bhopal/alcoholism-is-an-illness-bhopal-aa-silver-jubilee-celebration/articleshow/106849014.cms', 
'https://timesofindia.indiatimes.com/city/ahmedabad/alcoholics-anonymous-switches-to-online-sessions/articleshow/76144639.cms', 
'https://timesofindia.indiatimes.com/city/kochi/keralites-trying-to-kick-alcoholism-alcoholics-anonymous/articleshow/13977818.cms', 
'https://timesofindia.indiatimes.com/city/chandigarh/alcoholics-anonymous-turned-their-lives-around/articleshow/18239.cms', 
'https://timesofindia.indiatimes.com/city/mumbai/like-air-india-flyer-alcoholics-anonymous-members-reap-whirlwind-of-job-loss-broken-homes/articleshow/96820403.cms', 
'https://timesofindia.indiatimes.com/city/nagpur/alcoholics-anonymous-meet-promotes-one-day-at-a-time/articleshow/50538092.cms']
"""
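Before passing such a list on for scraping, it can be useful to drop duplicates and any links that fall outside the target domain. The following is a minimal sketch using only the standard library; `filter_urls_by_domain` is a hypothetical helper, not a news-fetch function, and the `candidates` list is sample input built for illustration.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def filter_urls_by_domain(urls: Iterable[str], news_domain: str) -> List[str]:
    """Keep unique URLs whose host matches news_domain, preserving order."""
    target_host = urlsplit(news_domain).netloc.lower()
    seen = set()
    kept = []
    for url in urls:
        cleaned = url.strip()  # tolerate stray whitespace around a URL
        host = urlsplit(cleaned).netloc.lower()
        if host == target_host and cleaned not in seen:
            seen.add(cleaned)
            kept.append(cleaned)
    return kept


# Sample input: one valid URL listed twice, plus an off-domain link.
candidates = [
    "https://timesofindia.indiatimes.com/city/chandigarh/alcoholics-anonymous-turned-their-lives-around/articleshow/18239.cms",
    "https://timesofindia.indiatimes.com/city/chandigarh/alcoholics-anonymous-turned-their-lives-around/articleshow/18239.cms ",
    "https://example.com/unrelated",
]

filtered = filter_urls_by_domain(candidates, "https://timesofindia.indiatimes.com/")
print(filtered)
```

The comparison is on the exact host name, so a subdomain of the newspaper's site would be excluded unless it is passed as `news_domain` itself.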

Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

Make sure to update tests as appropriate.

License

This project is licensed under the MIT License.
