article-parser

Version 1.8.0 (PyPI)

Extract an article or news story from a URL or from raw HTML, and parse out its title and content.


How to install

article-parser is available on PyPI: https://pypi.org/project/article-parser/

$ pip install article-parser

Basic Usage

>>> import article_parser

article_parser.parse(
    url='',          # The URL of the article.
    html='',         # The HTML of the article.
    threshold=0.9,   # The ratio of text to the entire document; default 0.9.
    output='html',   # Output format: 'markdown' or 'html'; default 'html'.
    **kwargs,        # Optional keyword arguments accepted by `requests`, e.g. timeout.
)
# Returns a (title, content) tuple.

## output markdown
>>> title, content = article_parser.parse(url="http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html", output='markdown', timeout=5)

## output html
>>> title, content = article_parser.parse(url="http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html", timeout=5)
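The `threshold` parameter is documented only as "the ratio of text to the entire document". One plausible reading, shown here purely as a hypothetical sketch and not as article-parser's actual algorithm, is to walk down the element tree and stop at the deepest element whose text still accounts for at least `threshold` of all the document's text. The names `Node`, `TextTree` and `pick_content_node` are illustrative assumptions; the sketch uses only the standard-library `html.parser`.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed onto the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these elements is not article text, so it is not counted.
SKIP_TAGS = {"script", "style"}


class Node:
    """One element in a simplified DOM tree (hypothetical helper)."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []
        self.text_len = 0  # characters of text in this node and all descendants


class TextTree(HTMLParser):
    """Builds a Node tree and counts the visible text held by every element."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack[-1].tag in SKIP_TAGS:
            return
        n = len(data.strip())
        # Credit the text to every open element, from the root to the innermost.
        for node in self.stack:
            node.text_len += n


def pick_content_node(html, threshold=0.9):
    """Descend while the largest child still holds >= threshold of all text."""
    parser = TextTree()
    parser.feed(html)
    parser.close()
    total = parser.root.text_len
    node = parser.root
    if total == 0:
        return node
    while True:
        best = max(node.children, key=lambda c: c.text_len, default=None)
        if best is None or best.text_len / total < threshold:
            return node
        node = best
```

In this reading, lowering `threshold` lets the search descend into narrower containers, while raising it keeps the result closer to the root of the page.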

Example

The examples below parse the China Daily article "Djokovic wins record 36th Masters title in Rome" (Chinadaily.com.cn).

  • Markdown
>>> import article_parser
>>> title, content = article_parser.parse(url="http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html", output='markdown', timeout=5)
>>> print(title)
>>> print('----------------')
>>> print(content)

Djokovic wins record 36th Masters title in Rome
----------------
![](http://img2.chinadaily.com.cn/images/202009/22/5f6962b2a31024adbd959228.jpeg)
Serbia's Novak Djokovic kisses the trophy after winning the final against
Argentina's Diego Schwartzman at Italian Open, Foro Italico, Rome, Italy, Sept
21, 2020. [Photo/Agencies]

ROME - Novak Djokovic won a record 36th Masters crown as he beat Diego
Schwartzman in the men's final of the ATP Italian Open on Monday.

Djokovic, the world number one and the top seed at the tournament, won 7-5,
6-3 against Argentine Schwartzman to lift his 36th Masters title, one more
than Rafael Nadal.

The Serb said he did not play his best tennis this time in Rome, but could
find it when needed.

Simona Halep, top seed of the women's draw, won her first title in Rome after
defending champion Karolina Pliskova of the Czech Republic retired while
trailing 6-0, 2-1 in the final.
  • HTML
>>> import article_parser
>>> title, content = article_parser.parse(url="http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html", timeout=5)
>>> print(title)
>>> print('----------------')
>>> print(content)

Djokovic wins record 36th Masters title in Rome
----------------
<div id="Content">

<figure class="image" style="display: table;">
<img data-from="newsroom" id="img-5f6962b2a31024adbd959228" src="//img2.chinadaily.com.cn/images/202009/22/5f6962b2a31024adbd959228.jpeg"/>
<figcaption style="font-size: 14px; display: table-caption; caption-side: bottom;">
   Serbia's Novak Djokovic kisses the trophy after winning the final against Argentina's Diego Schwartzman at Italian Open, Foro Italico, Rome, Italy, Sept 21, 2020. [Photo/Agencies]
 </figcaption>
</figure>
<p dir="ltr">ROME - Novak Djokovic won a record 36th Masters crown as he beat Diego Schwartzman in the men's final of the ATP Italian Open on Monday.</p>
<p dir="ltr">Djokovic, the world number one and the top seed at the tournament, won 7-5, 6-3 against Argentine Schwartzman to lift his 36th Masters title, one more than Rafael Nadal.</p>
<p dir="ltr">The Serb said he did not play his best tennis this time in Rome, but could find it when needed.</p>
<p dir="ltr">Simona Halep, top seed of the women's draw, won her first title in Rome after defending champion Karolina Pliskova of the Czech Republic retired while trailing 6-0, 2-1 in the final.</p>
</div>
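The HTML output is a fragment: in the sample above, a `<div id="Content">` holding a `<figure>` and several `<p>` elements. If you only need the paragraph text, the following minimal standard-library sketch extracts it. `ParagraphCollector` and `html_to_paragraphs` are hypothetical helpers, not part of article-parser, and they assume the article body is marked up with `<p>` elements as in the sample.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None  # text chunks while inside a <p>, otherwise None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            # Collapse runs of whitespace (including newlines) to single spaces.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._buf = None

    def handle_data(self, data):
        # Entities such as &amp; are already decoded (convert_charrefs=True).
        if self._buf is not None:
            self._buf.append(data)


def html_to_paragraphs(content):
    """Return the non-empty <p> texts from an HTML fragment."""
    parser = ParagraphCollector()
    parser.feed(content)
    parser.close()
    return parser.paragraphs
```

Called on the `content` string returned with `output='html'`, it returns one string per non-empty `<p>`; text in the `<figcaption>` is skipped because it is not inside a `<p>`.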
