readability-js

Turning any web page into a clean view.

Version 1.0.7 (latest), published on npm. Maintainers: 1

Readability

Node.js module for extracting web page content using Cheerio.

Turn any web page into a clean view. This module is based on luin's readability project.


Install

npm install readability-js

Usage

read(html [, options], callback)

Where

  • html is a URL or HTML source code.
  • options is an optional options object
  • callback is the callback to run - callback(error, article, meta)

Example

var read = require('readability-js');

read('http://howtonode.org/really-simple-file-uploads', function(err, article, meta) {
  // Main Article
  console.log(article.content.text());

  // Title
  console.log(article.title);

  // Article HTML Source Code
  console.log(article.content.html());
});
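
The callback signature above, callback(error, article, meta), can also be wrapped in a Promise. The following is a minimal sketch, not part of readability-js: readAsync is a hypothetical helper, and fakeRead is a stand-in for the real module so that the snippet runs without network access.

```javascript
// Hypothetical helper (not provided by readability-js): wraps any function
// with the read(html, options, callback) shape in a Promise.
// In real use, readFn would be require('readability-js').
function readAsync(readFn, html, options) {
  return new Promise(function (resolve, reject) {
    readFn(html, options || {}, function (err, article, meta) {
      if (err) {
        return reject(err);
      }
      // Keep both callback values so neither is lost.
      resolve({ article: article, meta: meta });
    });
  });
}

// Stand-in for the real module (an assumption for this demo only).
function fakeRead(html, options, callback) {
  callback(null, { title: 'Example title' }, { statusCode: 200 });
}

readAsync(fakeRead, '<p>Hello</p>').then(function (result) {
  console.log(result.article.title);
});
```

A hand-written wrapper is used here because the callback delivers two values (article and meta), and a generic promisify helper would typically resolve with only the first.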

NB: If the page is marked with a charset other than utf-8, it is converted automatically. Charsets such as GBK and GB2312 are also supported.

Options

readability-js passes the options directly to request. See the request library's documentation for all available options.

readability-js has 2 additional options:

  • onlyArticleBody (Boolean) - whether to return only the article body or all main content.

  • preprocess (Function) - a function that checks or modifies the downloaded source before it is passed to readability.

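// Example limit only; choose a value that suits your application.
var maxBodySize = 1024 * 1024;

read(url, {
  preprocess: function(source, response, contentType, callback) {
    // Reject oversized bodies before they reach readability.
    if (source.length > maxBodySize) {
      return callback(new Error('too big'));
    }
    // Pass the (possibly modified) source on.
    callback(null, source);
  }
}, function(err, article, response) {
  //...
});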
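
Because preprocess is an ordinary function with the (source, response, contentType, callback) signature, it can be built and exercised on its own. The sketch below assumes a hypothetical factory, makeSizeLimitPreprocess, which is not part of readability-js; the limit of 10 is chosen only for the demo.

```javascript
// Hypothetical factory (not provided by readability-js): builds a preprocess
// function that rejects sources longer than maxBodySize.
function makeSizeLimitPreprocess(maxBodySize) {
  return function preprocess(source, response, contentType, callback) {
    if (source.length > maxBodySize) {
      return callback(new Error('too big'));
    }
    // Hand the unmodified source on to the next step.
    callback(null, source);
  };
}

// Calling the function directly shows both outcomes without any download.
var preprocess = makeSizeLimitPreprocess(10);

preprocess('short', null, 'text/html', function (err, source) {
  console.log('accepted:', source);
});

preprocess('a much longer body', null, 'text/html', function (err) {
  console.log('rejected:', err.message);
});
```

The returned function could then be passed as the preprocess option, for example read(url, { preprocess: makeSizeLimitPreprocess(maxBodySize) }, callback).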

Article object

  • content - The article content of the web page, as a Cheerio object. It is false if extraction failed.

  • title - The article title of the web page. It may differ from the text in the <title> tag.

  • excerpt - The article description, taken from any description, og:description or twitter:description <meta> tag.
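
Since content may be false rather than a Cheerio object, it is worth checking it before calling text() or html(). The following is a minimal sketch under that assumption; summarizeArticle is a hypothetical helper, and the stub objects only imitate the fields listed above.

```javascript
// Hypothetical helper (not provided by readability-js): converts an article
// object into plain data, treating content === false as a failed extraction.
function summarizeArticle(article) {
  if (!article || article.content === false) {
    return { ok: false, title: article ? article.title : undefined };
  }
  return {
    ok: true,
    title: article.title,
    excerpt: article.excerpt,
    text: article.content.text(),
    html: article.content.html()
  };
}

// Stub that imitates the documented fields; a real article comes from read().
var stubArticle = {
  title: 'Example title',
  excerpt: 'Example description',
  content: {
    text: function () { return 'Hello'; },
    html: function () { return '<p>Hello</p>'; }
  }
};

console.log(summarizeArticle(stubArticle).ok);
console.log(summarizeArticle({ title: 'No body', content: false }).ok);
```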

Keywords

readability


Package last updated on 04 Apr 2016
