
Extracts content from a web page.



Pillage

Pillage is a super awesome Node.js library for parsing webpages. It uses a baller algorithm✝ to identify the content region of a webpage with accuracy that's really, really, really, really ... fun. Once we have the content region, we can parse out text, images, videos, and other media. We also threw in a lot of the easy stuff, like Open Graph (OG) tags, for your convenience.

✝ It searches for every text node, then climbs that node's chain of parents, adding a weighted "score" based on text length to each parent along the way. The value drops off rapidly as we move up the tree. This is done for all text nodes, so the weights accumulate and identify the most probable shared parent. Once we have that wrapper, we can make assumptions and easily parse out body content.
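
The scoring idea in the footnote can be sketched roughly as follows. This is an illustration only, not Pillage's actual code: the plain-object tree, the helper names (linkParents, collectTextNodes, findContentNode) and the DECAY value of 0.5 are all assumptions made for the example.

```javascript
// Hypothetical sketch of the scoring idea; not taken from Pillage's source.
// A node is a plain object: { tag, children } for elements, { text } for text nodes.

// Assumed drop-off factor: each step up the tree halves the contribution.
var DECAY = 0.5;

// Give every node a reference to its parent so we can climb upward.
function linkParents(node, parent) {
  node.parent = parent || null;
  (node.children || []).forEach(function (child) {
    linkParents(child, node);
  });
  return node;
}

// Gather every text node in the tree.
function collectTextNodes(node, out) {
  if (typeof node.text === 'string') out.push(node);
  (node.children || []).forEach(function (child) {
    collectTextNodes(child, out);
  });
  return out;
}

// Score each ancestor of each text node, then return the highest-scoring node.
function findContentNode(root) {
  linkParents(root, null);
  var scores = new Map();
  collectTextNodes(root, []).forEach(function (textNode) {
    var weight = textNode.text.trim().length;
    for (var p = textNode.parent; p && weight > 0; p = p.parent) {
      scores.set(p, (scores.get(p) || 0) + weight);
      weight *= DECAY; // the value drops off as we move up the tree
    }
  });
  var best = null;
  var bestScore = -1;
  scores.forEach(function (score, node) {
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  });
  return best;
}

// Small made-up page: short navigation links, three paragraphs, a footer.
function el(tag, children) { return { tag: tag, children: children }; }
function txt(text) { return { text: text }; }

var page = el('html', [el('body', [
  el('nav', [el('a', [txt('Home')]), el('a', [txt('About')])]),
  el('article', [
    el('p', [txt('x'.repeat(40))]),
    el('p', [txt('x'.repeat(60))]),
    el('p', [txt('x'.repeat(50))])
  ]),
  el('footer', [txt('Copyright')])
])]);

var best = findContentNode(page);
console.log(best.tag); // article
```

With these numbers the article element collects 75 points (20 + 30 + 25), more than any single paragraph (60 at most) or the body (44.25), so it is picked as the shared parent. A page with only one long paragraph would pick the paragraph itself under this simple rule, which is one reason a real implementation likely needs extra heuristics.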

Install

npm install pillage

Usage

var pillage = require('pillage');

// Fetch a URL and process it (url is a placeholder for the page address)
pillage(url, function(err, result) {
  if (err) {
    return console.error(err);
  }
  console.log(result);
});

// or, process HTML directly (html is a placeholder for a markup string)
var result = pillage(html);
console.log(result);

// Here's the object structure that it will return
return {
  title: extractTitle(html),
  description: extractDescription(html),
  text: extractText(html),
  images: extractImages(html),
  videos: extractVideos(html),
  twitterTags: extractTwitterTags(html),
  openGraphTags: extractOpenGraphTags(html),
  articleTags: extractArticleTags(html),
  oEmbed: extractOEmbed(html),
};

License

MIT

Author

Mike Holly


Last updated on 17 Dec 2014
