metascraper

A library to easily scrape metadata from an article on the web using Open Graph metadata, regular HTML metadata, and a series of fallbacks.

  • 0.0.1

Example

This metadata...

{
  "author": "Ellen Huet",
  "date": "2016-05-24T18:00:03.894Z",
  "description": "The HR startups go to war.",
  "image": "https://assets.bwbx.io/images/users/iqjWHBFdfxIU/ioh_yWEn8gHo/v1/-1x-1.jpg",
  "publisher": "Bloomberg.com",
  "title": "As Zenefits Stumbles, Gusto Goes Head-On by Selling Insurance"
}

...would be scraped from this article...

API

scrapeUrl(url, [rules])
import { scrapeUrl } from 'metascraper'

const metadata = await scrapeUrl('http://www.bloomberg.com/news/articles/2016-05-24/as-zenefits-stumbles-gusto-goes-head-on-by-selling-insurance')

Scrapes the url with matching rules.

scrapeHtml(html, [rules])
import { scrapeHtml } from 'metascraper'

// fetch returns a promise, and the body must be read as text before scraping
const res = await fetch('http://www.bloomberg.com/news/articles/2016-05-24/as-zenefits-stumbles-gusto-goes-head-on-by-selling-insurance')
const html = await res.text()
const metadata = await scrapeHtml(html)

Scrapes the html string with matching rules.

scrapeWindow(window, [rules])
import { scrapeWindow } from 'metascraper'

const metadata = await scrapeWindow(window)

Scrapes the window object with matching rules.

Rules

Scraping rules are just asynchronous functions that are passed the window object and return a promise that resolves with the value of the metadata. Like so:

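async function rule(window) {
  // document.title is a plain string, so select the <title> element instead
  const el = window.document.querySelector('title')
  return el ? el.textContent : null
}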

In the browser, window is the global you'd expect; on the server, it is a JSDOM instance.
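
As a further illustration, a rule that prefers the Open Graph title and falls back to the title element might look like the sketch below. The names ogTitleRule and fakeWindow are hypothetical and not part of metascraper; the stub only stands in for a real browser window or JSDOM instance so that the snippet can run on its own.

```javascript
// Hypothetical rule: prefer <meta property="og:title">, fall back to <title>.
async function ogTitleRule(window) {
  const doc = window.document
  const meta = doc.querySelector('meta[property="og:title"]')
  if (meta && meta.getAttribute('content')) {
    return meta.getAttribute('content').trim()
  }
  const title = doc.querySelector('title')
  return title ? title.textContent.trim() : null
}

// Minimal stand-in for a window, only for trying the rule outside a browser.
// A real window or JSDOM instance would be supplied by the scraper instead.
const fakeWindow = {
  document: {
    querySelector(selector) {
      if (selector === 'title') return { textContent: ' Example article ' }
      return null // this fake document has no Open Graph tag
    }
  }
}

ogTitleRule(fakeWindow).then((value) => console.log(value))
```

Because the rule only touches window.document, the same function should work unchanged in the browser and on the server.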

Passing rules is optional: the defaults have been configured for the best results when scraping web articles. If you want to tweak the results or scrape additional metadata, you can pass in additional rules or overwrite the defaults completely.
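
The "series of fallbacks" idea can be sketched as a helper that tries several rules in order and keeps the first non-empty result. This illustrates the technique only and is not metascraper's actual implementation: firstMatch, metaContent, and descriptionRule are hypothetical names, and the exact shape expected by the rules argument is not described here, so consult the default rules before passing anything in.

```javascript
// Hypothetical helper: combine several rules into one rule that resolves
// with the first non-empty value, trying each rule in order.
function firstMatch(rules) {
  return async function combined(window) {
    for (const rule of rules) {
      const value = await rule(window)
      if (value !== null && value !== undefined && value !== '') return value
    }
    return null // no rule produced a value
  }
}

// Hypothetical rule factory: read the content attribute of a meta tag.
function metaContent(selector) {
  return async function rule(window) {
    const el = window.document.querySelector(selector)
    return el ? el.getAttribute('content') : null
  }
}

// Open Graph description first, regular HTML meta description second.
const descriptionRule = firstMatch([
  metaContent('meta[property="og:description"]'),
  metaContent('meta[name="description"]')
])
```

Called as descriptionRule(window), this resolves to the Open Graph description when present, otherwise to the regular meta description, otherwise to null.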

For an idea of how rules work, check out the default rules.

FAQs

Package last updated on 25 May 2016
