metascraper

metascraper - npm Package Compare versions

Comparing version 0.2.5 to 0.2.6


History.md
0.2.6
-----
- add `keywords`
- add comparison to similar libraries
0.2.5

@@ -3,0 +8,0 @@ -----


package.json
{
"name": "metascraper",
"description": "A library to easily scrape metadata from an article on the web using Open Graph metadata, regular HTML metadata, and series of fallbacks.",
-"version": "0.2.5",
+"version": "0.2.6",
"repository": "git://github.com/ianstormtaylor/metascraper.git",

@@ -28,7 +28,54 @@ "main": "./lib/index.js",

"browserify": "^13.0.1",
"html-metadata": "^1.4.1",
"metaphor": "^2.1.0",
"mkdirp": "^0.5.1",
"mocha": "^2.5.2",
"mocha-phantomjs": "^4.0.2",
"node-metainspector": "^1.3.0",
"open-graph-scraper": "^2.1.0",
"popsicle": "^6.2.0",
"source-map-support": "^0.4.0"
}
"rimraf": "^2.5.2",
"source-map-support": "^0.4.0",
"summarizer": "^1.0.0",
"unfluff": "^1.0.0"
},
"keywords": [
"article",
"browser",
"cheerio",
"content",
"expand",
"extract",
"facebook",
"fallback",
"fetch",
"get",
"graph",
"html",
"meta",
"metadata",
"microformat",
"micro format",
"og",
"open",
"opengraph",
"open graph",
"page",
"parse",
"parser",
"scrape",
"scraper",
"server",
"site",
"summarize",
"summary",
"tag",
"tags",
"twitter",
"unfluff",
"unfurl",
"url",
"web",
"website"
]
}
-# metascraper
+# Metascraper
A library to easily scrape metadata from an article on the web using Open Graph metadata, regular HTML metadata, and a series of fallbacks. It follows a few principles:
- Have a high accuracy for online articles by default.
- Be usable on the server and in the browser.

@@ -14,2 +15,4 @@ - Make it simple to add new rules or override existing ones.

- [Example](#example)
- [Metadata](#metadata)
- [Comparison](#comparison)
- [Server-side Usage](#server-side-usage)

@@ -24,3 +27,3 @@ - [Browser-side Usage](#browser-side-usage)

-Using Metascraper, this metadata...
+Using **Metascraper**, this metadata...

@@ -42,2 +45,44 @@ {

## Metadata
Here is a list of the metadata that **Metascraper** collects by default:
- **`author`** — eg. `Noah Kulwin`<br/>
A human-readable representation of the author's name.
- **`date`** — eg. `2016-05-27T00:00:00.000Z`<br/>
An [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) representation of the date the article was published.
- **`description`** — eg. `Venture capitalists are raising money at the fastest rate...`<br/>
The publisher's chosen description of the article.
- **`image`** — eg. `https://assets.entrepreneur.com/content/3x2/1300/20160504155601-GettyImages-174457162.jpeg`<br/>
An image URL that best represents the article.
- **`publisher`** — eg. `Fast Company`<br/>
A human-readable representation of the publisher's name.
- **`title`** — eg. `Meet Wall Street's New A.I. Sheriffs`<br/>
The publisher's chosen title of the article.
- **`url`** — eg. `http://motherboard.vice.com/read/google-wins-trial-against-oracle-saves-9-billion`<br/>
The URL of the article.
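
As a rough illustration of the shape described by the list above, the sketch below collects the seven default fields into one object. The values are the per-field examples from the list, which come from different articles, so the object does not describe a single real page. `DEFAULT_FIELDS`, `missingFields`, and `isIsoDate` are hypothetical helpers written for this example and are not part of Metascraper. Treating any empty or absent value as "missing" is also an assumption.

```javascript
// Illustrative only: field names come from the list above; the values are the
// README's per-field examples, which are drawn from different articles.
const exampleMetadata = {
  author: 'Noah Kulwin',
  date: '2016-05-27T00:00:00.000Z',
  description: 'Venture capitalists are raising money at the fastest rate...',
  image: 'https://assets.entrepreneur.com/content/3x2/1300/20160504155601-GettyImages-174457162.jpeg',
  publisher: 'Fast Company',
  title: "Meet Wall Street's New A.I. Sheriffs",
  url: 'http://motherboard.vice.com/read/google-wins-trial-against-oracle-saves-9-billion',
};

const DEFAULT_FIELDS = ['author', 'date', 'description', 'image', 'publisher', 'title', 'url'];

// Hypothetical helper (not part of Metascraper): lists the default fields
// that are absent or empty in a result object.
function missingFields(metadata) {
  return DEFAULT_FIELDS.filter((field) => !metadata[field]);
}

// Hypothetical helper: true when the value parses as a date and round-trips
// unchanged through toISOString(), i.e. it is a full ISO 8601 UTC timestamp.
function isIsoDate(value) {
  const parsed = new Date(value);
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString() === value;
}
```

A check like `missingFields(result)` could be used to decide whether a scraped page needs manual review before its metadata is stored.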
## Comparison
To give you an idea of how accurate **Metascraper** is, here is a comparison of similar libraries:
| Library | [`metascraper`](https://www.npmjs.com/package/metascraper) | [`html-metadata`](https://www.npmjs.com/package/html-metadata) | [`node-metainspector`](https://www.npmjs.com/package/node-metainspector) | [`open-graph-scraper`](https://www.npmjs.com/package/open-graph-scraper) | [`unfluff`](https://www.npmjs.com/package/unfluff) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Correct | **95.54%** | **74.56%** | **61.16%** | **66.52%** | **70.90%** |
| Incorrect | 1.79% | 1.79% | 0.89% | 6.70% | 10.27% |
| Missed | 2.68% | 23.67% | 37.95% | 26.34% | 8.95% |
A big part of the reason for **Metascraper**'s better performance is that it relies on a series of fallbacks for each piece of metadata, instead of just looking for the most commonly used, spec-compliant pieces of metadata, like Open Graph. **Metascraper**'s default settings are targeted specifically at parsing online articles, which is why it can be more highly tuned than the other libraries for that purpose.
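
The fallback idea in the paragraph above can be sketched as an ordered list of rules per field, where the first rule that yields a non-empty value wins. This is a minimal sketch and not Metascraper's actual implementation or API: `firstMatch`, `titleRules`, and the plain object of pre-extracted `<meta>` values are all assumptions made to keep the example dependency-free.

```javascript
// Hypothetical sketch of a per-field fallback chain; not Metascraper's real API.
// Each rule receives a plain object of already-extracted <meta> values
// (an assumption for this example) and returns a candidate or undefined.
function firstMatch(rules, meta) {
  for (const rule of rules) {
    const value = rule(meta);
    // Accept the first candidate that is a non-empty string.
    if (typeof value === 'string' && value.trim() !== '') {
      return value.trim();
    }
  }
  // No rule produced a usable value.
  return null;
}

// Rules are ordered from most to least preferred source.
const titleRules = [
  (meta) => meta['og:title'],
  (meta) => meta['twitter:title'],
  (meta) => meta['title'],
];

// This page has no Open Graph title, so the second rule supplies the value.
const page = { 'twitter:title': "  Meet Wall Street's New A.I. Sheriffs " };
const title = firstMatch(titleRules, page);
```

Ordering the rules explicitly makes it easy to add a new fallback or override an existing one by editing the list for a single field.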
If you're interested in the breakdown by individual pieces of metadata, check out the [full summary](/support/comparison), or dive into the [raw result data for each library](/support/comparison/results).
## Server-side Usage

@@ -44,0 +89,0 @@

support/screenshot.png

Sorry, the diff of this file is not supported yet

