html-metadata

Scrapes metadata of several different standards

Version: 1.2.0 (npm)

Version published: 9 years ago

Weekly downloads: 673; increased by 89.04%

Maintainers: 2

Created: 10 years ago

MetaData html scraper and parser for Node.js (supports Promises and callback style)

This library aims to be a comprehensive tool for extracting all metadata embedded in HTML. It currently supports Schema.org microdata (via a third-party library), native implementations of Dublin Core, Open Graph, and COinS, and general metadata that does not belong to a particular standard (for instance, the content of the title tag or of meta description tags).

Support is planned for RDFa, Twitter, AGLS, EPrints, Highwire, BEPress, and other metadata types not yet covered. Contributions and requests for other metadata types are welcome!

Install

npm install git://github.com/mvolz/html-metadata.git

Usage

Promise-based:

var scrape = require('html-metadata');

var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";

scrape(url).then(function(metadata){
	console.log(metadata);
});

Callback-based:

var scrape = require('html-metadata');

var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";

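scrape(url, function(error, metadata){
	if (error) {
		// Report the failure instead of logging undefined metadata
		console.error(error);
		return;
	}
	console.log(metadata);
});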

The scrape method used here invokes the parseAll() method, which runs every parser registered in metadataFunctions(). Each of these parsers is also available for use separately, for example:

Promise-based:

var cheerio = require('cheerio');
var preq = require('preq'); // Promisified request library
var parseDublinCore = require('html-metadata').parseDublinCore;

var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";

preq(url).then(function(response){
	var $ = cheerio.load(response.body);
	return parseDublinCore($).then(function(metadata){
		console.log(metadata);
	});
});

Callback-based:

var cheerio = require('cheerio');
var request = require('request');
var parseDublinCore = require('html-metadata').parseDublinCore;

var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";

request(url, function(error, response, html){
	var $ = cheerio.load(html);
	parseDublinCore($, function(error, metadata){
		console.log(metadata);
	});
});
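
The way parseAll() dispatches to the individual parsers can be pictured with a small dependency-free sketch. Everything in it is an assumption made for illustration: registry, parseAllSketch, and the two stand-in parsers are hypothetical names, the stand-ins take a raw HTML string rather than a cheerio object, and keying the result by parser name is an assumed output shape, not a documented one.

```javascript
'use strict';

// Hypothetical stand-ins for the real parsers. Each takes the page source
// and returns a Promise for its own slice of metadata, rejecting when it
// finds nothing.
var registry = {
  general: function (html) {
    var match = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
    return match
      ? Promise.resolve({ title: match[1].trim() })
      : Promise.reject(new Error('No general metadata found'));
  },
  dublinCore: function () {
    // Stand-in that never finds anything, to show how misses are skipped.
    return Promise.reject(new Error('No Dublin Core metadata found'));
  }
};

// Run every registered parser and merge the successful results into one
// object keyed by parser name (assumed shape).
function parseAllSketch(html) {
  var attempts = Object.keys(registry).map(function (key) {
    return registry[key](html).then(
      function (value) { return { key: key, value: value }; },
      function () { return null; } // this parser found nothing: skip it
    );
  });
  return Promise.all(attempts).then(function (results) {
    var metadata = {};
    results.forEach(function (result) {
      if (result) {
        metadata[result.key] = result.value;
      }
    });
    if (Object.keys(metadata).length === 0) {
      throw new Error('No metadata found');
    }
    return metadata;
  });
}
```

A parser that rejects is treated as "nothing found" rather than as a fatal error, so one missing standard does not discard the metadata that the other parsers did find.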

The method parseGeneral obtains the following general metadata:

<meta name="author" content="">
<link rel="author" href="">
<link rel="canonical" href="">
<meta name="description" content="">
<link rel="publisher" href="">
<meta name="robots" content="">
<link rel="shortlink" href="">
<title></title>

Tests

npm test runs the mocha tests

npm run-script coverage runs the tests and reports code coverage

Contributing

Contributions welcome! All contributions should use bluebird promises instead of callbacks, and be .nodeify()-ed in index.js so that the functions can be used with either callbacks or Promises.
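
The dual callback/Promise convention can be sketched without any dependencies. The nodeify helper below is a hypothetical stand-in for what bluebird's .nodeify(callback) does, written with native Promises so the example runs on its own; parseTitle is an illustrative parser and is not part of html-metadata.

```javascript
'use strict';

// Hypothetical helper approximating bluebird's .nodeify(callback):
// if a callback is given, forward the outcome to it in Node style;
// either way, hand the Promise back to the caller.
function nodeify(promise, callback) {
  if (typeof callback !== 'function') {
    return promise; // no callback supplied: the caller uses the Promise
  }
  promise.then(
    function (value) { callback(null, value); },
    function (error) { callback(error); }
  );
  return promise;
}

// Illustrative parser (an assumption, not a library function): written
// once as a Promise, exposed in both styles through nodeify.
function parseTitle(html, callback) {
  var promise = new Promise(function (resolve, reject) {
    var match = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
    if (match) {
      resolve({ title: match[1].trim() });
    } else {
      reject(new Error('No title found'));
    }
  });
  return nodeify(promise, callback);
}

// Promise style
parseTitle('<title>Example</title>').then(function (metadata) {
  console.log(metadata.title);
});

// Callback style
parseTitle('<title>Example</title>', function (error, metadata) {
  if (error) {
    console.error(error);
    return;
  }
  console.log(metadata.title);
});
```

Writing each parser as a Promise first and adding the callback form at a single point keeps the two calling styles from drifting apart.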

Package last updated on 01 Jul 2015
