html-metadata
This library aims to be a comprehensive tool for extracting all metadata embedded in HTML. It currently supports Schema.org microdata (through a third-party library) and has native implementations for BEPress, Dublin Core, Highwire Press, Open Graph, EPrints, and COinS. It also extracts general metadata that does not belong to a particular standard, for instance the content of the title tag or of meta description tags.
Support for RDFa, Twitter, AGLS, and other metadata types is planned. Contributions and requests for other metadata types are welcome!
npm install git://github.com/wikimedia/html-metadata.git
Promise-based:
var scrape = require('html-metadata');
var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";
scrape(url).then(function(metadata){
    console.log(metadata);
});
Callback-based:
var scrape = require('html-metadata');
var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";
scrape(url, function(error, metadata){
    console.log(metadata);
});
The scrape method used here invokes the parseAll() method, which runs all the available parsing methods registered in metadataFunctions(). These methods can also be used separately, for example:
Promise-based:
var cheerio = require('cheerio');
var preq = require('preq'); // Promisified request library
var parseDublinCore = require('html-metadata').parseDublinCore;
var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";
preq(url).then(function(response){
    var $ = cheerio.load(response.body); // declare $ locally instead of leaking a global
    return parseDublinCore($).then(function(metadata){
        console.log(metadata);
    });
});
Callback-based:
var cheerio = require('cheerio');
var request = require('request');
var parseDublinCore = require('html-metadata').parseDublinCore;
var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";
request(url, function(error, response, html){
    var $ = cheerio.load(html); // declare $ locally instead of leaking a global
    parseDublinCore($, function(error, metadata){
        console.log(metadata);
    });
});
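The relationship between parseAll() and the individual parsers described above can be pictured as a registry of named functions that are all run against the same loaded document. The following is a minimal sketch of that idea only, not the library's actual code: the registry object, the two stand-in parsers, and runAll are hypothetical names, the stand-ins read from a plain object rather than a cheerio document, and skipping parsers that find nothing is an assumption of this sketch.

```javascript
// Hypothetical stand-in parsers; each takes a "document" and returns a Promise.
// In html-metadata the real parsers take a cheerio object instead.
var registry = {
    general: function (doc) {
        return doc.title
            ? Promise.resolve({ title: doc.title })
            : Promise.reject(new Error('No general metadata found'));
    },
    dublinCore: function (doc) {
        return doc.dc
            ? Promise.resolve(doc.dc)
            : Promise.reject(new Error('No Dublin Core metadata found'));
    }
};

// Run every registered parser and keep only the results that resolved.
function runAll(doc) {
    var names = Object.keys(registry);
    return Promise.all(names.map(function (name) {
        return registry[name](doc).then(
            function (value) { return { name: name, value: value }; },
            function () { return null; } // assumption: a parser that finds nothing is skipped
        );
    })).then(function (results) {
        var merged = {};
        results.forEach(function (result) {
            if (result) { merged[result.name] = result.value; }
        });
        return merged;
    });
}

runAll({ title: 'Example page' }).then(function (metadata) {
    console.log(metadata);
});
```

Keying the merged result by parser name keeps each standard's output separate, so a caller can tell which standard a given field came from.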
Options object:
You can also pass an options object containing extra parameters as the first argument, in place of the URL. Some websites require the user-agent or cookies to be set before they return a response.
var scrape = require('html-metadata');
var request = require('request');
var options = {
    url: "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/",
    jar: request.jar(), // Cookie jar
    headers: {
        'User-Agent': 'webscraper'
    }
};
scrape(options, function(error, metadata){
    console.log(metadata);
});
The method parseGeneral obtains the following general metadata:
<meta name="author" content="">
<link rel="author" href="">
<link rel="canonical" href="">
<meta name="description" content="">
<link rel="publisher" href="">
<meta name="robots" content="">
<link rel="shortlink" href="">
<title></title>
npm test
runs the mocha tests
npm run-script coverage
runs the tests and reports code coverage
Contributions welcome! All contributions should use bluebird promises instead of callbacks, and should be .nodeify()-ed in index.js so that the functions can be used with either callbacks or Promises.
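To illustrate what exposing a function in both styles means, here is a minimal sketch that uses native Promises rather than bluebird. It is not the library's code: withCallback and parseExample are hypothetical names, and withCallback only approximates the effect of bluebird's .nodeify(callback).

```javascript
// Hypothetical helper approximating the effect of bluebird's .nodeify(callback):
// if a callback is given, route the promise's outcome to it; either way,
// return the promise so promise-style callers still work.
function withCallback(promise, callback) {
    if (typeof callback === 'function') {
        promise.then(
            function (value) { callback(null, value); },
            function (error) { callback(error); }
        );
    }
    return promise;
}

// Hypothetical parser written promise-first, then exposed in both styles.
function parseExample(input, callback) {
    var work = input
        ? Promise.resolve({ length: input.length })
        : Promise.reject(new Error('No input'));
    return withCallback(work, callback);
}

// Promise style
parseExample('abc').then(function (metadata) {
    console.log(metadata);
});

// Callback style
parseExample('abc', function (error, metadata) {
    console.log(error, metadata);
});
```

Writing the parser promise-first and adding the callback path in one wrapper keeps the parsing logic free of branching on the calling style.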