robots.js — a parser for robots.txt files for Node.js.
It's recommended to install via npm:
$ npm install -g robots
Here's an example of using robots.js:
var robots = require('robots')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if (success) {
    parser.canFetch('*', '/doc/dailyjs-nodepad/', function (access) {
      if (access) {
        // the url is allowed: fetch and parse it here
      }
    });
  }
});
The default crawler user-agent is:
Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0
Here's an example of using a different user-agent and a more detailed callback:
var robots = require('robots')
  , parser = new robots.RobotsParser(
      'http://nodeguide.ru/robots.txt',
      'Mozilla/5.0 (compatible; RobotTxtBot/1.0)',
      after_parse
    );

function after_parse(parser, success) {
  if (success) {
    parser.canFetch('*', '/doc/dailyjs-nodepad/', function (access, url, reason) {
      if (access) {
        console.log(' url: ' + url + ', access: ' + access);
        // parse url ...
      }
    });
  }
}
Here's an example of getting the list of sitemaps:

var robots = require('robots')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if (success) {
    parser.getSitemaps(function(sitemaps) {
      // sitemaps is an array of sitemap URLs
    });
  }
});
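Since the callback receives a plain array, you can, for example, print each sitemap URL inside the success branch above (a minimal sketch):

parser.getSitemaps(function(sitemaps) {
  // print each "Sitemap:" URL declared in robots.txt
  sitemaps.forEach(function(url) {
    console.log('sitemap: ' + url);
  });
});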
Here's an example of using getCrawlDelay:

var robots = require('robots')
  , parser = new robots.RobotsParser();

// Suppose the target robots.txt looks like this:
//
//   $ curl -s http://nodeguide.ru/robots.txt
//
//   User-agent: Google-bot
//   Disallow: /
//   Crawl-delay: 2
//
//   User-agent: *
//   Disallow: /
//   Crawl-delay: 2

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if (success) {
    var GoogleBotDelay = parser.getCrawlDelay("Google-bot");
    // GoogleBotDelay is 2 for the robots.txt shown above
  }
});
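Crawl-delay values are expressed in seconds, so assuming getCrawlDelay returns that number directly, one way to pace requests is with setTimeout (a minimal sketch; fetchPage is a hypothetical helper, not part of robots.js):

var robots = require('robots')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if (!success) return;

  // assumption: getCrawlDelay returns the delay in seconds as a number
  var delayMs = parser.getCrawlDelay("Google-bot") * 1000;
  var urls = ['/doc/a', '/doc/b', '/doc/c'];

  (function next(i) {
    if (i >= urls.length) return;
    fetchPage(urls[i]); // hypothetical: fetch and process one page
    setTimeout(function() { next(i + 1); }, delayMs); // wait Crawl-delay between requests
  })(0);
});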
An example of passing options to the HTTP request:

var options = {
  headers: {
    Authorization: "Basic " + new Buffer("username:password").toString("base64")
  }
};

var robots = require('robots')
  , parser = new robots.RobotsParser(null, options);

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  // ...
});
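Note that the Buffer constructor is deprecated in current Node.js; on modern runtimes the same header can be built with Buffer.from:

var options = {
  headers: {
    // Buffer.from replaces the deprecated new Buffer(...) constructor
    Authorization: "Basic " + Buffer.from("username:password").toString("base64")
  }
};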
RobotsParser — main class. This class provides a set of methods to read, parse and answer questions about a single robots.txt file.
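For example, here's a minimal sketch that checks a single URL and logs the reason object documented below (it logs reason whole rather than assuming particular fields):

var robots = require('robots')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if (!success) return;
  parser.canFetch('*', '/doc/dailyjs-nodepad/', function(access, url, reason) {
    // reason explains why access was allowed or denied
    console.log('url: ' + url + ', access: ' + access, reason);
  });
});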
The detailed canFetch callback has the signature:

function callback(access, url, reason) { ... }

where:
  access — can this url be fetched (true/false)
  url — the url that was checked
  reason — why access was allowed or denied. Object; its entry field is an
           instance of lib/Entry.js (only for types: 'entry', 'defaultEntry')

See LICENSE file.