robots.js is a parser for robots.txt files for Node.js.
It's recommended to install via npm:
$ npm install -g robots
Here's an example of using robots.js:
var robots = require('robots')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if (success) {
    parser.canFetch('*', '/doc/dailyjs-nodepad/', function (access) {
      if (access) {
        // parse url
      }
    });
  }
});
The default crawler user-agent is:
Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0
Here's an example of using another user-agent and more detailed callback:
var robots = require('robots')
  , parser = new robots.RobotsParser(
      'http://nodeguide.ru/robots.txt',
      'Mozilla/5.0 (compatible; RobotTxtBot/1.0)',
      after_parse
    );

function after_parse(parser, success) {
  if (success) {
    parser.canFetch('*', '/doc/dailyjs-nodepad/', function (access, url, reason) {
      if (access) {
        console.log(' url: ' + url + ', access: ' + access);
        // parse url ...
      }
    });
  }
}
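If you prefer promises, the three-argument callback shown above is easy to wrap. The canFetchAsync helper below is not part of robots.js, just a minimal sketch built on the calls already shown:

var robots = require('robots');

// Hypothetical helper (not part of robots.js): wraps parser.canFetch in a Promise.
function canFetchAsync(parser, userAgent, path) {
  return new Promise(function (resolve) {
    parser.canFetch(userAgent, path, function (access, url, reason) {
      resolve({ access: access, url: url, reason: reason });
    });
  });
}

var parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function (parser, success) {
  if (!success) return;
  canFetchAsync(parser, '*', '/doc/dailyjs-nodepad/').then(function (result) {
    console.log('url: ' + result.url + ', access: ' + result.access);
  });
});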
Here's an example of getting the list of sitemaps:
var robots = require('robots')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if (success) {
    parser.getSitemaps(function(sitemaps) {
      // sitemaps is an array of sitemap URLs
    });
  }
});
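The array handed to the getSitemaps() callback can then be fed to Node's built-in HTTP client if you want to download the sitemaps themselves. A minimal sketch, assuming the listed sitemap URLs use plain http (switch to the https module otherwise):

var robots = require('robots')
  , http = require('http')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function (parser, success) {
  if (!success) return;
  parser.getSitemaps(function (sitemaps) {
    // Fetch every sitemap URL listed in robots.txt.
    sitemaps.forEach(function (sitemapUrl) {
      http.get(sitemapUrl, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
          console.log(sitemapUrl + ': ' + body.length + ' bytes');
        });
      });
    });
  });
});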
Here's an example of getCrawlDelay usage:
var robots = require('robots')
  , parser = new robots.RobotsParser();

// for example:
//
// $ curl -s http://nodeguide.ru/robots.txt
//
// User-agent: Google-bot
// Disallow: /
// Crawl-delay: 2
//
// User-agent: *
// Disallow: /
// Crawl-delay: 2

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if (success) {
    var GoogleBotDelay = parser.getCrawlDelay("Google-bot");
    // ...
  }
});
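Crawl-delay is conventionally given in seconds, so the value returned by getCrawlDelay() can be used to space out requests. A minimal sketch, assuming the method returns that number (the paths array is just an illustration):

var robots = require('robots')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function (parser, success) {
  if (!success) return;

  var delaySeconds = parser.getCrawlDelay('Google-bot') || 0;
  var paths = ['/doc/a/', '/doc/b/', '/doc/c/']; // example paths to check

  // Check one path every `delaySeconds` seconds instead of all at once.
  paths.forEach(function (path, i) {
    setTimeout(function () {
      parser.canFetch('Google-bot', path, function (access) {
        console.log(path + ' allowed: ' + access);
      });
    }, i * delaySeconds * 1000);
  });
});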
An example of passing options to the HTTP request:
var options = {
  headers: {
    Authorization: "Basic " + Buffer.from("username:password").toString("base64")
  }
};

var robots = require('robots')
  , parser = new robots.RobotsParser(null, options);

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  // ...
});
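For completeness, here is one way the elided callback above could be filled in, reusing only calls shown earlier in this document (the credentials are placeholders):

var options = {
  headers: {
    // Placeholder credentials for HTTP Basic auth.
    Authorization: 'Basic ' + Buffer.from('username:password').toString('base64')
  }
};

var robots = require('robots')
  , parser = new robots.RobotsParser(null, options);

parser.setUrl('http://nodeguide.ru/robots.txt', function (parser, success) {
  if (!success) return;
  parser.canFetch('*', '/doc/dailyjs-nodepad/', function (access) {
    console.log('allowed: ' + access);
  });
});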
RobotsParser is the main class. It provides a set of methods to read, parse, and answer questions about a single robots.txt file.
The callback passed to canFetch() has the signature:

function callback(access, url, reason) { ... }

where:

* access - whether the url may be fetched (true/false)
* url - the url that was checked
* reason - why access was granted or denied. Object; for the reason types 'entry' and 'defaultEntry' it carries the matching rule entry (an instance of lib/Entry.js).

License

See LICENSE file.