robot-directives
Parse robot directives within HTML meta and/or HTTP headers.
- <meta name="robots" content="noindex,nofollow">
- X-Robots-Tag: noindex,nofollow
- etc
Note: this library is not responsible for parsing any HTML.
Installation
Node.js >= 6 is required. To install, type this at the command line:
npm install robot-directives
Usage
const RobotDirectives = require('robot-directives');
const robots = new RobotDirectives(options)
  .header('googlebot: noindex')
  .meta('bingbot', 'unavailable_after: 1-Jan-3000 00:00:00 EST')
  .meta('robots', 'noarchive,nocache,nofollow');
robots.is(RobotDirectives.NOFOLLOW);
robots.is([ RobotDirectives.NOFOLLOW, RobotDirectives.FOLLOW ]);
robots.isNot([ RobotDirectives.ARCHIVE, RobotDirectives.FOLLOW ]);
robots.is(RobotDirectives.NOINDEX, {
currentTime: () => new Date('jan 1 3001').getTime(),
userAgent: 'Bingbot/2.0'
});
RobotDirectives.isBot('googlebot');
Constants
For use in comparison, the following directives are available as static properties on the constructor:
ALL
ARCHIVE
CACHE
FOLLOW
IMAGEINDEX
INDEX
NOARCHIVE
NOCACHE
NOFOLLOW
NOIMAGEINDEX
NOINDEX
NONE
NOODP
NOSNIPPET
NOTRANSLATE
ODP
SNIPPET
TRANSLATE
Methods
header(value)
Parses, stores and cascades the value of an X-Robots-Tag HTTP header.
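For example, passing the header's value (the scoped form below is taken from the Usage section; the unscoped form is a sketch mirroring the X-Robots-Tag example at the top):
robots.header('googlebot: noindex');  // directives scoped to a specific bot
robots.header('noindex,nofollow');    // unscoped directives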
is(directive[, options])
Validates a directive or a list of directives against parsed instructions. directive can be a String or an Array. options, if defined, will override any options defined in the constructor during instantiation. A value of true is returned if all directives are valid.
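Continuing the Usage example above (where nofollow was declared but follow was not), the expected results would be:
robots.is(RobotDirectives.NOFOLLOW);                              // expected: true
robots.is([ RobotDirectives.NOFOLLOW, RobotDirectives.FOLLOW ]);  // expected: false, since follow is not in effect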
isNot(directive[, options])
Inversion of is(). A value of true is returned if all directives are not valid.
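Continuing the same example:
robots.isNot([ RobotDirectives.ARCHIVE, RobotDirectives.FOLLOW ]);  // expected: true, since neither archive nor follow is in effect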
meta(name, content)
Parses, stores and cascades the data within a <meta> HTML element.
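For example, passing a tag's name and content attribute values (the first line is from the Usage section; the second is a sketch of bot-scoped directives):
robots.meta('robots', 'noarchive,nocache,nofollow');  // from <meta name="robots" content="noarchive,nocache,nofollow">
robots.meta('bingbot', 'noindex');                    // scoped to a specific bot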
oneIs(directives[, options])
A variation of is(). A value of true is returned if at least one directive is valid.
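A brief sketch (the directive values are illustrative):
robots.oneIs([ RobotDirectives.NOINDEX, RobotDirectives.NOFOLLOW ]);  // true if at least one of the two is in effect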
oneIsNot(directives[, options])
Inversion of oneIs(). A value of true is returned if at least one directive is not valid.
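Likewise, a sketch with illustrative values:
robots.oneIsNot([ RobotDirectives.NOINDEX, RobotDirectives.NOFOLLOW ]);  // true if at least one of the two is not in effect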
Functions
isBot(botname)
Returns true if botname is a valid bot/crawler/spider name or user-agent.
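For example, as in the Usage section (the second value is illustrative):
RobotDirectives.isBot('googlebot');     // true
RobotDirectives.isBot('made-up-name');  // presumably false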
Options
allIsReadonly
Type: Boolean
Default value: true
Declaring the 'all' directive will not affect other directives when true. This is how most search crawlers behave.
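A sketch of changing it at instantiation:
const robots = new RobotDirectives({ allIsReadonly: false });  // a declared 'all' will now affect other directives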
currentTime
Type: Function
Default value: function(){ return Date.now() }
A function returning the time to use when checking whether unavailable_after has expired.
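For example, to evaluate unavailable_after against a different point in time (mirroring the per-call option shown in the Usage section):
const robots = new RobotDirectives({
  currentTime: () => new Date('jan 1 3001').getTime()  // treat the year 3001 as "now"
});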
restrictive
Type: Boolean
Default value: true
Directive conflicts will be resolved by selecting the most restrictive value. Example: 'noindex,index' will resolve to 'noindex' because it is more restrictive. This is how Googlebot behaves, but others may differ.
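For example, with the default value:
const robots = new RobotDirectives({ restrictive: true })
  .meta('robots', 'noindex,index');   // conflicting directives
robots.is(RobotDirectives.NOINDEX);   // expected: true, because 'noindex' wins the conflict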
userAgent
Type: String
Default value: ''
The HTTP user-agent to use when retrieving instructions via is() and isNot().
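For example, mirroring the per-call option shown in the Usage section:
const robots = new RobotDirectives({ userAgent: 'Bingbot/2.0' })
  .meta('bingbot', 'noindex');
robots.is(RobotDirectives.NOINDEX);  // expected: true, assuming 'Bingbot/2.0' matches the 'bingbot' scope as in Usage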