# robots-agent

robots.txt agent with cache.
## Usage

```js
import {
  createRobotsAgent
} from 'robots-agent';

const robotsAgent = createRobotsAgent();

const firstUsageExample = async () => {
  const robotsIsAvailable = await robotsAgent.isRobotsAvailable('http://gajus.com/');

  if (!robotsIsAvailable) {
    return false;
  }
};

const secondUsageExample = async () => {
  const robotsIsAvailable = await robotsAgent.isRobotsAvailable('https://gajus.com/');

  if (!robotsIsAvailable) {
    return false;
  }

  await robotsAgent.isAllowed('https://gajus.com/foo/');
  await robotsAgent.isAllowed('https://gajus.com/bar/');
};

const main = async () => {
  await firstUsageExample();
  await secondUsageExample();
};

main();
```
## API

```js
type RobotsAgentType = {|
  +getMatchingLineNumber: (url: string, userAgent?: string) => Promise<number>,
  +getPreferredHost: (url: string) => Promise<string | null>,
  +getSitemaps: (url: string) => Promise<$ReadOnlyArray<string>>,
  +isAllowed: (url: string, userAgent?: string) => Promise<boolean>,
  +isDisallowed: (url: string, userAgent?: string) => Promise<boolean>,
  +isRobotsAvailable: (url: string, userAgent?: string) => Promise<boolean>
|};
```
## Errors

All robots-agent errors extend from `RobotsAgentError`.

`RobotsNotAvailableError` is thrown when any of the robots-parser methods are invoked for a URL whose robots.txt is unavailable.

```js
import {
  RobotsAgentError,
  RobotsNotAvailableError
} from 'robots-agent';
```
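Because every error extends from the same base class, a single `instanceof RobotsAgentError` check is enough to handle all robots-agent errors at once. A minimal self-contained sketch of that relationship (these stand-in classes only mirror the ones exported by the package):

```js
// Stand-in classes mirroring the exported error hierarchy;
// the real classes come from the 'robots-agent' package.
class RobotsAgentError extends Error {}
class RobotsNotAvailableError extends RobotsAgentError {}

// A catch-all handler only needs to test for the base class.
const handleError = (error) => {
  if (error instanceof RobotsAgentError) {
    return 'robots-agent error';
  }

  throw error;
};
```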
## Difference from robots-parser

robots-agent abstracts the robots-parser methods by automating retrieval and caching of robots.txt content, and by handling the robots-parser methods safely.

Unlike robots-parser, robots-agent throws an error if `getMatchingLineNumber`, `getPreferredHost`, `getSitemaps`, `isAllowed` or `isDisallowed` is invoked for a URL that does not have robots.txt. Use `isRobotsAvailable` to check the availability of robots.txt before invoking the parser methods.
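The retrieval-and-cache behaviour can be pictured as one robots.txt fetch per origin, with every URL on that host answered from the cached copy. A self-contained sketch of that pattern (an assumption for illustration, not the package's actual code; `fetchRobotsTxt` is a caller-supplied function):

```js
// Sketch of per-origin caching of robots.txt content.
// This illustrates the pattern only; it is not the package's implementation.
const createCachedRobotsFetcher = (fetchRobotsTxt) => {
  const cache = new Map();

  return async (url) => {
    const {origin} = new URL(url);

    if (!cache.has(origin)) {
      // Cache the promise so concurrent calls share a single fetch;
      // a failed fetch resolves to null (robots.txt unavailable).
      cache.set(origin, fetchRobotsTxt(origin).catch(() => null));
    }

    return cache.get(origin);
  };
};
```

Under this shape, `isRobotsAvailable` corresponds to "the cached value is not null", and the parser methods operate on the cached robots.txt body.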
## Implementation

robots-agent uses robots-parser to implement all robots.txt checks.