robots-parser

A specification compliant robots.txt parser with wildcard (*) matching support.

Latest version: 3.0.1 (published on npm)
Weekly downloads: 1.4M
Maintainers: 1
What is robots-parser?

The robots-parser npm package is a tool for parsing robots.txt files, which are used to manage and control the behavior of web crawlers. This package allows you to easily interpret the rules defined in a robots.txt file and determine whether a specific user-agent is allowed to access a particular URL.
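
The package is published on npm, so the examples below can be run after installing it with the standard npm command:

npm install robots-parser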

What are robots-parser's main functionalities?

Parse robots.txt

This feature allows you to parse a robots.txt file and check if a specific URL is allowed or disallowed for a given user-agent. In the example, the parser checks if 'Googlebot' is allowed to access '/private/' and '/public/' URLs.

const robotsParser = require('robots-parser');
const robotsTxt = `
User-agent: *
Disallow: /private/
`;
const parser = robotsParser('http://example.com/robots.txt', robotsTxt);
console.log(parser.isAllowed('http://example.com/private/', 'Googlebot')); // false
console.log(parser.isAllowed('http://example.com/public/', 'Googlebot')); // true

Check crawl delay

This feature allows you to retrieve the crawl delay specified for a particular user-agent. In the example, the parser retrieves the crawl delay for 'Googlebot', which is set to 10 seconds.

const robotsParser = require('robots-parser');
const robotsTxt = `
User-agent: Googlebot
Crawl-delay: 10
`;
const parser = robotsParser('http://example.com/robots.txt', robotsTxt);
console.log(parser.getCrawlDelay('Googlebot')); // 10

Get sitemap URLs

This feature allows you to extract sitemap URLs from a robots.txt file. In the example, the parser retrieves two sitemap URLs specified in the robots.txt file.

const robotsParser = require('robots-parser');
const robotsTxt = `
Sitemap: http://example.com/sitemap.xml
Sitemap: http://example.com/sitemap2.xml
`;
const parser = robotsParser('http://example.com/robots.txt', robotsTxt);
console.log(parser.getSitemaps()); // ['http://example.com/sitemap.xml', 'http://example.com/sitemap2.xml']
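
Putting these features together, a crawler typically fetches a site's robots.txt over HTTP and consults the parser before requesting each page. The sketch below is illustrative rather than taken from the package's documentation; it assumes Node 18+ for the built-in fetch, and isAllowedToCrawl and the example URLs are placeholders:

const robotsParser = require('robots-parser');

// Illustrative helper: fetch a site's robots.txt and check one URL against it.
async function isAllowedToCrawl(pageUrl, userAgent) {
  const robotsUrl = new URL('/robots.txt', pageUrl).href; // robots.txt lives at the site root
  const response = await fetch(robotsUrl);                // built-in fetch (Node 18+)
  const contents = response.ok ? await response.text() : '';
  const parser = robotsParser(robotsUrl, contents);
  return parser.isAllowed(pageUrl, userAgent);            // true/false, as in the examples above
}

isAllowedToCrawl('http://example.com/public/', 'Googlebot').then(console.log);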

Keywords: robots.txt

Package last updated on 21 Feb 2023
