robots-parser

A specification compliant robots.txt parser with wildcard (*) matching support.

  • Version: 3.0.1 (latest)
  • Weekly downloads: 1.1M (increased by 9.22%)
  • Maintainers: 1

What is robots-parser?

The robots-parser npm package parses robots.txt files, which websites use to control the behavior of web crawlers. It lets you interpret the rules defined in a robots.txt file and determine whether a specific user-agent is allowed to access a particular URL.
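
In a real crawler the robots.txt contents are usually fetched from the target site rather than supplied inline. The following is a minimal sketch of that flow, assuming Node 18+ so the global fetch API is available; the canCrawl helper, URL, and user-agent string are illustrative, not part of the package.

const robotsParser = require('robots-parser');

async function canCrawl(pageUrl, userAgent) {
  // Derive the robots.txt location from the page's origin.
  const robotsUrl = new URL('/robots.txt', pageUrl).href;

  // Fetch the live robots.txt contents (global fetch is available in Node 18+).
  const response = await fetch(robotsUrl);
  const robotsTxt = await response.text();

  // Build the parser and check this specific URL for this user-agent.
  const parser = robotsParser(robotsUrl, robotsTxt);
  return parser.isAllowed(pageUrl, userAgent);
}

canCrawl('http://example.com/some/page', 'MyCrawler/1.0').then(console.log);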

What are robots-parser's main functionalities?

Parse robots.txt

This feature allows you to parse a robots.txt file and check if a specific URL is allowed or disallowed for a given user-agent. In the example, the parser checks if 'Googlebot' is allowed to access '/private/' and '/public/' URLs.

const robotsParser = require('robots-parser');
const robotsTxt = `
User-agent: *
Disallow: /private/
`;
const parser = robotsParser('http://example.com/robots.txt', robotsTxt);
console.log(parser.isAllowed('http://example.com/private/', 'Googlebot')); // false
console.log(parser.isAllowed('http://example.com/public/', 'Googlebot')); // true

Check crawl delay

This feature allows you to retrieve the crawl delay specified for a particular user-agent. In the example, the parser retrieves the crawl delay for 'Googlebot', which is set to 10 seconds.

const robotsParser = require('robots-parser');
const robotsTxt = `
User-agent: Googlebot
Crawl-delay: 10
`;
const parser = robotsParser('http://example.com/robots.txt', robotsTxt);
console.log(parser.getCrawlDelay('Googlebot')); // 10
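
Since the returned delay is in seconds, a crawler would typically convert it to milliseconds and pause between requests. Below is a minimal sketch of that kind of throttling; the politeFetch helper and the fallback to 0 are illustrative choices, not part of the package.

const robotsParser = require('robots-parser');

const robotsTxt = `
User-agent: Googlebot
Crawl-delay: 10
`;
const parser = robotsParser('http://example.com/robots.txt', robotsTxt);

// Crawl-delay is expressed in seconds; fall back to 0 when none is set.
const delaySeconds = parser.getCrawlDelay('Googlebot') || 0;

async function politeFetch(url) {
  // Pause for the advertised crawl delay before issuing the request.
  await new Promise((resolve) => setTimeout(resolve, delaySeconds * 1000));
  return fetch(url);
}

politeFetch('http://example.com/page').then((res) => console.log(res.status));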

Get sitemap URLs

This feature allows you to extract sitemap URLs from a robots.txt file. In the example, the parser retrieves two sitemap URLs specified in the robots.txt file.

const robotsParser = require('robots-parser');
const robotsTxt = `
Sitemap: http://example.com/sitemap.xml
Sitemap: http://example.com/sitemap2.xml
`;
const parser = robotsParser('http://example.com/robots.txt', robotsTxt);
console.log(parser.getSitemaps()); // ['http://example.com/sitemap.xml', 'http://example.com/sitemap2.xml']
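
The package description also advertises wildcard (*) matching, which the examples above do not exercise. Below is a minimal sketch of how a wildcard rule behaves; the paths and user-agent are illustrative.

const robotsParser = require('robots-parser');

// The * wildcard matches any sequence of characters within a path.
const robotsTxt = `
User-agent: *
Disallow: /downloads/*.zip
`;
const parser = robotsParser('http://example.com/robots.txt', robotsTxt);

console.log(parser.isAllowed('http://example.com/downloads/readme.txt', 'MyCrawler')); // true
console.log(parser.isAllowed('http://example.com/downloads/archive.zip', 'MyCrawler')); // false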

Package last updated on 21 Feb 2023
