robots-parser
Comparing version 2.3.0 to 2.4.0
package.json

```diff
 {
   "name": "robots-parser",
-  "version": "2.3.0",
-  "description": "Robots.txt parser.",
+  "version": "2.4.0",
+  "description": "NodeJS robots.txt parser with support for wildcard (*) matching.",
+  "keywords": [
+    "robots.txt",
+    "parser",
+    "user-agent",
+    "scraper",
+    "spider",
+    "bot",
+    "robots-exclusion-standard"
+  ],
   "main": "index.js",
@@ -20,3 +29,4 @@ "directories": {
     "/Robots.js",
-    "/index.js"
+    "/index.js",
+    "/index.d.ts"
   ],
@@ -30,6 +40,7 @@ "prettier": {
   "devDependencies": {
-    "chai": "^4.2.0",
-    "mocha": "^6.1.4",
-    "nyc": "^14.1.1"
-  }
+    "chai": "^4.3.4",
+    "mocha": "^9.1.3",
+    "nyc": "^15.1.0"
+  },
+  "types": "./index.d.ts"
 }
```
readme.md
```diff
@@ -1,2 +1,2 @@
-# Robots Parser [](https://deepscan.io/dashboard#view=project&tid=457&pid=16277&bid=344939) [](https://github.com/samclarke/robots-parser/blob/master/license.md) [](https://coveralls.io/github/samclarke/robots-parser?branch=master)
+# Robots Parser [](https://www.npmjs.com/package/robots-parser) [](https://deepscan.io/dashboard#view=project&tid=457&pid=16277&bid=344939) [](https://github.com/samclarke/robots-parser/blob/master/license.md) [](https://coveralls.io/github/samclarke/robots-parser?branch=master)
@@ -7,9 +7,9 @@ NodeJS robots.txt parser.
-* User-agent:
-* Allow:
-* Disallow:
-* Sitemap:
-* Crawl-delay:
-* Host:
-* Paths with wildcards (*) and EOL matching ($)
+- User-agent:
+- Allow:
+- Disallow:
+- Sitemap:
+- Crawl-delay:
+- Host:
+- Paths with wildcards (\*) and EOL matching ($)
@@ -42,3 +42,3 @@ ## Installation
-robots.isAllowed('http://www.example.com/test.html', 'Sams-Bot/1.0'); // false
+robots.isAllowed('http://www.example.com/test.html', 'Sams-Bot/1.0'); // true
 robots.isAllowed('http://www.example.com/dir/test.html', 'Sams-Bot/1.0'); // true
```
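The feature list above mentions paths with wildcards (`*`) and EOL matching (`$`). As a rough illustration of how such patterns are conventionally interpreted, a pattern can be translated to a regular expression. This is a standalone sketch, not robots-parser's actual implementation:

```javascript
// Standalone sketch of robots.txt-style path matching: '*' matches any
// character sequence, and a trailing '$' anchors the pattern to the end
// of the path. Illustrative only; not the library's own code.
function patternToRegExp(pattern) {
  const anchored = pattern.endsWith('$');
  const body = (anchored ? pattern.slice(0, -1) : pattern)
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '.*');                 // '*' -> '.*'
  return new RegExp('^' + body + (anchored ? '$' : ''));
}

function matches(pattern, path) {
  return patternToRegExp(pattern).test(path);
}

console.log(matches('/fish*', '/fishheads'));      // true
console.log(matches('/*.php$', '/index.php'));     // true
console.log(matches('/*.php$', '/index.php?x=1')); // false
```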
```diff
@@ -51,4 +51,4 @@ robots.isDisallowed('http://www.example.com/dir/test2.html', 'Sams-Bot/1.0'); // true
 ### isAllowed(url, [ua])
 **boolean or undefined**
@@ -60,4 +60,4 @@
 ### isDisallowed(url, [ua])
 **boolean or undefined**
@@ -69,4 +69,4 @@
 ### getMatchingLineNumber(url, [ua])
 **number or undefined**
@@ -80,4 +80,4 @@
 ### getCrawlDelay([ua])
 **number or undefined**
@@ -89,4 +89,4 @@
 ### getSitemaps()
 **array**
@@ -96,4 +96,4 @@
 ### getPreferredHost()
 **string or null**
```
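The method signatures above advertise `boolean or undefined` and `number or undefined` return types, so callers need an explicit policy for the `undefined` case. A standalone sketch of one conservative policy (the stub `robots` object below is a stand-in for illustration, not the real parser):

```javascript
// isAllowed()/isDisallowed() can return undefined (e.g. for URLs the
// parsed robots.txt does not apply to), so decide explicitly how to
// treat it. The stub below stands in for a parsed robots.txt.
const robots = {
  isDisallowed(url) {
    if (!url.startsWith('http://www.example.com/')) return undefined;
    return url.includes('/private/');
  },
};

// Conservative policy: crawl only on a definite "not disallowed".
function mayCrawl(url) {
  return robots.isDisallowed(url) === false;
}

console.log(mayCrawl('http://www.example.com/index.html')); // true
console.log(mayCrawl('http://www.example.com/private/x'));  // false
console.log(mayCrawl('http://other.example/index.html'));   // false (undefined -> skip)
```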
````diff
@@ -103,47 +103,52 @@
 # Changes

+### Version 2.4.0:
+
+- Added Typescript definitions
+  – Thanks to @danhab99 for creating
+- Added SECURITY.md policy and CodeQL scanning
+
 ### Version 2.3.0:

-* Fixed bug where if the user-agent passed to `isAllowed()` / `isDisallowed()` is called "constructor" it would throw an error.
-* Added support for relative URLs. This does not affect the default behavior so can safely be upgraded.
+- Fixed bug where if the user-agent passed to `isAllowed()` / `isDisallowed()` is called "constructor" it would throw an error.
+- Added support for relative URLs. This does not affect the default behavior so can safely be upgraded.
   Relative matching is only allowed if both the robots.txt URL and the URLs being checked are relative.

   For example:

   ```js
   var robots = robotsParser('/robots.txt', [
       'User-agent: *',
       'Disallow: /dir/',
       'Disallow: /test.html',
       'Allow: /dir/test.html',
       'Allow: /test.html'
   ].join('\n'));

   robots.isAllowed('/test.html', 'Sams-Bot/1.0'); // false
   robots.isAllowed('/dir/test.html', 'Sams-Bot/1.0'); // true
   robots.isDisallowed('/dir/test2.html', 'Sams-Bot/1.0'); // true
   ```

 ### Version 2.2.0:

-* Fixed bug with matching wildcard patterns with some URLs
-  – Thanks to @ckylape for reporting and fixing
-* Changed matching algorithm to match Google's implementation in google/robotstxt
-* Changed order of precedence to match current spec
+- Fixed bug with matching wildcard patterns with some URLs
+  – Thanks to @ckylape for reporting and fixing
+- Changed matching algorithm to match Google's implementation in google/robotstxt
+- Changed order of precedence to match current spec

 ### Version 2.1.1:

-* Fix bug that could be used to cause rule checking to take a long time
-  – Thanks to @andeanfog
+- Fix bug that could be used to cause rule checking to take a long time
+  – Thanks to @andeanfog

 ### Version 2.1.0:

-* Removed use of punycode module API's as new URL API handles it
-* Improved test coverage
-* Added tests for percent encoded paths and improved support
-* Added `getMatchingLineNumber()` method
-* Fixed bug with comments on same line as directive
+- Removed use of punycode module API's as new URL API handles it
+- Improved test coverage
+- Added tests for percent encoded paths and improved support
+- Added `getMatchingLineNumber()` method
+- Fixed bug with comments on same line as directive
````
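The 2.2.0 notes above refer to Google's precedence rule: among all Allow/Disallow rules that match a path, the most specific (longest) rule wins, and Allow wins a tie. A standalone sketch of that rule, using plain prefix rules and made-up names (not robots-parser internals, and wildcard handling is omitted for brevity):

```javascript
// Illustrative sketch of the precedence rule named in the changelog:
// the longest matching rule wins; on a tie, Allow beats Disallow.
function isPathAllowed(rules, path) {
  let best = { length: -1, allow: true }; // no matching rule => allowed
  for (const rule of rules) {
    if (path.startsWith(rule.path)) {
      if (rule.path.length > best.length ||
          (rule.path.length === best.length && rule.allow && !best.allow)) {
        best = { length: rule.path.length, allow: rule.allow };
      }
    }
  }
  return best.allow;
}

const rules = [
  { path: '/dir/', allow: false },
  { path: '/dir/test.html', allow: true },
];

console.log(isPathAllowed(rules, '/dir/page.html')); // false
console.log(isPathAllowed(rules, '/dir/test.html')); // true  (longer Allow wins)
console.log(isPathAllowed(rules, '/other.html'));    // true  (no rule matches)
```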
```diff
@@ -154,43 +159,42 @@ ### Version 2.0.0:

-* Update code to not use deprecated URL module API's.
-  – Thanks to @kdzwinel
+- Update code to not use deprecated URL module API's.
+  – Thanks to @kdzwinel

 ### Version 1.0.2:

-* Fixed error caused by invalid URLs missing the protocol.
+- Fixed error caused by invalid URLs missing the protocol.

 ### Version 1.0.1:

-* Fixed bug with the "user-agent" rule being treated as case sensitive.
-  – Thanks to @brendonboshell
-* Improved test coverage.
-  – Thanks to @schornio
+- Fixed bug with the "user-agent" rule being treated as case sensitive.
+  – Thanks to @brendonboshell
+- Improved test coverage.
+  – Thanks to @schornio

 ### Version 1.0.0:

-* Initial release.
+- Initial release.

 # License

 The MIT License (MIT)

 Copyright (c) 2014 Sam Clarke

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
```