bs-broken-links-checker
Broken links checker for website pages
Broken links checker can be used either as a standalone NodeJS application or as an npm dependency plugged into your package.
In the first case, you should:
$ git clone https://github.com/bem-site/broken-links-checker.git
$ cd broken-links-checker
$ git checkout vx.y.z
$ npm run deps
In the second case, simply install the project as an ordinary npm package:
$ npm install --save bs-broken-links-checker
Using the broken-links-checker tool from the CLI consists of 3 steps:
*.html report file.
You can use the config command to generate a tool configuration file with the .js extension.
A configuration file is convenient for 2 reasons:
name
- configuration file name. It is good practice to use the host name of your target website as the value of this parameter.
Usage example:
$ node bin/blc config -n my.broken-site.com
Expected console output:
INFO acts/config.js: Configuration file: => my.broken-site.com.js has been generated successfully
Note: the generated configuration file my.broken-site.com.js will be placed into the ./configs folder inside the process working directory.
A configuration file is a simple NodeJS module that exports an object whose keys are option names and whose values are option values.
url
- URL of the website, website section, or single page that should be analyzed for broken links.
logger
- sets the log verbosity mode. Available values for the log level are:
level
: 'verbose', 'debug', 'info', 'warn', 'error';
concurrent
- number of inner website links analyzed concurrently.
The optimal value of this parameter should be found empirically.
If this value is too low, the total analysis time will increase.
If this value is too high, the load on your website server will increase, which can cause network errors and corrupted results.
Note: this parameter applies only to inner links. External links are always checked 100 at a time.
requestHeaders
- sets custom request headers.
requestRetriesAmount
- maximum number of request attempts for one analyzed URL before it is reported as broken.
requestTimeout
- maximum time to wait for a server response per request (in milliseconds).
acceptedSchemes
- permitted URL schemes. Links whose URLs use a scheme not listed here are excluded from the analysis.
checkExternalUrls
- enables or disables checking of external links. If this parameter is false, only inner links of the website are analyzed.
excludeLinkPatterns
- sets URL patterns that should be excluded from the analysis. For example, to exclude all nested links of the /contacts website section, set the excludeLinkPatterns option as follows:
module.exports = {
    // ...other options
    "excludeLinkPatterns": [
        /\/contacts/
    ]
}
You can pass regular expressions or string patterns (including wildcards) as values of this parameter.
More examples:
module.exports = {
    // ...other options
    excludeLinkPatterns: [
        /\/contacts/,
        // string patterns must be quoted to be valid JavaScript
        'http://google.com',
        'http://my.site.com/foo/*',
        '*/foo/bar'
    ]
}
The run command launches the website analysis process, which checks the site for broken links.
-c or --config: Path to the configuration file. Required parameter.
-u or --url: Overrides the URL of the website (section, page) from the configuration file.
-cc or --concurrent: Overrides the concurrent parameter from the configuration file.
-rra or --requestRetriesAmount: Overrides the requestRetriesAmount parameter from the configuration file.
-rt or --requestTimeout: Overrides the requestTimeout parameter from the configuration file.
-ext or --checkExternalUrls: Overrides the checkExternalUrls parameter from the configuration file.
-m or --mode: Takes one of 3 available values: 'website' (the default), 'section' and 'page'.
Note:
Sometimes it is convenient to scan only a separate section of a website or even a single page. You can use the mode option for this.
If the mode option is 'section', only the page at the url option value and the pages nested under it are scanned.
For example, if website my.site.com (whose configuration file is in the ./configs folder and is named my.site.com.js) has the following structure:
/
/foo
/foo/foo1
/foo/foo2
/bar
then the run command with these options:
$ node bin/blc run -c ./configs/my.site.com.js -u http://my.site.com/foo -m section
will analyze only the pages /foo, /foo/foo1 and /foo/foo2. Page /bar will be omitted.
If the mode option is 'page', then the run command:
$ node bin/blc run -c ./configs/my.site.com.js -u http://my.site.com/foo -m page
will analyze links only for the /foo page.
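The scoping rules for the three modes can be sketched as a small predicate. This is not the tool's actual implementation. The isInScope function and its arguments are hypothetical, and the matching rule (same host, then a path comparison) is an assumption that reproduces the example above.

```javascript
// Hypothetical helper: decides whether a page URL falls inside the scan
// scope defined by the base URL and the mode ('website', 'section', 'page').
function isInScope(pageUrl, baseUrl, mode) {
    const page = new URL(pageUrl);
    const base = new URL(baseUrl);
    if (page.host !== base.host) {
        return false; // other hosts are external, never in scope here
    }
    // Ignore trailing slashes so '/foo' and '/foo/' compare equal.
    const strip = (path) => path.replace(/\/+$/, '');
    const pagePath = strip(page.pathname);
    const basePath = strip(base.pathname);
    switch (mode) {
        case 'page':
            return pagePath === basePath;
        case 'section':
            return pagePath === basePath || pagePath.startsWith(basePath + '/');
        default: // 'website'
            return true;
    }
}

const pages = ['/', '/foo', '/foo/foo1', '/foo/foo2', '/bar']
    .map((path) => 'http://my.site.com' + path);

const inSection = pages.filter((p) => isInScope(p, 'http://my.site.com/foo', 'section'));
const inPage = pages.filter((p) => isInScope(p, 'http://my.site.com/foo', 'page'));
console.log(inSection);
console.log(inPage);
```

Appending '/' before the prefix check keeps a sibling path such as /foobar out of the /foo section.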
Examples of run command usage:
With the configuration file only:
$ node bin/blc run -c ./configs/my.site.com.js
Scan http://my.another.site.com with the configuration file from another website, my.site.com:
$ node bin/blc run -c ./configs/my.site.com.js -u http://my.another.site.com
Override the concurrent, requestTimeout and requestRetriesAmount parameters:
$ node bin/blc run -c ./configs/my.site.com.js -cc 50 -rt 10000 -rra 20
Scan the single page /foo/bar of website http://my.site.com:
$ node bin/blc run -c ./configs/my.site.com.js -u http://my.site.com/foo/bar -m page
run command execution:
The overall analysis results are printed to the console after the run command finishes. The paths of the generated report files are printed there as well.
The version command prints the current application version to the console. Usage example:
$ node bin/blc version
Expected console output (version can differ from value here):
INFO cli/cmd-version.js: Application name: => bs-broken-links-checker
INFO cli/cmd-version.js: Application version: => 0.0.1
The package can be installed as an ordinary npm dependency.
$ npm install --save bs-broken-links-checker
To initialize the tool, create a new instance of the BrokenLinksChecker class.
var BrokenLinksChecker = require('bs-broken-links-checker').BrokenLinksChecker,
brokenLinksChecker = new BrokenLinksChecker();
Then call the start method and pass the URL of your website as its argument, for example:
brokenLinksChecker.start('https://my.site.com');
The BrokenLinksChecker class constructor takes an options object as its argument. The available option fields are described below.
concurrent
Number of inner website links analyzed concurrently. The optimal value of this parameter should be found empirically. If this value is too low, the total analysis time will increase. If this value is too high, the load on your website server will increase, which can cause network errors and corrupted results.
Default value: 100.
Note: this parameter applies only to inner links. External links are always checked 100 at a time.
requestHeaders
Sets custom request headers.
Default value: { 'user-agent': 'node-spider' }.
requestRetriesAmount
Maximum number of request attempts for a single analyzed URL before it is reported as broken.
Default value: 5.
requestTimeout
Request timeout in milliseconds.
Default value: 5000.
acceptedSchemes
Permitted URL schemes. Links whose URLs use a scheme not listed here are excluded from the analysis.
Default value: ['http:', 'https:'].
checkExternalUrls
Enables or disables checking of external links. If this parameter is false, only inner links of the website are analyzed.
Default value: false.
excludeLinkPatterns
Excludes some URL patterns from processing. You can pass an array of regular expressions or string patterns (including wildcards) as the value of this option.
Every URL that matches any of the listed expressions is excluded from processing.
For example, to exclude pages whose URLs contain foo or bar, set this option to: [/\/foo/i, /\/bar/i].
Default value: [].
More examples:
module.exports = {
    // ...other options
    excludeLinkPatterns: [
        /\/contacts/,
        // string patterns must be quoted to be valid JavaScript
        'http://google.com',
        'http://my.site.com/foo/*',
        '*/foo/bar'
    ]
}
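The document does not show how the tool evaluates these patterns. As a rough illustration only, the sketch below shows one plausible way to match mixed regular-expression and wildcard patterns. The toRegExp and isExcluded helpers are hypothetical, and treating * as "any sequence of characters" with whole-URL matching for strings is an assumption.

```javascript
// Hypothetical helper: turns a wildcard string into an anchored RegExp.
// RegExp patterns are passed through unchanged.
function toRegExp(pattern) {
    if (pattern instanceof RegExp) {
        return pattern;
    }
    // Escape every regex metacharacter except '*'.
    const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    // Assumption: '*' stands for any sequence of characters.
    return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Hypothetical helper: true if the URL matches at least one pattern.
function isExcluded(url, patterns) {
    return patterns.some((pattern) => toRegExp(pattern).test(url));
}

const patterns = [
    /\/contacts/,
    'http://google.com',
    'http://my.site.com/foo/*',
    '*/foo/bar'
];

console.log(isExcluded('http://my.site.com/contacts/team', patterns));
console.log(isExcluded('http://my.site.com/about', patterns));
```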
Callback function that is fired at the end of the analysis. It receives an instance of the Statistic class, which has all the fields and methods needed for working with the scan results.
You can see usage examples here.
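To make the trade-off behind the concurrent and requestRetriesAmount options more concrete, here is a minimal sketch of a worker pool that caps parallel checks and retries each URL a fixed number of times. It is not the tool's implementation. The findBroken and checkWithRetries functions and the checkUrl callback are hypothetical, and the fake checker in the usage part stands in for real HTTP requests.

```javascript
// Hypothetical helper: a URL counts as broken only after every attempt fails.
async function checkWithRetries(url, checkUrl, attempts) {
    for (let i = 0; i < attempts; i++) {
        try {
            if (await checkUrl(url)) {
                return true;
            }
        } catch (err) {
            // A thrown error counts as one failed attempt.
        }
    }
    return false;
}

// Hypothetical helper: checks all URLs with at most `concurrent` in flight.
async function findBroken(urls, checkUrl, options = {}) {
    const { concurrent = 100, requestRetriesAmount = 5 } = options;
    const queue = urls.slice();
    const broken = [];
    async function worker() {
        // Each worker pulls the next URL until the queue is empty.
        while (queue.length > 0) {
            const url = queue.shift();
            if (!(await checkWithRetries(url, checkUrl, requestRetriesAmount))) {
                broken.push(url);
            }
        }
    }
    const size = Math.min(concurrent, urls.length);
    await Promise.all(Array.from({ length: size }, () => worker()));
    return broken;
}

// Usage with a fake checker that fails for a single URL.
const attemptsSeen = {};
const fakeCheck = async (url) => {
    attemptsSeen[url] = (attemptsSeen[url] || 0) + 1;
    return url !== 'http://my.site.com/broken';
};
const done = findBroken(
    ['http://my.site.com/', 'http://my.site.com/broken', 'http://my.site.com/foo'],
    fakeCheck,
    { concurrent: 2, requestRetriesAmount: 3 }
);
done.then((broken) => console.log(broken));
```

A small pool lengthens the total run because URLs wait in the queue, while a large pool sends more simultaneous requests to the server, which matches the trade-off described for the concurrent option.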
Run the tests with istanbul coverage calculation:
$ npm test
Check code syntax and style with jshint and jscs:
$ npm run codestyle
Special thanks to:
Developer: Kuznetsov Andrey
You can send your questions and proposals to address or create issues here.