You're Invited:Meet the Socket Team at BlackHat and DEF CON in Las Vegas, Aug 4-6.RSVP →

Book a Demo Install Sign in

@hashicorp/broken-links-checker

Package Overview

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

@hashicorp/broken-links-checker

Broken Links Checker

1.2.10

latest

npm

Version published: 5 years ago

Maintainers: 20

Created: 7 years ago

Source

Broken Links Checker

A /smart/ standalone package that checks for broken links in HTML files that reside in a given path. This package is compatiable with Netlify servers, as it will honor any redirects mentioned in the _redirects file. Additionaly, to avoid unneccesary bandwith & time, the checker will cache URL results for the life of the process, and resolve local URLs to local files.

To run, use the CLI as follows:

> node index.js --path ~/dev/Hashicorp/middleman/build

Or, in the working directory:

> npx broken-links-checker --path ./build

CLI Parameters

Required?	Var Name	Description	Default
Yes	path	Path of HTML files
No	baseUrl	Base URL to apply to relative links	https://www.hashicorp.com/
No	verbose		false
No	configPath	Path to a linksrc JSON config file

Configuration

The checker can be fine-tuned using a .linksrc.json file. By default, the checker will attemp to read a config file from the supplied path parameter, falling back to the current working directory, and finally to the user's home folder. Here are all the parameters (and their default values) you may specify in the config file:

{
  // Feeling chatty?
  "verbose": false,

  // Base URL for which URLs are resolved to local HTML files
  // from the given --path parameter
  "baseUrl": "https://www.hashicorp.com/",

  // Max number of outstanding external HTTP requests
  "maxConcurrentUrlRequests": 50,

  // Max number of HTML files to scan at once, in parallel
  "maxConcurrentFiles": 10,

  // Tags from which to read URLs, and their respective source attribute
  "allowedTags": {
    "a": "href",
    "script": "src",
    "style": "href",
    "iframe": "src",
    "img": "src",
    "embed": "src",
    "audio": "src",
    "video": "src"
  },

  // URL protocols supported
  "allowedProtocols": ["http:", "https:"],

  // RegExp patterns to ignore completely
  "excludePatterns": [
    "^https?://.*linkedin.com",
    "^https?://.*example.com",
    "^https?://.*localhost"
  ],

  // Exclude URLs that look like IPs, but are not in a public IPv4 range
  "excludePrivateIPs": true,

  // Check same-page anchor links?
  "checkAnchorLinks": true,

  // Timeout for external HTTP requests
  "timeout": 60000,

  // Should we retry when we get a 405 error for a HEAD request?
  "retry405Head": true,

  // Should we try retry on ENOTFOUND network error?
  "retryENOTFOUND": true,

  // Overall max number of retries for any given URL
  "maxRetries": 3,

  // Toggle local in-memory cache (to avoid querying the same URL multiple times)
  "useCache": true,

  // RegExp patterns for URLs that don't support HEAD requests
  "avoidHeadForPatterns": [
    "^https?://.*linkedin.com",
    "^https?://.*youtube.com",
    "^https?://.*meetup.com"
  ]
}

Smart Local URL Resolution

Links are divided into two categories: internal links, and external ones. To determine if a link is internal, we parsed it's hostname and compare it to configured baseUrl hostname. While external links are simply checked using an HTTP HEAD request, internal ones are a bit more nuanced, since we're relying on Middleman: we construct a local path for the given link by replacing the public host, with our local path (i.e. path parameter). We then try to match, in this order, an HTML file to the resulting path: ${path}, ${path}.html, and finally ${path}/index.html.

Caching

Each URL checked is stored in a local memory cache to avoid rechecking the same URL. Although unadvised, this behavior can be disabled.

URL Resolving

External URL fetching is a bit more complex than a simple HEAD request. Some hosts apply explicit rate limits, and some apply implicit ones (e.g. LinkedIn). When we fetch an external URL, we first check out local cache. In case of a cache miss, we'll make a HEAD request to the given URL, up to maxRetries times (defaults to 3). HEAD requests require less bandwidth and are less obtrusive and GET requests. Some hosts are set to not respond to HEAD requets, and will respond with a HTTP/405 Method Not Allowed, in which case we'll retry the request with a GET request.

Netlify Redirects

Netlify redirects in the form of a _redirects file are honored and processed using the NetlifyDevServer class. Redirects files are auto detected in the root of the given path.

FAQs

What is @hashicorp/broken-links-checker?

Is @hashicorp/broken-links-checker well maintained?

Package last updated on 20 Jul 2020

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

@hashicorp/broken-links-checker

Broken Links Checker

CLI Parameters

Configuration

Smart Local URL Resolution

Caching

URL Resolving

Netlify Redirects

Related posts

60 Malicious Ruby Gems Used in Targeted Credential Theft Campaign

New CNA Scorecard Tool Ranks CVE Data Quality Across the Ecosystem