check-pages
Checks various aspects of a web page for correctness.
Install
npm install check-pages --save-dev
If you're using Grunt, the grunt-check-pages package wraps this functionality in a Grunt task.
If you're using Gulp or another framework, the example below shows how to integrate check-pages into your workflow.
Overview
An important aspect of creating a web site is validating the structure, content, and configuration of the site's pages. The checkPages task provides an easy way to integrate this testing into your workflow.
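Given a list of pages to scan, the task can:
- Validate that each link in a page points to accessible content
- Verify that links with file hashes in the query string point to matching content
- Validate page structure by parsing the content as XHTML
- Validate that the Cache-Control and ETag response headers are present and valid
- Validate that the Content-Encoding response header is present and valid
- Report pages that take longer than a maximum response time to download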
Usage
To use check-pages with Gulp, create a task and invoke checkPages, passing the task's callback function.
The following example includes all supported options:
var gulp = require("gulp");
var checkPages = require("check-pages");

gulp.task("checkDev", [ "start-development-server" ], function(callback) {
  var options = {
    pageUrls: [
      'http://localhost:8080/',
      'http://localhost:8080/blog',
      'http://localhost:8080/about.html'
    ],
    checkLinks: true,
    onlySameDomain: true,
    queryHashes: true,
    noRedirects: true,
    noLocalLinks: true,
    linksToIgnore: [
      'http://localhost:8080/broken.html'
    ],
    checkXhtml: true,
    checkCaching: true,
    checkCompression: true,
    maxResponseTime: 200,
    userAgent: 'custom-user-agent/1.2.3',
    summary: true
  };
  checkPages(console, options, callback);
});

gulp.task("checkProd", function(callback) {
  var options = {
    pageUrls: [
      'http://example.com/',
      'http://example.com/blog',
      'http://example.com/about.html'
    ],
    checkLinks: true,
    maxResponseTime: 500
  };
  checkPages(console, options, callback);
});
API
module.exports = function(host, options, done) { ... }
Host
Type: Object
Required
Specifies the task environment.
For convenience, console can be passed directly (as in the example above).
log
Type: Function (parameters: String)
Required
Function used to log informational messages.
error
Type: Function (parameters: String)
Required
Function used to log error messages.
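Because the host only needs log and error functions, any object that provides them can stand in for console. The following is a minimal sketch of a host that records messages instead of printing them; createRecordingHost and the sample message strings are hypothetical and not part of check-pages.

```javascript
// Hypothetical helper (not part of check-pages): builds a host object that
// stores messages in arrays. Only `log` and `error` are required by the API.
function createRecordingHost() {
  const host = {
    logs: [],
    errors: [],
    log: function (message) {
      host.logs.push(String(message));
    },
    error: function (message) {
      host.errors.push(String(message));
    }
  };
  return host;
}

// Illustrative messages only; the real wording comes from check-pages.
const recordingHost = createRecordingHost();
recordingHost.log("informational message");
recordingHost.error("error message");
console.log(recordingHost.logs.length, recordingHost.errors.length);
```

Such an object would be passed as the first argument, in place of console, when calling checkPages(host, options, done).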
Options
Type: Object
Required
Specifies the task configuration.
pageUrls
Type: Array of String
Required
pageUrls specifies a list of URLs for web pages the task will check.
URLs must be absolute and can point to local or remote content. The pageUrls array can be empty, but must be present.
checkLinks
Type: Boolean
Default value: false
Enabling checkLinks causes each link in a page to be checked for validity (i.e., an HTTP HEAD or GET request returns success).
For efficiency, a HEAD request is made first and a successful result validates the link. Because some web servers misbehave, a failed HEAD request is followed by a GET request to definitively validate the link.
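The HEAD-then-GET strategy described above can be sketched as follows. This is an illustration rather than the package's implementation: request is a hypothetical function taking (method, url) and returning a Promise for the HTTP status code, and "success" is assumed here to mean a 2xx status.

```javascript
// Assumption: a 2xx status code counts as success.
function isSuccess(status) {
  return status >= 200 && status < 300;
}

// `request` is a hypothetical injected function:
//   (method, url) => Promise<number>  (resolves to the HTTP status code)
async function isLinkValid(url, request) {
  try {
    if (isSuccess(await request("HEAD", url))) {
      return true; // a successful HEAD is enough to validate the link
    }
  } catch (err) {
    // A HEAD request that errors out is treated like a failed HEAD.
  }
  try {
    // Some servers mishandle HEAD, so GET gives the definitive answer.
    return isSuccess(await request("GET", url));
  } catch (err) {
    return false;
  }
}
```

Injecting request keeps the sketch independent of any particular HTTP client and makes the fallback easy to exercise with a stub.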
The following element/attribute pairs are used to identify links:
- a/href
- area/href
- audio/src
- embed/src
- iframe/src
- img/src
- input/src
- link/href
- object/data
- script/src
- source/src
- track/src
- video/src
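One way to picture the element/attribute pairs above is as a lookup table applied to parsed elements. In this sketch, collectLinks and the { tag, attributes } element shape are assumptions made for illustration; they do not describe how check-pages parses HTML.

```javascript
// Element name -> attribute that holds the link, per the list above.
const LINK_ATTRIBUTES = new Map([
  ["a", "href"], ["area", "href"], ["audio", "src"], ["embed", "src"],
  ["iframe", "src"], ["img", "src"], ["input", "src"], ["link", "href"],
  ["object", "data"], ["script", "src"], ["source", "src"],
  ["track", "src"], ["video", "src"]
]);

// `elements` is a hypothetical pre-parsed list: [{ tag, attributes }, ...]
function collectLinks(elements) {
  const links = [];
  elements.forEach(function (element) {
    const attribute = LINK_ATTRIBUTES.get(element.tag.toLowerCase());
    const value = attribute ? element.attributes[attribute] : undefined;
    if (value) {
      links.push(value); // skip elements without the relevant attribute
    }
  });
  return links;
}
```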
onlySameDomain
Type: Boolean
Default value: false
Used by: checkLinks
Set this option to true to block the checking of links on different domains than the referring page.
This can be useful during development when external sites aren't changing and don't need to be checked.
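As a rough sketch of a same-domain test, the host names of the page and the link can be compared after resolving the link against the page. Comparing only the hostname (ignoring port and scheme) is an assumption of this example, and isSameDomain is a hypothetical helper.

```javascript
// Hypothetical helper: true when the link resolves to the same host name
// as the referring page. Ports and schemes are deliberately ignored here.
function isSameDomain(pageUrl, linkUrl) {
  const page = new URL(pageUrl);
  const link = new URL(linkUrl, pageUrl); // resolves relative links
  return page.hostname === link.hostname;
}

console.log(isSameDomain("http://localhost:8080/", "/blog"));
console.log(isSameDomain("http://localhost:8080/", "http://example.com/"));
```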
queryHashes
Type: Boolean
Default value: false
Used by: checkLinks
Set this option to true to verify that links with file hashes in the query string point to content that hashes to the expected value.
Query hashes can be used to invalidate cached responses when leveraging browser caching via long cache lifetimes.
Supported hash functions are:
- image.png?crc32=e4f013b5
- styles.css?md5=4f47458e34bc855a46349c1335f58cc3
- archive.zip?sha1=9511fa1a787d021bdf3aa9538029a44209fb5c4c
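To illustrate what a query-hash check involves, the sketch below compares the hash named in a link's query string with the hash of the downloaded content, using Node's built-in crypto module. verifyQueryHash is a hypothetical helper; it covers md5 and sha1 only, because crypto.createHash offers no crc32 algorithm.

```javascript
const crypto = require("crypto");

// Hypothetical helper: returns true/false when the link carries an md5 or
// sha1 query hash, or null when it carries neither.
function verifyQueryHash(linkUrl, content) {
  const params = new URL(linkUrl).searchParams;
  for (const algorithm of ["md5", "sha1"]) {
    const expected = params.get(algorithm);
    if (expected) {
      const actual = crypto.createHash(algorithm).update(content).digest("hex");
      return actual === expected.toLowerCase();
    }
  }
  return null; // no supported hash in the query string
}

// The query value below is the MD5 digest of the string "hello".
console.log(verifyQueryHash(
  "http://localhost:8080/styles.css?md5=5d41402abc4b2a76b9719d911017c592",
  "hello"
));
```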
noRedirects
Type: Boolean
Default value: false
Used by: checkLinks
Set this option to true to fail the task if any HTTP redirects are encountered.
This can be useful to ensure outgoing links are to the content's canonical location.
noLocalLinks
Type: Boolean
Default value: false
Used by: checkLinks
Set this option to true to fail the task if any links to localhost are encountered.
This is useful to detect temporary links that may work during development but would fail when deployed.
The host names recognized as localhost are:
- localhost
- 127.0.0.1 (and the rest of the 127.0.0.0/8 address block)
- ::1 (and its expanded forms)
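The host names listed above could be detected along these lines. isLocalhost is a hypothetical helper, and the sketch relies on the WHATWG URL parser to normalize addresses (for example, expanded forms of ::1 are serialized as [::1]).

```javascript
// Hypothetical helper: true when an absolute URL points at localhost,
// the 127.0.0.0/8 block, or the IPv6 loopback address.
function isLocalhost(linkUrl) {
  const hostname = new URL(linkUrl).hostname.toLowerCase();
  if (hostname === "localhost") {
    return true;
  }
  if (/^127(\.\d{1,3}){3}$/.test(hostname)) {
    return true; // any address in 127.0.0.0/8
  }
  return hostname === "[::1]"; // URL keeps the brackets on IPv6 hosts
}

console.log(isLocalhost("http://localhost:8080/blog"));
console.log(isLocalhost("http://example.com/"));
```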
linksToIgnore
Type: Array of String
Default value: undefined
Used by: checkLinks
linksToIgnore specifies a list of URLs that should be ignored by the link checker.
This is useful for links that are not accessible during development or known to be unreliable.
checkXhtml
Type: Boolean
Default value: false
Enabling checkXhtml attempts to parse each URL's content as XHTML and fails if there are any structural errors.
This can be useful to ensure a page's structure is well-formed and unambiguous for browsers.
checkCaching
Type: Boolean
Default value: false
Enabling checkCaching verifies the HTTP Cache-Control and ETag response headers are present and valid.
This is useful to ensure a page makes use of browser caching for better performance.
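The text does not spell out what "valid" means for these headers, so the following is only a guess at the kind of test involved: Cache-Control must contain at least one common directive, and ETag must be a quoted string with an optional W/ prefix. findCachingIssues, its issue messages, and both validity rules are assumptions of this sketch.

```javascript
// `headers` uses lowercase names, as Node's http module provides them.
// The validity rules below are assumptions, not the rules check-pages uses.
function findCachingIssues(headers) {
  const issues = [];
  const cacheControl = headers["cache-control"];
  const etag = headers["etag"];
  const directive = /\b(max-age=\d+|no-cache|no-store|public|private|must-revalidate)\b/i;

  if (!cacheControl) {
    issues.push("Missing Cache-Control header");
  } else if (!directive.test(cacheControl)) {
    issues.push("Invalid Cache-Control header: " + cacheControl);
  }

  if (!etag) {
    issues.push("Missing ETag header");
  } else if (!/^(W\/)?"[^"]*"$/.test(etag)) {
    issues.push("Invalid ETag header: " + etag);
  }
  return issues;
}

console.log(findCachingIssues({ "cache-control": "public, max-age=3600", etag: '"abc123"' }));
```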
checkCompression
Type: Boolean
Default value: false
Enabling checkCompression verifies the HTTP Content-Encoding response header is present and valid.
This is useful to ensure a page makes use of compression for better performance.
maxResponseTime
Type: Number
Default value: undefined
maxResponseTime specifies the maximum amount of time (in milliseconds) a page request can take to finish downloading.
Requests that take more time will trigger a failure (but are still checked for other issues).
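The behavior described above, where a slow page is flagged but its response is still available for other checks, can be sketched with a simple timer. timeRequest and fetchPage are hypothetical names: fetchPage stands for any function that returns a Promise resolving once the page has finished downloading.

```javascript
// Hypothetical helper: measures how long `fetchPage` takes and flags the
// request as too slow without discarding its result.
async function timeRequest(fetchPage, maxResponseTime) {
  const start = Date.now();
  const result = await fetchPage();
  const elapsed = Date.now() - start; // milliseconds
  const tooSlow = typeof maxResponseTime === "number" && elapsed > maxResponseTime;
  return { result: result, elapsed: elapsed, tooSlow: tooSlow };
}
```

Leaving maxResponseTime undefined, its default, means no request is ever flagged in this sketch.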
userAgent
Type: String
Default value: check-pages/x.y.z
userAgent specifies the value of the HTTP User-Agent header sent with all page/link requests.
This is useful for pages that alter their behavior based on the user agent. Setting the value to null omits the User-Agent header entirely.
summary
Type: Boolean
Default value: false
Enabling the summary option logs a summary of each issue found after all checks have completed.
This makes it easy to pick out failures when running tests against many pages.
Release History
- 0.7.0 - Initial release, extract functionality from grunt-check-pages for use with Gulp.
- 0.7.1 - Fix misreporting of "Bad link" for redirected links when noRedirects enabled.