# Sitemap Generator
![npm](https://img.shields.io/npm/v/sitemap-generator-cli.svg)
Easily create XML sitemaps for your website.
## Installation

```sh
$ npm install -S sitemap-generator
```
## Usage

```js
var SitemapGenerator = require('sitemap-generator');

var generator = new SitemapGenerator('http://example.com');

// the `done` event fires once the crawl is complete
generator.on('done', function (sitemap) {
  console.log(sitemap);
});

generator.start();
```
The crawler will fetch all folder URL pages and file types parsed by Google. If present, the `robots.txt` will be taken into account and its rules are applied to each URL to decide whether it should be added to the sitemap. The crawler will also not fetch URLs from a page if a robots meta tag with the value `nofollow` is present, and will ignore a page completely if a `noindex` rule is present. The crawler is able to apply the `base` value to found links.
## Options
You can provide some options to alter the behaviour of the crawler.
```js
var generator = new SitemapGenerator('http://example.com', {
  restrictToBasepath: false,
  stripQuerystring: true,
});
```
Since version 5, the port is no longer an option. If you are using the default ports for HTTP/HTTPS you are fine. If you are using a custom port, just append it to the URL.
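For example, for a site served on a non-default port (the port number below is only illustrative):

```js
// custom port appended directly to the URL
var generator = new SitemapGenerator('http://example.com:8080');
```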
### restrictToBasepath

Type: `boolean`
Default: `false`
If you specify a URL with a path (e.g. `example.com/foo/`) and this option is set to `true`, the crawler will only fetch URLs matching `example.com/foo/*`. Otherwise it could also fetch `example.com` in case a link to this URL is provided.
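A minimal sketch of restricting the crawl to a subpath (the URL and path are just examples):

```js
// only URLs under /foo/ will be crawled and added to the sitemap
var generator = new SitemapGenerator('http://example.com/foo/', {
  restrictToBasepath: true,
});
```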
### stripQuerystring

Type: `boolean`
Default: `true`
Whether to treat URLs with query strings like `http://www.example.com/?foo=bar` as individual sites and add them to the sitemap.
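For instance, to keep query-string URLs as separate sitemap entries (a sketch, the URL is illustrative):

```js
// with stripQuerystring disabled, /?foo=bar is kept as its own entry
var generator = new SitemapGenerator('http://www.example.com', {
  stripQuerystring: false,
});
```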
## Events

The Sitemap Generator emits several events using Node's `EventEmitter`.
### fetch

Triggered when the crawler tries to fetch a resource. Passes the status and the URL as arguments. The status can be any HTTP status.

```js
generator.on('fetch', function (status, url) {
  // status: HTTP status code, url: the fetched resource
});
```
### ignore

If a URL matches a disallow rule in the `robots.txt` file, this event is triggered. The URL will not be added to the sitemap. Passes the ignored URL as argument.

```js
generator.on('ignore', function (url) {
  // url: the URL skipped because of a robots.txt rule
});
```
### clienterror

Emitted if there was an error on the client side while fetching a URL. Passes the crawler error and additional error data as arguments.

```js
generator.on('clienterror', function (queueError, errorData) {
  // queueError: the crawler error, errorData: additional error details
});
```
### done

Triggered when the crawler has finished and the sitemap is created. Passes the created XML markup as callback argument. The second argument provides an object containing found URLs, ignored URLs and faulty URLs.

```js
generator.on('done', function (sitemap, store) {
  // sitemap: XML markup as string, store: object with found, ignored and faulty URLs
});
```
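A minimal sketch of a full run that writes the generated sitemap to disk (the output file name is just an example):

```js
var fs = require('fs');
var SitemapGenerator = require('sitemap-generator');

var generator = new SitemapGenerator('http://example.com');

generator.on('done', function (sitemap) {
  // persist the generated XML markup once crawling is finished
  fs.writeFile('sitemap.xml', sitemap, function (err) {
    if (err) throw err;
    console.log('sitemap.xml written');
  });
});

generator.start();
```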