static-sitemap-cli - npm Package Compare versions

Comparing version 1.4.2 to 1.5.0

src/sitemap.js

41

package.json
{
"name": "static-sitemap-cli",
"description": "Simple CLI to pre-generate XML sitemaps for static sites locally.",
"version": "1.4.2",
"description": "CLI to generate XML sitemaps for static sites from local filesystem",
"version": "1.5.0",
"author": "Jason Lee <jason@zerodevx.com>",
"main": "src/sitemap.js",
"bin": {

@@ -10,3 +11,7 @@ "static-sitemap-cli": "./bin/run",

},
"bugs": "https://github.com/zerodevx/static-sitemap-cli/issues",
"scripts": {
"lint": "standard | snazzy",
"pretest": "npm run lint",
"test": "nyc mocha --forbid-only \"test/**/*.test.js\""
},
"dependencies": {

@@ -18,5 +23,5 @@ "@oclif/command": "^1.8.0",

"get-stdin": "^8.0.0",
"htmlparser2": "^6.0.1",
"htmlparser2": "^6.1.0",
"js2xmlparser": "^4.0.1",
"micromatch": "^4.0.2"
"micromatch": "^4.0.4"
},

@@ -26,6 +31,6 @@ "devDependencies": {

"chai": "^4.3.4",
"eslint": "^7.23.0",
"mocha": "^8.3.2",
"nyc": "^15.1.0",
"prettier": "^2.2.1"
"snazzy": "^9.0.0",
"standard": "^16.0.3"
},

@@ -35,2 +40,5 @@ "engines": {

},
"oclif": {
"bin": "static-sitemap-cli"
},
"files": [

@@ -40,2 +48,8 @@ "/bin",

],
"license": "ISC",
"repository": {
"type": "git",
"url": "https://github.com/zerodevx/static-sitemap-cli.git"
},
"bugs": "https://github.com/zerodevx/static-sitemap-cli/issues",
"homepage": "https://npmjs.com/package/static-sitemap-cli",

@@ -47,14 +61,3 @@ "keywords": [

"CLI"
],
"license": "ISC",
"main": "src/index.js",
"oclif": {
"bin": "static-sitemap-cli"
},
"repository": "zerodevx/static-sitemap-cli",
"scripts": {
"pretest": "eslint . && npm run format",
"format": "prettier --write \"src/**/*.js\"",
"test": "nyc mocha --forbid-only \"test/**/*.test.js\""
}
]
}

README.md

@@ -6,5 +6,5 @@ ![npm](https://img.shields.io/npm/v/static-sitemap-cli)

Simple CLI to pre-generate XML sitemaps for static sites locally.
> CLI to generate XML sitemaps for static sites from local filesystem.
Built in 10 minutes. :stuck_out_tongue_winking_eye:
Quick and easy CLI to generate either XML or TXT sitemaps for your static site. Can also be used as a Node module.

@@ -14,3 +14,3 @@ ## Install

```
npm i -g static-sitemap-cli
$ npm i -g static-sitemap-cli
```

@@ -34,31 +34,58 @@

Where `sscli` is just an alias of `static-sitemap-cli`. CLI by default outputs to `stdout` -
so that you can pipe it to do other cool stuff. CLI also allows you to pipe in BASEURL via `stdin`.
where `sscli` is an alias of `static-sitemap-cli`. CLI by default outputs to `stdout` - so that you can pipe it to do other cool stuff. CLI also allows you to pipe in BASEURL via `stdin`.
### Arguments
## Options
| Argument | Description |
| -------- | -------------------------------------------------------------------------------------- |
| BASEURL | Base URL that is prefixed to all location entries. For example: `https://example.com/` |
```
USAGE
$ static-sitemap-cli [BASEURL]
### Options
ARGUMENTS
BASEURL Base URL that is prefixed to all sitemap items.
For example: https://example.com/
| Option | Long | Description |
| ------- | ---------------- | ----------------------------------------------------------------------------------------------------------------- |
| -h | --help | show CLI help |
| -V | --version | show CLI version |
| -r | --root | [default: current dir] root directory to start from |
| -m | --match | [default: **/*.html,!404.html] list of globs to match |
| -p | --priority | glob-priority pair (eg: foo/\*.html=0.1) |
| -c | --changefreq | glob-changefreq pair (eg: foo/\*.html=daily) |
| -n | --no-clean | disable clean URLs |
| -l | --slash | add trailing slash to all URLs |
| -t | --text | output as .TXT instead |
| -s | --save | save output to XML and TXT files directly |
| -o | --output-dir | specify the output dir; used together with --save; defaults to root working directory |
| -v | --verbose | be more verbose |
| **N/A** | --follow-noindex | removes pages with noindex meta tag from sitemap **(up to 5x slower due to reading and parsing every HTML file)** |
OPTIONS
-V, --version show CLI version
#### Clean URLs
-c, --changefreq=changefreq `=`-separated glob-changefreq pair [eg:
bar/**=daily]
-h, --help show CLI help
-l, --slash add trailing slash to all URLs
-m, --match=match [default: **/*.html,!404.html] micromatch globs
to match
-n, --no-clean disable clean URLs
-o, --output-dir=output-dir specify the output dir; used together with
--save; defaults to root working directory
-p, --priority=priority `=`-separated glob-priority pair [eg: foo/**=0.1]
-r, --root=root [default: .] root working directory
-s, --save write both XML and TXT outputs to file directly
instead of `stdout`
-t, --text output as text instead of XML
-v, --verbose be more verbose
--follow-noindex removes pages with noindex meta tag from sitemap
(up to 5x slower due to reading and parsing every
HTML file)
DESCRIPTION
CLI to generate XML sitemaps for static sites from local filesystem.
At its most basic, just run from root of distribution:
$ sscli https://example.com > sitemap.xml
CLI by default outputs to 'stdout'; BASEURL can be piped in via 'stdin'.
```
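`--follow-noindex` is slower because every matched HTML file has to be read and parsed. In 1.4.2 (the code removed from `src/index.js` further down in this diff), that check used `htmlparser2` and dropped a page only when it found a `<meta>` tag with `name="robots"` and `content="noindex"` as an exact match. Below is a minimal, dependency-free sketch of the same rule. `hasNoindex` is a hypothetical name, and the regex scan is a simplification that assumes quoted attribute values, so it is not a substitute for a real HTML parser.

```javascript
// Hypothetical helper: a simplified, regex-based stand-in for the
// htmlparser2 check used in 1.4.2. Not a full HTML parser.
function hasNoindex (html) {
  // Find every <meta ...> tag (case-insensitive).
  const metaTags = html.match(/<meta\b[^>]*>/gi) || []
  return metaTags.some((tag) => {
    // Collect name="value" / name='value' attribute pairs.
    const attrs = {}
    for (const m of tag.matchAll(/([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g)) {
      // Attribute names are case-insensitive in HTML; values are kept as-is.
      attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3]
    }
    // Same exact-match rule as 1.4.2: name must be "robots" and
    // content must be exactly "noindex".
    return attrs.name === 'robots' && attrs.content === 'noindex'
  })
}

// Usage: keep only pages without the noindex meta tag.
const pages = {
  'index.html': '<html><head><title>Home</title></head></html>',
  'draft.html': '<html><head><meta name="robots" content="noindex"></head></html>'
}
const kept = Object.keys(pages).filter((p) => !hasNoindex(pages[p]))
console.log(kept) // [ 'index.html' ]
```

Following 1.4.2, `content="noindex, nofollow"` is *not* treated as noindex here. Version 1.5.0 moved this logic into `src/sitemap.js`, whose diff is not shown, so its current rule may differ.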
### Clean URLs
Whether or not to include the `.html` extension. By default, something like:

@@ -72,3 +99,3 @@

#### Trailing Slashes
### Trailing Slashes

@@ -81,3 +108,3 @@ Controls whether or not URLs should include trailing slashes. For example:

#### Ignore Some Files
### Ignore Some Files

@@ -89,3 +116,3 @@ The `-m` flag allows multiple entries to be input. By default it's set to the following globs: `**/*.html` and `!404.html`.

#### Glob-\* Pairs
### Glob-\* Pairs

@@ -103,3 +130,3 @@ The `-p` and `-c` flags allow multiple entries and accept `glob-*` pairs as input. A `glob-*` pair is input as

#### Output as Text
### Output as Text

@@ -112,3 +139,3 @@ Sitemaps can be formatted as a simple [text file](https://support.google.com/webmasters/answer/183668?hl=en) as well,

#### Create sitemap for `dist` folder
### Create sitemap for `dist` folder

@@ -121,7 +148,7 @@ `static-sitemap-cli https://example.com -r dist > dist/sitemap.xml`

#### Ignore a bunch of files
### Ignore a bunch of files
`sscli https://example.com -m '**/*.html' '!404.html' '!**/ignore/**' '!this/other/specific/file.html' > sm.xml`
#### Set priority of certain pages
### Set priority of certain pages

@@ -133,18 +160,48 @@ By default, the optional `<priority>` label ([protocol reference](https://www.sitemaps.org/protocol.html)) is excluded,

#### Set changefreq of all pages to weekly, and some to daily
### Set changefreq of all pages to weekly, and some to daily
`sscli https://example.com -c '**/*=weekly' -c 'events/**=daily' > sm.xml`
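Per the v1.0.0 changelog entry, glob-\* pairs are `=`-separated and later pairs override earlier ones, which is how `events/**` pages end up `daily` while everything else stays `weekly`. The 1.4.2 code in this diff looped over every pair for each file and kept the last match. A sketch of that resolution order, assuming a simplified glob matcher (`globToRegExp`, hypothetical) in place of `micromatch`: it understands `*`, `**` and `?` only, not negation (`!`), braces or other micromatch syntax.

```javascript
// Hypothetical, simplified stand-in for micromatch: supports "*", "**" and "?"
// only (no "!" negation, braces or extglobs).
function globToRegExp (glob) {
  let re = ''
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i]
    if (c === '*' && glob[i + 1] === '*') {
      if (glob[i + 2] === '/') {
        re += '(?:.*/)?' // "**/" also matches zero directories
        i += 2
      } else {
        re += '.*' // "**" matches across "/"
        i += 1
      }
    } else if (c === '*') {
      re += '[^/]*' // "*" stays within one path segment
    } else if (c === '?') {
      re += '[^/]'
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`)
}

// Each pair is "glob=value"; when several globs match, the last one wins.
function resolvePair (filePath, pairs) {
  let value
  for (const pair of pairs) {
    const [glob, val] = pair.split('=')
    if (globToRegExp(glob).test(filePath)) value = val
  }
  return value
}

const changefreq = ['**/*=weekly', 'events/**=daily']
console.log(resolvePair('about.html', changefreq)) // weekly
console.log(resolvePair('events/launch.html', changefreq)) // daily
```

In 1.4.2, `--priority` values resolved this way were additionally converted with `parseFloat`.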
#### Pipe in the base URL
### Pipe in the base URL
`echo https://example.com | sscli > sm.xml`
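The base URL can come from `stdin` or from the first positional argument. The 1.4.2 code in this diff reads `stdin` first, falls back to `argv[0]`, errors with `BASEURL_NOT_FOUND` when neither is given, and then appends a trailing `/` via `addSlash`. A pure-function sketch of that order follows. `resolveBaseUrl` and its error message are hypothetical, and trimming the input (for example, the newline that `echo` appends) is an assumption of this sketch that the visible diff does not confirm.

```javascript
// Hypothetical helper mirroring the 1.4.2 flow: prefer BASEURL from stdin,
// fall back to the first positional argument, and fail with
// BASEURL_NOT_FOUND if neither is present.
function resolveBaseUrl (stdinText, argv) {
  // Assumption: trim whitespace such as the trailing newline from `echo`.
  let baseUrl = (stdinText || '').trim()
  if (!baseUrl) {
    if (!argv.length) {
      const err = new Error('BASEURL not found') // hypothetical message
      err.code = 'BASEURL_NOT_FOUND'
      throw err
    }
    baseUrl = argv[0]
  }
  // Same rule as addSlash() in the diff: append "/" if it is missing.
  return baseUrl.slice(-1) === '/' ? baseUrl : `${baseUrl}/`
}

console.log(resolveBaseUrl('https://example.com\n', [])) // https://example.com/
console.log(resolveBaseUrl('', ['https://example.com/docs'])) // https://example.com/docs/
```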
#### Save XML and TXT files into a specified location directly
### Save XML and TXT files into a specified location directly
`sscli https://example.com -r 'src' -s -o 'dist'`
## To-do
## Programmatic Use
~~Add tests! :sweat_smile:~~
`static-sitemap-cli` can also be used as a Node module.
```js
const generateSitemap = require('static-sitemap-cli')
const flags = {
root: './dist', // required
match: ['**/*.html', '!404.html'], // required
slash: false,
'no-clean': false,
text: false,
priority: null,
changefreq: null,
save: false,
'follow-noindex': false,
verbose: false
}
// Pass in `baseUrl` and `flags`
const sitemap = generateSitemap('https://x.com', flags) // returns XML string
const txt = generateSitemap('https://x.com', { // returns TXT string
...flags,
text: true
})
const maps = generateSitemap('https://x.com', { // returns BOTH XML and TXT as an object
...flags, // eg: { xml: '...', txt: '...' }
save: true
})
```
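For reference, the 1.4.2 code in this diff built the XML with `js2xmlparser`: a `<urlset>` root carrying the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, one `<url>` per file with `<loc>` and `<lastmod>` (the file's mtime as an ISO string), plus optional `<priority>` and `<changefreq>`. The text format is one URL per line. The sketch below is a hypothetical, dependency-free builder (`buildSitemaps`, `escapeXml`) that returns the same `{ xml, txt }` shape described above; its indentation and element order are choices of this sketch and may not match the package's output byte for byte.

```javascript
// Escape the five XML special characters, as the sitemaps.org protocol
// requires for URLs placed inside <loc>.
const escapeXml = (s) => String(s).replace(/[&<>"']/g, (c) => ({
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;'
}[c]))

// Hypothetical builder: entries are { loc, lastmod?, changefreq?, priority? }.
function buildSitemaps (entries) {
  const urls = entries.map((e) => {
    const tags = [`    <loc>${escapeXml(e.loc)}</loc>`]
    if (e.lastmod) tags.push(`    <lastmod>${e.lastmod}</lastmod>`)
    if (e.changefreq) tags.push(`    <changefreq>${e.changefreq}</changefreq>`)
    if (e.priority != null) tags.push(`    <priority>${e.priority}</priority>`)
    return `  <url>\n${tags.join('\n')}\n  </url>`
  })
  const xml = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>'
  ].join('\n')
  // Text sitemap: one URL per line.
  const txt = entries.map((e) => e.loc).join('\n')
  return { xml, txt }
}

const { xml, txt } = buildSitemaps([
  { loc: 'https://example.com/', lastmod: '2021-04-01T00:00:00.000Z' },
  { loc: 'https://example.com/search?q=a&b', changefreq: 'daily', priority: 0.5 }
])
console.log(xml)
console.log(txt)
```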
## Tests

@@ -156,3 +213,3 @@

#### To slash or not to slash
### To slash or not to slash

@@ -168,9 +225,16 @@ First of all, search engines treat trailing slashes the same **only** for **root URLs**.

(1) and (2) are **root URLs** and are treated exactly the same; while (3) and (4) are different and are treated as 2 unique addresses. This can be verified through devtools - where you'll notice there aren't `301 redirects` when (1) or (2) are entered into the URL address bar.
(1) and (2) are **root URLs** and are treated exactly the same; while (3) and (4) are different and are treated as
2 unique addresses. This can be verified through devtools - where you'll notice there aren't `301 redirects` when
(1) or (2) are entered into the URL address bar.
Internally, browsers _append_ the slash when a root URL is entered, but _hide_ the slash when displayed in the URL address bar - for vanity purposes.
Internally, browsers _append_ the slash when a root URL is entered, but _hide_ the slash when displayed in the URL
address bar - for vanity purposes.
To synchronise with browser behaviour, this [commit](https://github.com/zerodevx/static-sitemap-cli/commit/04e6b79abfe26ed55c7dec8287bccfac7400a01f) adds the trailing slash for **all** root URLs, even if the `--slash` flag is unused.
To synchronise with browser behaviour, this
[commit](https://github.com/zerodevx/static-sitemap-cli/commit/04e6b79abfe26ed55c7dec8287bccfac7400a01f) adds the
trailing slash for **all** root URLs, even if the `--slash` flag is unused.
Is this important? Not really - most of the time; but if you're using [Google AMP](https://amp.dev/), then yes, the trailing slash on all root URLs is important. Why? Because of how [AMP Cache](https://developers.google.com/amp/cache/) stores the root URL _always with_ the trailing slash - so you can use your sitemap to perform cache-busting operations.
Is this important? Not really - most of the time; but if you're using [Google AMP](https://amp.dev/), then yes, the
trailing slash on all root URLs is important. Why? Because of how [AMP Cache](https://developers.google.com/amp/cache/)
stores the root URL _always with_ the trailing slash - so you can use your sitemap to perform cache-busting operations.
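The removed 1.4.2 `getUrl()` shows how file paths became URLs: strip `/index.html` or `.html` unless `--no-clean` is set, then add a trailing slash if `--slash` is set or if the URL is a root URL (detected as splitting into exactly three parts on `/`). The following sketch restates those rules as a standalone function. `toUrl` and its options object are hypothetical, and since 1.5.0 moved this logic into `src/sitemap.js` (not shown in this diff), current behavior may differ.

```javascript
// Sketch of the 1.4.2 getUrl() rules from the removed src/index.js code.
// `baseUrl` must already end with "/"; `filePath` is relative to --root.
function toUrl (baseUrl, filePath, { clean = true, slash = false } = {}) {
  let url = baseUrl + filePath
  if (clean) {
    if (url.endsWith('/index.html')) {
      url = url.slice(0, -'/index.html'.length) // ".../blog/index.html" -> ".../blog"
    } else if (url.endsWith('.html')) {
      url = url.slice(0, -'.html'.length) // ".../about.html" -> ".../about"
    }
  }
  // A root URL such as "https://example.com" splits into exactly 3 parts
  // on "/", so it always gets the trailing slash, even without --slash.
  if (slash || url.split('/').length === 3) url += '/'
  return url
}

const base = 'https://example.com/'
console.log(toUrl(base, 'index.html')) // https://example.com/
console.log(toUrl(base, 'blog/index.html')) // https://example.com/blog
console.log(toUrl(base, 'blog/index.html', { slash: true })) // https://example.com/blog/
console.log(toUrl(base, 'about.html', { clean: false })) // https://example.com/about.html
```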

@@ -183,69 +247,2 @@ ## License

**v1.4.2** - 2021-03-31:
- Update dependencies.
**v1.4.1** - 2020-09-30:
- Update dependencies.
**v1.4.0** - 2020-07-09:
- Add `noindex` meta tag detection feature per [#9](https://github.com/zerodevx/static-sitemap-cli/issues/9). (Thanks [@davwheat](https://github.com/davwheat)!)
**v1.3.3** - 2020-07-07:
- Update dependencies.
**v1.3.2** - 2020-02-22:
- Update the changelog.
- Update dependencies.
**v1.3.1** - 2020-02-22:
- Fixes [#7](https://github.com/zerodevx/static-sitemap-cli/pull/7) typo in README re `--changefreq` alias. (Thanks [@joshtaylor](https://github.com/joshtaylor)!)
**v1.3.0** - 2020-01-10:
- `--save` now outputs BOTH sitemap.xml and sitemap.txt formats.
- Update dependencies.
**v1.2.0** - 2019-09-26:
- Always add trailing slash to root urls. (ref: [implementation notes](#to-slash-or-not-to-slash))
**v1.1.0** - 2019-08-18:
- **BREAKING**: Trailing slash alias `-s` renamed to `-l`. Sorry. :cry:
- Add feature to save directly to file `<rootDir>/sitemap.xml` instead of `stdout`.
**v1.0.1** - 2019-08-16:
- Bugfix - empty line at EOF in text mode.
**v1.0.0** - 2019-08-15:
- **BREAKING:** `--ignore` is deprecated. Use `--match` instead.
- **BREAKING:** Glob-\* pairs are no longer comma-separated. Use `=` instead.
- **BREAKING:** Logic for multiple glob-\* pairs changed. Later pairs override the earlier ones now.
- Major refactor of original codebase; discontinued usage of [globby](https://www.npmjs.com/package/globby) and [sitemap](https://www.npmjs.com/package/sitemap) in favour of [fast-glob](https://www.npmjs.com/package/fast-glob), [micromatch](https://www.npmjs.com/package/micromatch), and [js2xmlparser](https://www.npmjs.com/package/js2xmlparser).
- Resulting code should be much easier to reason with and maintain now.
- Add feature to output as text (one URL per line).
- Add verbose mode to see some console feedback.
- And finally, add tests with ~95% coverage.
**v0.2.0** - 2019-07-31:
- Allow BASEURL to be piped in also.
- Refactor some dependencies.
**v0.1.1** - 2019-07-27:
- Bugfix: properly check rootDir before replacing.
- Add new alias `sscli` because the original is quite a mouthful.
**v0.1.0** - 2019-07-26:
- Initial release.
- Built in 10 minutes. :stuck_out_tongue_winking_eye:
Changes are logged in the [Releases](https://github.com/zerodevx/static-sitemap-cli/releases) page.

src/index.js

@@ -1,14 +0,11 @@

const { Command, flags } = require('@oclif/command');
const getStdin = require('get-stdin');
const fg = require('fast-glob');
const mm = require('micromatch');
const parser = require('js2xmlparser');
const fs = require('fs');
const htmlparser = require('htmlparser2');
const getSitemap = require('./sitemap')
const { Command, flags } = require('@oclif/command')
const getStdin = require('get-stdin')
const fs = require('fs')
class StaticSitemapCliCommand extends Command {
async run() {
const { argv, flags } = this.parse(StaticSitemapCliCommand);
async run () {
const { argv, flags } = this.parse(StaticSitemapCliCommand)
let baseUrl = await getStdin();
let baseUrl = await getStdin()
if (!baseUrl) {

@@ -18,127 +15,30 @@ if (!argv.length) {

code: 'BASEURL_NOT_FOUND',
exit: 1,
});
exit: 1
})
}
baseUrl = argv[0];
baseUrl = argv[0]
}
const addSlash = (path) => (path.slice(-1) === '/' ? path : `${path}/`)
baseUrl = addSlash(baseUrl)
const addSlash = (path) => (path.slice(-1) === '/' ? path : `${path}/`);
baseUrl = addSlash(baseUrl);
const getUrl = (path) => {
let url = baseUrl + path;
if (!flags['no-clean']) {
if (url.slice(-11) === '/index.html') {
url = url.slice(0, -11);
} else if (url.slice(-5) === '.html') {
url = url.slice(0, -5);
}
let sitemap
try {
sitemap = await getSitemap(baseUrl, flags)
} catch (err) {
if (err.message === 'NO_MATCHES_FOUND') {
this.error('[static-sitemap-cli] No file matches found!', {
code: 'NO_MATCHES_FOUND',
exit: 1
})
} else {
this.error(err.toString(), { exit: 1 })
}
if (flags.slash || url.split('/').length === 3) {
url = url + '/';
}
return url;
};
const files = await fg(flags.match, {
cwd: flags.root,
stats: true,
});
if (files.length === 0) {
this.error('[static-sitemap-cli] no file matches found!', {
code: 'NO_MATCHES_FOUND',
exit: 1,
});
}
if (flags.verbose) {
console.warn('\x1b[36m%s\x1b[0m', `[static-sitemap-cli] found ${files.length} files!`);
for (let a = 0; a < files.length - 1; a++) {
console.warn('\x1b[36m%s\x1b[0m', `[static-sitemap-cli] -${files[a].path}`);
}
}
let sitemapText = '';
for (let a = 0; a < files.length - 1; a++) {
sitemapText += getUrl(files[a].path) + '\n';
}
sitemapText += getUrl(files[files.length - 1].path);
if (flags.text) {
this.log(sitemapText);
return;
}
let urls = [];
for (let a = 0; a < files.length; a++) {
let obj = {
loc: getUrl(files[a].path),
lastmod: files[a].stats.mtime.toISOString(),
};
if (flags['follow-noindex']) {
const fileContent = fs.readFileSync(flags.root + '/' + files[a].path);
let noindex = false;
const parsedHtml = new htmlparser.Parser({
onopentag(name, attrs) {
if (name === 'meta' && attrs.name === 'robots' && attrs.content === 'noindex') {
noindex = true;
parsedHtml.end();
}
},
});
parsedHtml.write(fileContent);
parsedHtml.end();
// No index meta tag
if (noindex) {
continue;
}
}
if (flags.priority) {
for (let b = 0; b < flags.priority.length; b++) {
if (mm.isMatch(files[a].path, flags.priority[b].split('=')[0])) {
obj.priority = parseFloat(flags.priority[b].split('=')[1]);
}
}
}
if (flags.changefreq) {
for (let b = 0; b < flags.changefreq.length; b++) {
if (mm.isMatch(files[a].path, flags.changefreq[b].split('=')[0])) {
obj.changefreq = flags.changefreq[b].split('=')[1];
}
}
}
urls.push(obj);
}
let sitemap = parser.parse(
'urlset',
{
'@': {
xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9',
},
url: [urls],
},
{
declaration: {
encoding: 'UTF-8',
},
format: {
doubleQuotes: true,
},
},
);
if (flags.save) {
let outputDir = flags['output-dir'] || flags.root;
fs.writeFileSync(`${addSlash(outputDir)}sitemap.xml`, `${sitemap}\n`, 'utf-8');
fs.writeFileSync(`${addSlash(outputDir)}sitemap.txt`, `${sitemapText}\n`, 'utf-8');
const outputDir = flags['output-dir'] || flags.root
fs.writeFileSync(`${addSlash(outputDir)}sitemap.xml`, `${sitemap.xml}\n`, 'utf-8')
fs.writeFileSync(`${addSlash(outputDir)}sitemap.txt`, `${sitemap.txt}\n`, 'utf-8')
} else {
this.log(sitemap);
this.log(sitemap)
}

@@ -149,3 +49,3 @@ }

StaticSitemapCliCommand.description = `
CLI to pre-generate XML sitemaps for static sites locally.
CLI to generate XML sitemaps for static sites from local filesystem.

@@ -155,3 +55,3 @@ At its most basic, just run from root of distribution:

CLI by default outputs to 'stdout'; BASEURL can be piped in via 'stdin'.`;
CLI by default outputs to 'stdout'; BASEURL can be piped in via 'stdin'.`

@@ -162,5 +62,5 @@ StaticSitemapCliCommand.args = [

required: false,
description: 'Base URL that is prefixed to all location entries.\nFor example: https://example.com/',
},
];
description: 'Base URL that is prefixed to all sitemap items.\nFor example: https://example.com/'
}
]

@@ -173,3 +73,3 @@ StaticSitemapCliCommand.flags = {

description: 'root working directory',
default: '.',
default: '.'
}),

@@ -179,4 +79,4 @@ match: flags.string({

multiple: true,
description: 'globs to match',
default: ['**/*.html', '!404.html'],
description: 'micromatch globs to match',
default: ['**/*.html', '!404.html']
}),

@@ -186,3 +86,3 @@ priority: flags.string({

multiple: true,
description: 'glob-priority pair [eg: foo/**=0.1]',
description: '`=`-separated glob-priority pair [eg: foo/**=0.1]'
}),

@@ -192,3 +92,3 @@ changefreq: flags.string({

multiple: true,
description: 'glob-changefreq pair (eg: bar/**=daily)',
description: '`=`-separated glob-changefreq pair [eg: bar/**=daily]'
}),

@@ -198,3 +98,3 @@ 'no-clean': flags.boolean({

description: 'disable clean URLs',
default: false,
default: false
}),

@@ -205,19 +105,19 @@ slash: flags.boolean({

default: false,
exclusive: ['no-clean'],
exclusive: ['no-clean']
}),
'follow-noindex': flags.boolean({
description: 'removes pages with noindex meta tag from sitemap (up to 5x slower due to reading and parsing every HTML file)',
default: false,
default: false
}),
text: flags.boolean({
char: 't',
description: 'output as .TXT instead',
description: 'output as text instead of XML',
default: false,
exclusive: ['priority', 'changefreq'],
exclusive: ['priority', 'changefreq']
}),
save: flags.boolean({
char: 's',
description: 'save output to XML and TXT files directly',
description: 'write both XML and TXT outputs to file directly instead of `stdout`',
default: false,
exclusive: ['text'],
exclusive: ['text']
}),

@@ -227,3 +127,3 @@ 'output-dir': flags.string({

description: 'specify the output dir; used together with --save; defaults to root working directory',
dependsOn: ['save'],
dependsOn: ['save']
}),

@@ -233,6 +133,6 @@ verbose: flags.boolean({

description: 'be more verbose',
default: false,
}),
};
default: false
})
}
module.exports = StaticSitemapCliCommand;
module.exports = StaticSitemapCliCommand

Sorry, the diff of this file is not supported yet
