
@dataxquare/sitemapper
Parser for XML Sitemaps to be used with Robots.txt and web crawlers
Sitemapper is a Node.js module that makes it easy to parse XML sitemaps. It supports single sitemaps, sitemap indexes with multiple sitemaps, and various sitemap formats including image and video sitemaps.
```bash
# Using npm
npm install sitemapper --save

# Using yarn
yarn add sitemapper

# Using pnpm
pnpm add sitemapper
```

```javascript
import Sitemapper from 'sitemapper';

const sitemap = new Sitemapper({
  timeout: 10000, // 10 second timeout
});

sitemap
  .fetch('https://gosla.sh/sitemap.xml')
  .then(({ url, sites }) => {
    console.log('Sites: ', sites);
  })
  .catch((error) => console.error(error));
```
You can also use Sitemapper directly from the command line:
```bash
# Using npx
npx sitemapper https://gosla.sh/sitemap.xml
```
```javascript
import Sitemapper from 'sitemapper';

const sitemap = new Sitemapper();

sitemap
  .fetch('https://wp.seantburke.com/sitemap.xml')
  .then(({ url, sites }) => {
    console.log(`Sitemap URL: ${url}`);
    console.log(`Found ${sites.length} URLs`);
    console.log(sites);
  })
  .catch((error) => console.error(error));
```
```javascript
import Sitemapper from 'sitemapper';

async function parseSitemap() {
  // The sitemap URL is set once in the constructor, so fetch() takes no argument.
  const sitemapper = new Sitemapper({
    url: 'https://www.google.com/work/sitemap.xml',
    timeout: 15000, // 15 seconds
    concurrency: 10,
  });

  try {
    const { sites } = await sitemapper.fetch();
    console.log(`Found ${sites.length} URLs in the sitemap`);
    console.log(sites);
  } catch (error) {
    console.error('Error fetching sitemap:', error);
  }
}

parseSitemap();
```
```javascript
import Sitemapper from 'sitemapper';
import { HttpsProxyAgent } from 'hpagent';

const sitemapper = new Sitemapper({
  url: 'https://gosla.sh/sitemap.xml',
  timeout: 30000,
  concurrency: 5,
  retries: 2,
  debug: true,
  proxyAgent: new HttpsProxyAgent({
    proxy: 'http://localhost:8080',
  }),
  requestHeaders: {
    'User-Agent': 'Mozilla/5.0 (compatible; SitemapperBot/1.0)',
  },
  // With fields set, each result is an object instead of a URL string.
  fields: {
    loc: true,
    lastmod: true,
    sitemap: true,
  },
});

sitemapper
  .fetch()
  .then(({ sites }) => console.log(sites))
  .catch((error) => console.error(error));
```
Sitemapper can be customized with the following options:
| Option | Type | Default | Description |
|---|---|---|---|
| url | String | undefined | The URL of the sitemap to parse |
| timeout | Number | 15000 | Maximum timeout in milliseconds for each request |
| concurrency | Number | 10 | Maximum number of concurrent requests when crawling multiple sitemaps |
| retries | Number | 0 | Number of retry attempts for failed requests |
| debug | Boolean | false | Enable debug logging |
| rejectUnauthorized | Boolean | true | Reject invalid SSL certificates (like self-signed or expired) |
| requestHeaders | Object | {} | Additional HTTP headers to include with requests |
| lastmod | Number | undefined | Only return URLs with lastmod timestamp newer than this value |
| proxyAgent | HttpProxyAgent \| HttpsProxyAgent | undefined | Instance of hpagent for proxy support |
| exclusions | Array<RegExp> | [] | Array of regex patterns to exclude URLs from results |
| fields | Object | undefined | Specify which fields to include in the results (see below) |
Important: When using the fields option, the return format changes from an array of URL strings to an array of objects containing your selected fields.
For the fields option, specify which fields to include by setting them to true:
| Field | Description |
|---|---|
| loc | URL location of the page |
| sitemap | URL of the sitemap containing this URL (useful for sitemap indexes) |
| lastmod | Date of last modification |
| changefreq | How frequently the page is likely to change |
| priority | Priority of this URL relative to other URLs |
| image:loc | URL location of the image (for image sitemaps) |
| image:title | Title of the image (for image sitemaps) |
| image:caption | Caption of the image (for image sitemaps) |
| video:title | Title of the video (for video sitemaps) |
| video:description | Description of the video (for video sitemaps) |
| video:thumbnail_loc | Thumbnail URL of the video (for video sitemaps) |
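As a small illustration of the shape the fields option takes, the sketch below builds a fields object from a list of field names. `buildFields` is a hypothetical helper, not something Sitemapper provides. It mainly shows that names containing a colon, such as image:loc, end up as quoted keys.

```javascript
// Hypothetical helper (not part of Sitemapper): turns a list of field
// names into an object that maps each name to true.
function buildFields(names) {
  return Object.fromEntries(names.map((name) => [name, true]));
}

const fields = buildFields(['loc', 'lastmod', 'image:loc']);

// Equivalent object literal; keys with a colon must be quoted:
// { loc: true, lastmod: true, 'image:loc': true }
console.log(fields);
```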
```javascript
// Without the fields option: returns an array of URL strings
[
  'https://wp.seantburke.com/?p=234',
  'https://wp.seantburke.com/?p=231',
  'https://wp.seantburke.com/?p=185',
];
```

```javascript
// With the fields option: returns an array of objects
[
  {
    loc: 'https://wp.seantburke.com/?p=234',
    lastmod: '2015-07-03T02:05:55+00:00',
    priority: 0.8,
  },
  {
    loc: 'https://wp.seantburke.com/?p=231',
    lastmod: '2015-07-03T01:47:29+00:00',
    priority: 0.8,
  },
];
```
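Because the result shape depends on whether the fields option is set, code that consumes `sites` may want to normalize it first. The following is a minimal sketch of one way to do that. `toUrlList` is a hypothetical helper, and it assumes that object entries carry the URL under `loc`, which only holds when `loc: true` was requested.

```javascript
// Hypothetical helper (not part of Sitemapper): returns plain URL strings
// whether `sites` holds strings (no fields option) or objects (fields set).
// Assumption: object entries expose the URL as `loc`; entries without a
// string `loc` are skipped.
function toUrlList(sites) {
  return sites
    .map((site) => (typeof site === 'string' ? site : site.loc))
    .filter((url) => typeof url === 'string');
}

const fromStrings = toUrlList(['https://wp.seantburke.com/?p=234']);

const fromObjects = toUrlList([
  { loc: 'https://wp.seantburke.com/?p=234', lastmod: '2015-07-03T02:05:55+00:00' },
  { lastmod: '2015-07-03T01:47:29+00:00' }, // no loc requested: skipped
]);

console.log(fromStrings); // [ 'https://wp.seantburke.com/?p=234' ]
console.log(fromObjects); // [ 'https://wp.seantburke.com/?p=234' ]
```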
Sitemapper includes a simple CLI tool for basic sitemap parsing directly from the command line:
```bash
npx sitemapper <sitemap-url>
npx sitemapper https://gosla.sh/sitemap.xml
```
The CLI will display the sitemap URL and list all URLs found in the sitemap:
```
Sitemap URL: https://gosla.sh/sitemap.xml
Found URLs:
1. https://gosla.sh/page1
2. https://gosla.sh/page2
3. https://gosla.sh/page3
...
```
Currently, the CLI supports the --timeout parameter to set the request timeout in milliseconds:
```bash
npx sitemapper https://gosla.sh/sitemap.xml --timeout=5000
```
Note: The CLI implementation is basic and does not yet support all options available in the JavaScript API. More advanced features like fields filtering, concurrency control, and different output formats require using the JavaScript API directly.
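Since other output formats have to go through the JavaScript API, the sketch below shows one possible way to render a fetched `sites` array either as a numbered list or as JSON. `renderSites` and its `format` argument are hypothetical and not part of Sitemapper. The sketch assumes `sites` is an array of URL strings, that is, the fields option was not used.

```javascript
// Hypothetical formatter (not part of Sitemapper) for results obtained
// through the JavaScript API.
// Assumption: `sites` is an array of URL strings.
function renderSites(sites, format = 'list') {
  if (format === 'json') {
    return JSON.stringify(sites, null, 2);
  }
  if (format === 'list') {
    // Numbered lines, one URL per line.
    return sites.map((site, index) => `${index + 1}. ${site}`).join('\n');
  }
  throw new Error(`Unsupported format: ${format}`);
}

const sites = ['https://gosla.sh/page1', 'https://gosla.sh/page2'];

console.log(renderSites(sites));
// 1. https://gosla.sh/page1
// 2. https://gosla.sh/page2

console.log(renderSites(sites, 'json'));
```

In a real script, `sites` would come from the resolved value of `fetch()` rather than a hard-coded array.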
Contributions from experienced engineers are highly valued. When contributing, please consider:
- Run npm install with the latest npm version to update package-lock.json
- Run npm test locally to verify your changes pass the tests
For substantial changes, consider opening an issue for discussion before implementation.
Note: The CI pipeline enforces TypeScript type checking, linting rules, formatting standards, and test coverage thresholds.
This project is licensed under the MIT License - see the LICENSE file for details.