Sitemap 2 Doc
This module downloads every web page listed in a sitemap.xml file and compiles the pages into a single document.
Designed for AI Embedding Generation
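As a rough illustration of the first step described above (reading the page URLs out of a sitemap), here is a minimal sketch. It is not the module's actual implementation: `extractSitemapUrls` and the example XML are hypothetical, and the regular expression only handles plain `<loc>` entries without decoding XML entities.

```javascript
// Illustrative sketch only; the real module may parse sitemaps differently.
// `extractSitemapUrls` is a hypothetical helper name.
function extractSitemapUrls( xml ) {
    const urls = []
    // Capture the trimmed text content of every <loc>...</loc> element.
    const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g
    for( const match of xml.matchAll( pattern ) ) {
        urls.push( match[ 1 ] )
    }
    return urls
}

// Made-up sitemap used only for this example.
const exampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/about </loc></url>
</urlset>`

const urls = extractSitemapUrls( exampleXml )
console.log( urls )
```

A real sitemap can also be a sitemap index that points to further sitemaps; this sketch does not follow those.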
Quickstart
Terminal
npm init -y && npm i sitemap2doc
Node
index.mjs
import { Sitemap2Doc } from 'sitemap2doc'
const s2d = new Sitemap2Doc()
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )
Terminal
node index.mjs
Methods
getDocument()
Key | Type | Description | Required | Default |
--- | --- | --- | --- | --- |
projectName | String | Name of the project | true | |
sitemapUrl | String | URL of the sitemap to read | true | |
silent | Boolean | Suppress terminal output | false | false |
Example
import { Sitemap2Doc } from 'sitemap2doc'
const s2d = new Sitemap2Doc()
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )
Get Sitemap https://...
Get Pages 0 1 2 3 4 5 6 7 8 9
Merge 0
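The log above ends with a merge step. The format of the merged document is not described here, so the following is only a sketch of one possible approach: strip the markup from each page and join the text under a per-page heading. `htmlToText`, `mergePages`, the `# url` heading line, and the sample pages are all assumptions made for illustration, and the regex-based tag stripping is deliberately crude.

```javascript
// Illustrative sketch only; not the module's actual merge logic.
// Crude HTML-to-text conversion: drops script/style blocks, then all tags.
function htmlToText( html ) {
    return html
        .replace( /<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, ' ' )
        .replace( /<[^>]+>/g, ' ' )
        .replace( /\s+/g, ' ' )
        .trim()
}

// Joins all pages into one string, one section per page.
function mergePages( pages ) {
    return pages
        .map( ( { url, html } ) => `# ${url}\n${htmlToText( html )}` )
        .join( '\n\n' )
}

// Made-up pages used only for this example.
const merged = mergePages( [
    { url: 'https://example.com/', html: '<html><body><h1>Home</h1><p>Welcome.</p></body></html>' },
    { url: 'https://example.com/about', html: '<p>About <b>us</b></p><script>track()</script>' }
] )
console.log( merged )
```

For embedding generation, keeping the source URL next to each page's text makes it easier to trace a chunk back to its origin later.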
getConfig()
Returns the current config. The default config is defined in ./src/data/config.mjs.
import { Sitemap2Doc } from 'sitemap2doc'
const s2d = new Sitemap2Doc()
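// Read the current config and change a single value.
const config = s2d.getConfig()
config['download']['chunkSize'] = 4
// Apply the modified config, then run the download.
await s2d
    .setConfig( { config } )
    .getDocument( { ... } )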
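What `download.chunkSize` controls is not documented here. Assuming it is the number of pages processed per batch, the effect of setting it to 4 could be pictured with the sketch below. `toChunks` is a hypothetical helper, not part of the module.

```javascript
// Illustrative sketch only, under the assumption that chunkSize
// means "pages per batch". `toChunks` is a hypothetical helper.
function toChunks( items, chunkSize ) {
    if( !Number.isInteger( chunkSize ) || chunkSize < 1 ) {
        throw new RangeError( 'chunkSize must be a positive integer' )
    }
    const chunks = []
    // Step through the list in strides of chunkSize.
    for( let i = 0; i < items.length; i += chunkSize ) {
        chunks.push( items.slice( i, i + chunkSize ) )
    }
    return chunks
}

// Ten page indices, as in the example log, split with chunkSize = 4.
const pageIds = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
const chunks = toChunks( pageIds, 4 )
console.log( chunks )
```

With ten pages and a chunk size of 4, this yields three batches, the last one holding the remaining two pages.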
setConfig()
All module settings are stored in a config file, see ./src/data/config.mjs. This file can be completely overridden by passing an object during initialization.
import { Sitemap2Doc } from 'sitemap2doc'
const s2d = new Sitemap2Doc()
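// Start from the current config and override a single value.
const config = s2d.getConfig()
config['download']['chunkSize'] = 4
// setConfig() is chainable, so getDocument() can follow directly.
await s2d
    .setConfig( { config } )
    .getDocument( { ... } )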
License
The module is available as open source under the terms of the Apache 2.0 License.