stream-sitemap-parser

Receives any type of sitemap stream, parses it, and streams back a list of URLs or any errors found.


sitemap-parser

Stream a sitemap file and get back a stream of URLs or any error found while parsing the file.

Usage

const fs = require('fs');
const { fetch, verify, getRules } = require('stream-sitemap-parser');

fs.createReadStream(file)
  .pipe(fetch())
  .on('data', function (url) {
    // Each chunk contains a URL and all of its given attributes, for example:
    // {
    //   loc: 'www.google.com',
    //   lastmod: '2017-01-01T00:00:00.000Z',
    //   changefreq: 'monthly',
    //   priority: '0.8',
    //   alternate: [
    //     {
    //       href: 'https://www.google.com/es/',
    //       hreflang: 'es'
    //     }
    //   ]
    // }
  });

verify(fs.createReadStream(file))
  .then(result => {
    // result is an object describing any warnings or errors found while
    // parsing the sitemap, for example:
    // {
    //   messages: [
    //     {
    //       type: 'tooManyTags',
    //       details: {
    //         parent: 'url',
    //         tag: 'loc'
    //       }
    //     }
    //   ],
    //   alternates: [
    //     {
    //       loc: 'https://www.google.com',
    //       alternate: [
    //         {
    //           href: 'https://www.google.com/es/',
    //           hreflang: 'es'
    //         }
    //       ]
    //     }
    //   ]
    // }
  });

getRules();
// Returns an object containing all of the rules loaded by the parser.
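
As a rough illustration of consuming the data events shown above, the sketch below collects each loc and groups alternate links by hreflang. It assumes chunks have the shape shown in the usage example and that alternate may be absent. Readable.from with hand-written sample objects stands in for fs.createReadStream(file).pipe(fetch()) so the snippet runs without the package installed, and collectUrls is a hypothetical helper name, not part of the library.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not part of stream-sitemap-parser): drains an
// object-mode stream of URL chunks and returns what it saw.
async function collectUrls(urlStream) {
  const locs = [];
  const alternatesByLang = {};
  // Readable streams are async iterable, so for await reads chunk by chunk.
  for await (const url of urlStream) {
    locs.push(url.loc);
    // Assumption: `alternate` is optional, so fall back to an empty list.
    for (const alt of url.alternate || []) {
      if (!alternatesByLang[alt.hreflang]) {
        alternatesByLang[alt.hreflang] = [];
      }
      alternatesByLang[alt.hreflang].push(alt.href);
    }
  }
  return { locs, alternatesByLang };
}

// Stand-in for fs.createReadStream(file).pipe(fetch()); sample data only.
const sampleStream = Readable.from([
  {
    loc: 'www.google.com',
    lastmod: '2017-01-01T00:00:00.000Z',
    changefreq: 'monthly',
    priority: '0.8',
    alternate: [{ href: 'https://www.google.com/es/', hreflang: 'es' }],
  },
  { loc: 'www.google.com/about' },
]);

collectUrls(sampleStream).then((summary) => console.log(summary));
```

With the package installed, the same helper should accept the piped fetch() stream in place of sampleStream, provided that stream emits objects of the shape shown above.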

Both fetch and verify accept several options.

fetch({ contentType, domain, maxSize, maxUrls })

verify(sitemapStream, { contentType, domain, maxSize, maxUrls })

contentType defaults to xml. Set it to txt when streaming a plain-text sitemap.

domain defaults to null. Set it to a given domain to make sure that the parsed URLs all belong to that domain.

maxSize defaults to 50MB. Set it to any given size to make sure that the stream cannot exceed that size.

maxUrls defaults to 50000. Set it to any given value to make sure that no more than that number of URLs will be parsed.
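
To show how these options might be passed in practice, here is a minimal sketch that calls verify with a few of them and counts the reported messages by type. It relies only on the signature and result shape shown above. The wrapper name verifyTxtSitemap is hypothetical, verify is passed in as an argument so the snippet runs without the package installed, and the exact format expected for domain is an assumption. maxSize is left out because its expected unit is not stated here.

```javascript
// Hypothetical wrapper (not part of stream-sitemap-parser). Pass in the
// `verify` function exported by the package, or a stub when testing.
async function verifyTxtSitemap(verify, sitemapStream, domain) {
  const result = await verify(sitemapStream, {
    contentType: 'txt', // documented default is xml
    domain,             // documented default is null; value format is assumed
    maxUrls: 1000,      // documented default is 50000
  });

  // Count the reported messages by their `type` field.
  const countsByType = {};
  for (const message of result.messages || []) {
    countsByType[message.type] = (countsByType[message.type] || 0) + 1;
  }
  return countsByType;
}

// Usage with the real package (commented out so the sketch runs standalone):
// const fs = require('fs');
// const { verify } = require('stream-sitemap-parser');
// verifyTxtSitemap(verify, fs.createReadStream('urls.txt'), 'www.google.com')
//   .then((counts) => console.log(counts));
```

Passing verify in as a parameter is only a convenience for trying the sketch in isolation; in real code you would likely call the imported function directly.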


Last updated on 07 Jun 2022
