mapsite

A module to parse URLs from a local or remote sitemap.xml.

Note: Version 2 of this package may return different results from version 1.x, mainly because the parser now uses Cheerio.

Getting Started

npm install mapsite

or

yarn add mapsite

Usage

const { SitemapParser } = require("mapsite");

const options = {
  rejectInvalidContentType: true,
  userAgent: "customUA",
  maximumRetries: 1,
  maximumDepth: 5,
  timeout: 3000,
  debug: false,
};

const parser = new SitemapParser(options);

With proxy

const { SitemapParser } = require("mapsite");

const parser = new SitemapParser({
  proxy: 'https://username:password@proxy.host:3000'
});

options

All options are optional; each one falls back to a built-in default.

rejectInvalidContentType: boolean;

Checks that the response Content-Type header is one of:

  • application/xml
  • application/rss+xml
  • text/xml

default: true
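
A minimal sketch of what such a check could look like. The helper name `isAcceptedContentType` is hypothetical and not part of mapsite's API, and ignoring parameters such as `charset` is an assumption of this sketch, not confirmed behaviour of the library.

```javascript
// Hypothetical helper, not part of mapsite. The three media types come from the list above.
const ACCEPTED_TYPES = new Set([
  "application/xml",
  "application/rss+xml",
  "text/xml",
]);

function isAcceptedContentType(headerValue) {
  if (typeof headerValue !== "string") return false;
  // Assumption: parameters like "; charset=utf-8" are ignored and case does not matter.
  const mediaType = headerValue.split(";")[0].trim().toLowerCase();
  return ACCEPTED_TYPES.has(mediaType);
}
```

With this sketch, a header of `text/xml; charset=utf-8` would pass and `text/html` would be rejected.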


userAgent: string;

Adds a custom User-Agent string to the requests.

default: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 mapsite/1.0


maximumRetries: number;

How many times a URL in the <loc> tag of an XML index file should be requested when the response status is 400 or higher.

default: 1
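
The retry behaviour might be pictured roughly as follows. This is a sketch under assumptions: `requestWithRetries` and `requestFn` are hypothetical names, and treating the option as "one initial attempt plus up to `maximumRetries` further attempts" is a guess at the semantics, not something the README states.

```javascript
// Hypothetical sketch; `requestFn` stands in for whatever HTTP client is used
// and is expected to resolve to an object with a numeric `status`.
async function requestWithRetries(requestFn, url, maximumRetries = 1) {
  let response;
  // Assumed semantics: one initial attempt, then up to `maximumRetries` more.
  for (let attempt = 0; attempt <= maximumRetries; attempt += 1) {
    response = await requestFn(url);
    if (response.status < 400) return response; // success, stop retrying
  }
  return response; // last failing response, left for the caller to report
}
```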


maximumDepth: number;

How many levels deep XML index files should be traversed. For example, if index files are nested 3 levels deep and the maximum depth is 2, the URLs in the <loc> tags of the last response are not crawled further.

default: 2
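
To illustrate the depth limit, here is a small sketch that walks an in-memory stand-in for fetched sitemaps. Everything in it is hypothetical (the `collectUrls` name, the `sitemaps` object shape, and counting the first index file as level 1); it only mirrors the example described above and is not mapsite's implementation.

```javascript
// Hypothetical in-memory model: each key is a sitemap URL, each value says
// whether it is an "index" (its locs point to other sitemaps) or a "sitemap"
// (its locs are page URLs).
function collectUrls(sitemaps, startUrl, maximumDepth = 2) {
  const urls = [];

  const visit = (url, depth) => {
    const entry = sitemaps[url];
    if (!entry) return;
    if (entry.type === "sitemap") {
      urls.push(...entry.locs);
      return;
    }
    // Index file: once the limit is reached, its <loc> entries are not followed.
    if (depth >= maximumDepth) return;
    for (const child of entry.locs) visit(child, depth + 1);
  };

  visit(startUrl, 1); // Assumption: the starting file counts as level 1.
  return urls;
}
```

In this model, an index nested below the limit is read but its children are skipped, so pages that are only reachable through it do not appear in the output.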


timeout: number;

The number of milliseconds allowed for a request to complete; both the headers and the body time out at this point.
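
As a generic illustration of a single deadline covering a whole operation, the sketch below races any promise against a timer. The `withTimeout` helper is hypothetical and says nothing about how mapsite implements its timeout internally.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `timeoutMs`.
// Note that the underlying work is not cancelled, only abandoned.
function withTimeout(promise, timeoutMs) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // Whichever settles first wins; the timer is always cleaned up afterwards.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```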


debug: boolean;

Logs info, warning and error messages as the parser runs (WIP).



proxy: string;

The URL of a proxy server to route requests through.
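
If the proxy credentials contain characters such as `@`, they need to be percent-encoded before being embedded in the URL string. A small sketch using the standard `URL` class; the `buildProxyUrl` helper and its parameter names are hypothetical and not part of mapsite.

```javascript
// Hypothetical helper: builds a proxy URL string in the same form as the
// "With proxy" example, letting the URL class percent-encode the credentials.
function buildProxyUrl({ protocol = "https", host, port, username, password }) {
  const url = new URL(`${protocol}://${host}:${port}`);
  url.username = username; // the setters encode reserved characters
  url.password = password;
  return url.href;
}
```

The returned string could then be passed as the `proxy` option when constructing the parser.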


Methods

run

const parser = new SitemapParser();
const result = await parser.run("https://example.com/sitemap.xml");

result: MapsiteResponse;

The result shape looks as follows:

const result = {
  type: "sitemap",
  urls: ["https://example.com"],
  errors: [
    {
      url: "https://example.com/sitemap-index.xml",
      reason: "Brief description of what went wrong",
    },
  ],
};
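
One way to consume a result of this shape is sketched below. The `summarize` helper is hypothetical, and de-duplicating `urls` is only a precaution in this sketch; the README does not say whether duplicates can occur.

```javascript
// Hypothetical helper that works on any object with the shape shown above.
function summarize(result) {
  // De-duplicate URLs while preserving their original order.
  const uniqueUrls = [...new Set(result.urls)];
  // Turn each error entry into a single readable line.
  const failures = result.errors.map(({ url, reason }) => `${url}: ${reason}`);
  return { type: result.type, uniqueUrls, failures };
}
```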

fromBuffer

const { readFileSync } = require("fs");
const { SitemapParser } = require("mapsite");

const parser = new SitemapParser();
// readFileSync already returns a Buffer; a buffer from an uploaded file works too
const buffer = readFileSync("./sitemap.xml");
const result = await parser.fromBuffer(buffer);

result: MapsiteResponse;

The result shape looks as follows:

const result = {
  type: "sitemap", // or 'index'
  urls: ["https://example.com"],
  errors: [
    {
      url: "buffer",
      reason: "Brief description of what went wrong",
    },
  ],
};

Last updated on 20 Apr 2024