@fnet/sitemap-to-pdf

Version 0.1.8 (latest)

This project provides a straightforward utility to convert the contents of a website's sitemap into PDF documents. By accessing the URLs listed in a sitemap, it generates PDFs of each page, which can be bundled together or saved as separate files. This tool is particularly useful for creating physical or offline copies of a website's content for archival or review purposes.

How It Works

Given a sitemap URL, the tool fetches the sitemap, collects the page URLs it lists, visits each page, and renders it to PDF. You can bundle all pages into a single PDF or keep one file per page. You can also cap the number of pages processed and set a maximum size for bundled PDFs, in which case larger collections are split across several files.
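
The link-collection step can be pictured with a small sketch. The package does not document how it parses sitemaps, so the helper below (extractSitemapUrls, a hypothetical name that is not part of the library) is only an assumption-laden illustration: it pulls the contents of loc elements out of sitemap XML with a regular expression and stops once an optional limit is reached.

```javascript
// Hypothetical helper for illustration only; not part of @fnet/sitemap-to-pdf.
// Collects the URLs found in <loc> elements of a sitemap XML string.
function extractSitemapUrls(xml, limit = Infinity) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match;
  // Stop as soon as the limit is reached or no more <loc> elements remain.
  while (urls.length < limit && (match = locPattern.exec(xml)) !== null) {
    // Sitemaps escape "&" as "&amp;", so undo that for the plain URL.
    urls.push(match[1].replace(/&amp;/g, '&'));
  }
  return urls;
}

const sampleXml = `
<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs?page=1&amp;lang=en</loc></url>
</urlset>`;

console.log(extractSitemapUrls(sampleXml, 1));
```

A regular expression is enough for a sketch like this; a real crawler would more likely use an XML parser and would also need to handle sitemap index files that point to other sitemaps.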

Key Features

  • Process URLs from a provided sitemap to generate PDFs.
  • Option to save each page as a separate PDF or combine them into one.
  • Capability to set a size limit for bundled PDFs, splitting larger collections into manageable parts.
  • Ability to limit the number of links processed, based on user specifications.
  • Automatically handles URL sanitization for file names.
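
To make the last feature concrete, here is a minimal sketch of what URL sanitization for file names could look like. The package's actual naming scheme is not documented, so the function name urlToFileName and the rule it applies (host plus path, with every run of non-alphanumeric characters collapsed to an underscore) are assumptions for illustration.

```javascript
// Hypothetical helper for illustration only; the library's real naming
// scheme may differ. Turns a page URL into a file-system-safe PDF name.
function urlToFileName(url) {
  const { hostname, pathname } = new URL(url);
  const base = `${hostname}${pathname}`
    .replace(/[^a-zA-Z0-9]+/g, '_') // collapse unsafe characters
    .replace(/^_+|_+$/g, ''); // trim leading and trailing underscores
  return `${base || 'index'}.pdf`;
}

console.log(urlToFileName('https://example.com/docs/getting-started/'));
```

This sketch ignores query strings, so two URLs that differ only in their query would map to the same name; a production version would need to account for that.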

Conclusion

The @fnet/sitemap-to-pdf project offers a practical solution for converting web pages into PDF format using a sitemap as the source. Whether you need a bundled document or distinct files for each page, this tool simplifies the task of generating offline versions of web content.

Developer Guide for @fnet/sitemap-to-pdf

Overview

The @fnet/sitemap-to-pdf library provides a convenient way to crawl a sitemap, extract webpage links, and generate PDFs of those pages. You can choose to either create a single PDF file by combining pages or generate individual PDFs for each webpage. This library is particularly useful for archiving purposes or for compiling web content into a neatly packaged PDF format.

Installation

Install @fnet/sitemap-to-pdf with npm or yarn by running one of the following commands:

Using npm:

npm install @fnet/sitemap-to-pdf

Using yarn:

yarn add @fnet/sitemap-to-pdf

Usage

The library's core functionality is its index function, the default export (imported as sitemapToPdf in the examples here). It takes a single options object that configures the crawling and PDF generation process.

Function Signature

import sitemapToPdf from '@fnet/sitemap-to-pdf';

sitemapToPdf({
  sitemapUrl: '<SITEMAP_URL>',
  outputDirectory: '<OUTPUT_DIRECTORY>',
  bundle: true, // Optional: Defaults to true
  outputFile: 'output', // Optional: Default output file name
  bundleSize: Infinity, // Optional: Size limit for each PDF bundle in MB
  limit: Infinity // Optional: Max number of pages to process
});

Parameters

  • sitemapUrl: The URL of the sitemap you wish to crawl.
  • outputDirectory: The directory where the generated PDFs will be stored.
  • bundle: (Optional) Boolean value indicating whether to combine pages into a single PDF. Defaults to true.
  • outputFile: (Optional) The base name for the bundled PDF file(s). Defaults to 'output'.
  • bundleSize: (Optional) Maximum size for each bundle PDF in MB. Defaults to Infinity, which means no size limit.
  • limit: (Optional) The maximum number of links to process from the sitemap. Defaults to Infinity, which means all links are processed.
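
How bundleSize splits a large collection is not spelled out in the documentation. One plausible reading is a greedy grouping: pages are appended to the current bundle until adding the next one would exceed the limit, at which point a new bundle starts. The sketch below illustrates that reading only; groupIntoBundles is a hypothetical helper, and the page sizes are made-up inputs.

```javascript
// Hypothetical sketch of greedy size-based bundling; this is an assumption
// about the behavior, not the library's documented algorithm.
// pageSizesMb: size of each rendered page PDF in MB, in sitemap order.
// Returns an array of bundles, each an array of page indexes.
function groupIntoBundles(pageSizesMb, bundleSizeMb = Infinity) {
  const bundles = [];
  let current = [];
  let currentSize = 0;
  pageSizesMb.forEach((size, index) => {
    // Start a new bundle when this page would push the current one over the limit.
    if (current.length > 0 && currentSize + size > bundleSizeMb) {
      bundles.push(current);
      current = [];
      currentSize = 0;
    }
    current.push(index);
    currentSize += size;
  });
  if (current.length > 0) bundles.push(current);
  return bundles;
}

console.log(groupIntoBundles([4, 5, 3, 9, 2], 10));
```

With the default of Infinity every page lands in a single bundle. In this sketch a page that is larger than the limit on its own still gets a bundle to itself rather than being dropped.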

Examples

Here are some practical examples to help demonstrate common use cases:

Example 1: Generate a Single Bundled PDF

import sitemapToPdf from '@fnet/sitemap-to-pdf';

sitemapToPdf({
  sitemapUrl: 'https://example.com/sitemap.xml',
  outputDirectory: './pdfs',
  bundle: true,
  outputFile: 'example-site',
  bundleSize: 10 // Limit each PDF to 10 MB
});

This example renders every page in the sitemap and bundles the results into one or more PDF files, each capped at 10 MB.

Example 2: Generate Separate PDFs for Each Page

import sitemapToPdf from '@fnet/sitemap-to-pdf';

sitemapToPdf({
  sitemapUrl: 'https://example.com/sitemap.xml',
  outputDirectory: './pdfs',
  bundle: false // Generate separate PDFs for each webpage
});

In this scenario, each page is saved as a separate PDF file in the specified output directory.

Acknowledgement

This library relies on several key technologies, including Puppeteer for page rendering and pdf-lib for PDF manipulation. Special thanks to the contributors and maintainers of these libraries, who make this tool possible.

Input Schema

$schema: https://json-schema.org/draft/2020-12/schema
type: object
properties:
  sitemapUrl:
    type: string
    description: The URL of the sitemap to crawl.
  outputDirectory:
    type: string
    description: The directory where the PDFs will be saved.
  bundle:
    type: boolean
    description: Whether to combine all pages into a single PDF. Defaults to true.
  outputFile:
    type: string
    description: The base name of the bundled PDF file(s).
  bundleSize:
    type: number
    description: Maximum size of each bundled PDF in MB.
  limit:
    type: number
    description: Maximum number of links to process. Defaults to all links.
required:
  - sitemapUrl
  - outputDirectory
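
The schema above can be checked by hand before calling the library. The following is a minimal sketch, assuming you want to catch missing required properties and wrong types early; validateOptions is a hypothetical helper and is not exported by @fnet/sitemap-to-pdf. It mirrors only the type and required constraints listed in the schema.

```javascript
// Hypothetical pre-flight check for illustration only; not part of the library.
// Returns a list of human-readable problems (empty when the options look valid).
function validateOptions(options) {
  const errors = [];
  // Property types as listed in the input schema.
  const expectedTypes = {
    sitemapUrl: 'string',
    outputDirectory: 'string',
    bundle: 'boolean',
    outputFile: 'string',
    bundleSize: 'number',
    limit: 'number',
  };
  for (const key of ['sitemapUrl', 'outputDirectory']) {
    if (options[key] === undefined) {
      errors.push(`missing required property: ${key}`);
    }
  }
  for (const [key, type] of Object.entries(expectedTypes)) {
    if (options[key] !== undefined && typeof options[key] !== type) {
      errors.push(`${key} must be a ${type}`);
    }
  }
  return errors;
}

console.log(validateOptions({ sitemapUrl: 'https://example.com/sitemap.xml', bundle: 'yes' }));
```

For anything beyond a quick check, a full JSON Schema validator would be the more robust choice.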

Package last updated on 08 Nov 2024
