pdf-ghost

Version: 0.1.0
Registry: PyPI (pip)
Maintainers: 1

PDF Ghost

PDF Ghost is a Python library designed for performing a wide range of operations on PDF files, including merging, splitting, rotating, compressing, watermarking, converting, encrypting/decrypting, extracting text/images, adding page numbers, batch processing, and comparing PDFs. It also supports generating PDFs from Markdown or LaTeX files.

Features

  • Merge PDFs: Combine multiple PDFs into a single file.
  • Split PDFs: Split a PDF into smaller files based on page ranges.
  • Remove Pages: Remove specific pages by index, or remove pages from the start or end of a PDF.
  • Rotate Pages: Rotate all or specific pages in a PDF.
  • Insert Pages: Insert pages from other PDFs at specific positions in a PDF.
  • Rearrange Pages: Rearrange the pages of a PDF file, or merge several PDF files and then rearrange all of their pages.
  • Compress PDFs: Reduce the file size of a PDF by optimizing images and removing unnecessary metadata.
  • Watermarking: Add text or image watermarks to PDFs, or remove them.
  • Image to PDF: Convert images to a PDF file.
  • PDF to Image: Convert pages of a PDF file to images.
  • Encrypt/Decrypt PDFs: Add password protection to PDFs and decrypt them with the correct password.
  • Extract Text/Images: Extract text or images from a PDF.
  • Add Page Numbers: Insert page numbers at the bottom or top of each page.
  • Convert PDFs to HTML: Convert PDFs into structured HTML files.
  • Generate PDFs from Markdown/LaTeX: Convert Markdown or LaTeX files into well-formatted PDFs.
  • Compare PDFs: Identify differences between two PDF files.
  • PDF Signing: Add digital signatures to PDFs using cryptographic certificates.
  • Batch Processing: Apply operations (merge, split, rotate, etc.) on multiple PDFs at once.

Installation

Python Requirements

  • Python 3.7+

Install via pip

pip install pdfghost

External Dependencies

For Markdown-to-PDF and LaTeX-to-PDF conversion, the following external tools are required:

  • Pandoc: For converting Markdown to PDF.
  • BasicTeX (macOS), TeX Live (Linux), or MiKTeX (Windows): A LaTeX distribution for converting LaTeX to PDF.

Installing Pandoc

macOS

If you have Homebrew installed, run:

brew install pandoc

Linux (Debian/Ubuntu)

sudo apt-get update
sudo apt-get install pandoc

Windows

Download the Pandoc installer from the official Pandoc website and follow the installation instructions.

Installing BasicTeX

macOS
  • Download BasicTeX from its official download page.
  • Install it by following the on-screen instructions.
  • Add the following to your .bashrc or .zshrc file, adjusting the year to match your installed TeX Live release:
    export PATH="/usr/local/texlive/2023/bin/universal-darwin:$PATH"
    
Linux (Debian/Ubuntu)
  • Install texlive (a standard selection of TeX Live packages):
    sudo apt-get update
    sudo apt-get install texlive
    
Windows
  • Download and install MiKTeX (a lightweight LaTeX distribution) from the official MiKTeX website.
  • Follow the installation instructions.
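
Before calling the Markdown or LaTeX converters, it can help to confirm that the external tools are reachable on your PATH. The sketch below uses only the standard library. The helper name find_missing_tools is hypothetical, and the choice of pdflatex as the LaTeX engine is an assumption; this README does not state which executable PDF Ghost invokes.

```python
import shutil


def find_missing_tools(tools=("pandoc", "pdflatex")):
    """Return the names in `tools` that cannot be found on PATH.

    The default tool names are an assumption, not taken from PDF Ghost.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("All checked tools were found on PATH.")
```

If a tool is reported missing right after installation, open a new terminal so that the updated PATH is picked up.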

Usage

Merge PDFs

from pdfghost import merge_pdfs

merge_pdfs("output.pdf", "file1.pdf", "file2.pdf")

Split PDF

from pdfghost import split_pdf

split_pdf("input.pdf", "output_folder", split_range=(0, 2))

Remove Specific Pages

from pdfghost import remove_pages

# Remove pages with indices 0, 2, and 4 (0-based)
remove_pages("input.pdf", "output.pdf", pages_to_remove=[0, 2, 4])
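
Because the indices above are 0-based while PDF viewers usually show 1-based page numbers, a small conversion step can prevent off-by-one mistakes. The helper below is a hypothetical convenience function, not part of pdfghost; its result can be passed as pages_to_remove.

```python
def to_zero_based(page_numbers):
    """Convert 1-based page numbers to sorted, de-duplicated 0-based indices."""
    indices = set()
    for number in page_numbers:
        if number < 1:
            raise ValueError(f"Page numbers start at 1, got {number}")
        indices.add(number - 1)
    return sorted(indices)


# Pages 1, 3, and 5 as displayed in a PDF viewer
pages_to_remove = to_zero_based([1, 3, 5])
print(pages_to_remove)  # [0, 2, 4]
```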

Remove Pages from Start

from pdfghost import remove_pages_from_start

# Remove the first 3 pages
remove_pages_from_start("input.pdf", "output.pdf", num_pages=3)

Remove Pages from End

from pdfghost import remove_pages_from_end

# Remove the last 2 pages
remove_pages_from_end("input.pdf", "output.pdf", num_pages=2)

Rotate Pages

from pdfghost import rotate_pdf

# Rotate all pages by 90 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=90)

# Rotate specific pages by 180 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=180, pages_to_rotate=[0, 2])

Insert Pages

from pdfghost import insert_pages

# Insert pages at specific positions
insertions = [
    (1, "insert1.pdf"),  # Insert pages from insert1.pdf at position 1
    (4, "insert2.pdf"),  # Insert pages from insert2.pdf at position 4
]
insert_pages("input.pdf", "output.pdf", insertions)

Rearrange Pages

from pdfghost import rearrange_pdf

# Rearrange pages in a PDF
page_order = [2, 0, 1]  # New order: Page 3, Page 1, Page 2
rearrange_pdf("input.pdf", "output.pdf", page_order)
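
If the intent is to keep every page exactly once, the order list can be checked before it is passed to rearrange_pdf. This is a minimal sketch with a hypothetical helper name; whether pdfghost itself accepts repeated or omitted indices is not stated in this README, so the check is optional.

```python
def is_permutation(page_order, page_count):
    """Return True if page_order lists every index 0..page_count-1 exactly once."""
    return sorted(page_order) == list(range(page_count))


print(is_permutation([2, 0, 1], 3))  # True: each of the 3 pages appears once
print(is_permutation([0, 0, 1], 3))  # False: page 0 repeats and page 2 is missing
```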

Merge and Rearrange Pages

from pdfghost import merge_and_rearrange

# Merge multiple PDFs and rearrange their pages
page_order = [
    (0, 0),  # Page 1 from file1.pdf
    (1, 0),  # Page 1 from file2.pdf
    (0, 1),  # Page 2 from file1.pdf
]
merge_and_rearrange("output.pdf", page_order, "file1.pdf", "file2.pdf")
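
Each tuple in page_order reads as (file index, page index), both 0-based, following the comments in the example above. The sketch below spells out that mapping in plain Python so an order can be reviewed before merging. describe_order is a hypothetical helper and does not call pdfghost.

```python
def describe_order(page_order, *files):
    """Translate (file_index, page_index) pairs into readable descriptions."""
    descriptions = []
    for file_index, page_index in page_order:
        if not 0 <= file_index < len(files):
            raise IndexError(f"No input file at index {file_index}")
        descriptions.append(f"page {page_index + 1} of {files[file_index]}")
    return descriptions


order = [(0, 0), (1, 0), (0, 1)]
for line in describe_order(order, "file1.pdf", "file2.pdf"):
    print(line)
# page 1 of file1.pdf
# page 1 of file2.pdf
# page 2 of file1.pdf
```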

Compress PDF

from pdfghost import compress_pdf

# Compress a PDF with medium compression
compress_pdf("input.pdf", "output.pdf", power=3)

# Compress a PDF with maximum compression
compress_pdf("input.pdf", "output.pdf", power=5)

Add Text Watermark

from pdfghost import add_text_watermark

# Add a text watermark to all pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential")

# Add a text watermark to specific pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential", pages_to_watermark=[0, 2])

Add Image Watermark

from pdfghost import add_image_watermark

# Add an image watermark to all pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png")

# Add an image watermark to specific pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png", pages_to_watermark=[1])

Remove Watermark

from pdfghost import remove_watermark

# Remove watermarks from all pages
remove_watermark("input.pdf", "output.pdf")

# Remove watermarks from specific pages
remove_watermark("input.pdf", "output.pdf", pages_to_clean=[0, 2])

Convert PDF to Images

from pdfghost import pdf_to_images

# Convert each page of a PDF into PNG images
pdf_to_images("input.pdf", "output_folder", format="png")

# Convert each page of a PDF into JPG images
pdf_to_images("input.pdf", "output_folder", format="jpg")

Convert Images to PDF

from pdfghost import images_to_pdf

# Convert multiple image files into a single PDF
images_to_pdf("output.pdf", "image1.png", "image2.jpg")

Encrypt PDF

from pdfghost import encrypt_pdf

# Encrypt a PDF with a password
encrypt_pdf("input.pdf", "output.pdf", password="mypassword")

Decrypt PDF

from pdfghost import decrypt_pdf

# Decrypt a PDF with a password
decrypt_pdf("input.pdf", "output.pdf", password="mypassword")

Extract Text

from pdfghost import extract_text

# Extract text from a PDF and save it as a .txt file
extract_text("input.pdf", "output.txt", format="txt")

# Extract text from a PDF and save it as a .csv file
extract_text("input.pdf", "output.csv", format="csv")

Extract Images

from pdfghost import extract_images

# Extract all images from a PDF and save them as separate image files
extract_images("input.pdf", "output_folder")

Add Page Numbers

from pdfghost import add_page_numbers

# Add page numbers at the bottom of each page
add_page_numbers("input.pdf", "output.pdf", position="bottom")

# Add page numbers at the top of each page
add_page_numbers("input.pdf", "output.pdf", position="top")

Convert PDF to HTML

from pdfghost import pdf_to_html

# Convert a PDF into a structured HTML file
pdf_to_html("input.pdf", "output.html")

Convert Markdown to PDF

from pdfghost import markdown_to_pdf

# Convert a Markdown file into a PDF
markdown_to_pdf("input.md", "output.pdf")

Convert LaTeX to PDF

from pdfghost import latex_to_pdf

# Convert a LaTeX file into a PDF
latex_to_pdf("input.tex", "output.pdf")

Compare PDFs

from pdfghost import compare_pdfs

# Compare two PDFs and generate a summary of differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="summary")
print(result)

# Compare two PDFs with side-by-side output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="side_by_side")
print(result)

# Compare two PDFs with highlighted differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="highlight_differences")
print(result)

# Compare two PDFs with version control-style output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="version_control")
print(result)

# Compare two PDFs with annotations
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="annotations")
print(result)

Sign PDFs

from pdfghost import sign_pdf

# Sign a PDF with a cryptographic certificate
sign_pdf("input.pdf", "signed.pdf", "certificate.pfx", password="mypassword")

Batch Processing

from pdfghost import batch_process, rotate_pdf

# Rotate all PDFs in a folder by 90 degrees
batch_process("input_folder", "output_folder", rotate_pdf, rotation=90)
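
The general pattern behind a call like this can be sketched in a few lines: walk the input folder, and call the operation once per PDF with matching input and output paths. The function below is an illustration of that pattern only, under the assumption that the operation takes (input_path, output_path, **kwargs); it is not pdfghost's implementation, and apply_to_folder is a hypothetical name.

```python
from pathlib import Path


def apply_to_folder(input_folder, output_folder, operation, **kwargs):
    """Call operation(input_path, output_path, **kwargs) for each .pdf file.

    Returns the names of the files that were processed, in sorted order.
    """
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for pdf_path in sorted(Path(input_folder).glob("*.pdf")):
        target = out_dir / pdf_path.name
        operation(str(pdf_path), str(target), **kwargs)
        processed.append(target.name)
    return processed
```

Any function with that calling shape can be passed as the operation; for a dry run, shutil.copyfile works as a stand-in that simply copies each file.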

Testing

To run unit tests, first install the development dependencies, and then use:

python -m unittest discover tests/

Contributing

  • Fork the repository.
  • Create your feature branch (git checkout -b feature/your-feature).
  • Commit your changes (git commit -am 'Add new feature').
  • Push to the branch (git push origin feature/your-feature).
  • Open a new Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
