
PDF Ghost is a Python library designed for performing a wide range of operations on PDF files, including merging, splitting, rotating, compressing, watermarking, converting, encrypting/decrypting, extracting text/images, adding page numbers, batch processing, and comparing PDFs. It also supports generating PDFs from Markdown or LaTeX files.
pip install pdfghost
For Markdown-to-PDF and LaTeX-to-PDF conversion, the following external tools are required:
If you have Homebrew installed, run:
brew install pandoc
sudo apt-get update
sudo apt-get install pandoc
Download the Pandoc installer from the official Pandoc website and follow the installation instructions.
Add the following line to your .bashrc or .zshrc file:
export PATH="/usr/local/texlive/2023/bin/universal-darwin:$PATH"
Install texlive (a full LaTeX distribution):
sudo apt-get update
sudo apt-get install texlive
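Before calling the Markdown or LaTeX conversion functions, it can help to confirm that the external tools are reachable from Python. The following is a minimal sketch using only the standard library. The helper name `missing_tools` is hypothetical, and the executable names `pandoc` and `pdflatex` are assumptions about what the conversions invoke; adjust them to match your installation.

```python
import shutil


def missing_tools(tools=("pandoc", "pdflatex")):
    """Return the names of the given executables that are not on PATH.

    The default tool names are assumptions, not taken from pdfghost itself.
    """
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Missing external tools:", ", ".join(missing))
else:
    print("All external tools found.")
```

If a tool is reported missing right after installation, opening a new shell so the updated PATH is picked up is usually the first thing to try.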
from pdfghost import merge_pdfs
merge_pdfs("output.pdf", "file1.pdf", "file2.pdf")
from pdfghost import split_pdf
split_pdf("input.pdf", "output_folder", split_range=(0, 2))
from pdfghost import remove_pages
# Remove pages with indices 0, 2, and 4 (0-based)
remove_pages("input.pdf", "output.pdf", pages_to_remove=[0, 2, 4])
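The page arguments in these examples are 0-based, while page numbers shown in PDF viewers usually start at 1. A small conversion helper can avoid off-by-one mistakes. This is only a sketch: `to_zero_based` is a hypothetical helper, not part of pdfghost.

```python
def to_zero_based(page_numbers):
    """Convert 1-based page numbers to sorted, de-duplicated 0-based indices."""
    if any(n < 1 for n in page_numbers):
        raise ValueError("page numbers start at 1")
    return sorted({n - 1 for n in page_numbers})


# Pages 1, 3, and 5 as seen in a viewer map to the indices used above.
pages_to_remove = to_zero_based([1, 3, 5])
print(pages_to_remove)  # [0, 2, 4]
```

The resulting list can then be passed as `pages_to_remove` in the call shown above.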
from pdfghost import remove_pages_from_start
# Remove the first 3 pages
remove_pages_from_start("input.pdf", "output.pdf", num_pages=3)
from pdfghost import remove_pages_from_end
# Remove the last 2 pages
remove_pages_from_end("input.pdf", "output.pdf", num_pages=2)
from pdfghost import rotate_pdf
# Rotate all pages by 90 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=90)
# Rotate specific pages by 180 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=180, pages_to_rotate=[0, 2])
from pdfghost import insert_pages
# Insert pages at specific positions
insertions = [
    (1, "insert1.pdf"),  # Insert pages from insert1.pdf at position 1
    (4, "insert2.pdf"),  # Insert pages from insert2.pdf at position 4
]
insert_pages("input.pdf", "output.pdf", insertions)
from pdfghost import rearrange_pdf
# Rearrange pages in a PDF
page_order = [2, 0, 1] # New order: Page 3, Page 1, Page 2
rearrange_pdf("input.pdf", "output.pdf", page_order)
from pdfghost import merge_and_rearrange
# Merge multiple PDFs and rearrange their pages
page_order = [
    (0, 0),  # Page 1 from file1.pdf
    (1, 0),  # Page 1 from file2.pdf
    (0, 1),  # Page 2 from file1.pdf
]
merge_and_rearrange("output.pdf", page_order, "file1.pdf", "file2.pdf")
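Judging by the comments in the example above, each `page_order` entry is a `(file_index, page_index)` pair, with both values 0-based and the file index referring to the order of the input files. The sketch below spells out that reading in plain Python so an order can be checked before merging. `describe_order` is a hypothetical helper and does not call pdfghost.

```python
def describe_order(page_order, *files):
    """Return a readable label for each (file_index, page_index) pair."""
    labels = []
    for file_index, page_index in page_order:
        # Both indices are 0-based; add 1 to show a human-friendly page number.
        labels.append(f"Page {page_index + 1} from {files[file_index]}")
    return labels


page_order = [(0, 0), (1, 0), (0, 1)]
for label in describe_order(page_order, "file1.pdf", "file2.pdf"):
    print(label)
```

An out-of-range file index raises `IndexError` here, which makes a mistyped pair visible before any PDF is written.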
from pdfghost import compress_pdf
# Compress a PDF with medium compression
compress_pdf("input.pdf", "output.pdf", power=3)
# Compress a PDF with maximum compression
compress_pdf("input.pdf", "output.pdf", power=5)
from pdfghost import add_text_watermark
# Add a text watermark to all pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential")
# Add a text watermark to specific pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential", pages_to_watermark=[0, 2])
from pdfghost import add_image_watermark
# Add an image watermark to all pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png")
# Add an image watermark to specific pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png", pages_to_watermark=[1])
from pdfghost import remove_watermark
# Remove watermarks from all pages
remove_watermark("input.pdf", "output.pdf")
# Remove watermarks from specific pages
remove_watermark("input.pdf", "output.pdf", pages_to_clean=[0, 2])
from pdfghost import pdf_to_images
# Convert each page of a PDF into PNG images
pdf_to_images("input.pdf", "output_folder", format="png")
# Convert each page of a PDF into JPG images
pdf_to_images("input.pdf", "output_folder", format="jpg")
from pdfghost import images_to_pdf
# Convert multiple image files into a single PDF
images_to_pdf("output.pdf", "image1.png", "image2.jpg")
from pdfghost import encrypt_pdf
# Encrypt a PDF with a password
encrypt_pdf("input.pdf", "output.pdf", password="mypassword")
from pdfghost import decrypt_pdf
# Decrypt a PDF with a password
decrypt_pdf("input.pdf", "output.pdf", password="mypassword")
from pdfghost import extract_text
# Extract text from a PDF and save it as a .txt file
extract_text("input.pdf", "output.txt", format="txt")
# Extract text from a PDF and save it as a .csv file
extract_text("input.pdf", "output.csv", format="csv")
from pdfghost import extract_images
# Extract all images from a PDF and save them as separate image files
extract_images("input.pdf", "output_folder")
from pdfghost import add_page_numbers
# Add page numbers at the bottom of each page
add_page_numbers("input.pdf", "output.pdf", position="bottom")
# Add page numbers at the top of each page
add_page_numbers("input.pdf", "output.pdf", position="top")
from pdfghost import pdf_to_html
# Convert a PDF into a structured HTML file
pdf_to_html("input.pdf", "output.html")
from pdfghost import markdown_to_pdf
# Convert a Markdown file into a PDF
markdown_to_pdf("input.md", "output.pdf")
from pdfghost import latex_to_pdf
# Convert a LaTeX file into a PDF
latex_to_pdf("input.tex", "output.pdf")
from pdfghost import compare_pdfs
# Compare two PDFs and generate a summary of differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="summary")
print(result)
# Compare two PDFs with side-by-side output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="side_by_side")
print(result)
# Compare two PDFs with highlighted differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="highlight_differences")
print(result)
# Compare two PDFs with version control-style output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="version_control")
print(result)
# Compare two PDFs with annotations
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="annotations")
print(result)
from pdfghost import sign_pdf
# Sign a PDF with a cryptographic certificate
sign_pdf("input.pdf", "signed.pdf", "certificate.pfx", password="mypassword")
from pdfghost import batch_process, rotate_pdf
# Rotate all PDFs in a folder by 90 degrees
batch_process("input_folder", "output_folder", rotate_pdf, rotation=90)
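To make the batch pattern concrete, here is a sketch of how a helper of this shape could work: it walks a folder, and for each PDF calls the given function with an input path, an output path, and any extra keyword arguments. This is an illustration written for this page, not pdfghost's actual implementation; `batch_apply` is a hypothetical name, and the assumption that only top-level `*.pdf` files are processed is mine.

```python
from pathlib import Path


def batch_apply(input_folder, output_folder, func, **kwargs):
    """Call func(input_path, output_path, **kwargs) for every PDF in input_folder.

    Returns the list of output paths, in sorted file-name order.
    """
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for pdf in sorted(Path(input_folder).glob("*.pdf")):
        target = out_dir / pdf.name
        func(str(pdf), str(target), **kwargs)
        processed.append(target)
    return processed
```

Any function that takes an input path and an output path as its first two arguments, as `rotate_pdf` does in the example above, would fit this calling convention.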
To run unit tests, first install the development dependencies, and then use:
python -m unittest discover tests/
git checkout -b feature/your-feature
git commit -am 'Add new feature'
git push origin feature/your-feature
This project is licensed under the MIT License - see the LICENSE file for details.
We found that pdf-ghost demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.