
A Python library for converting PDF documents and images to Markdown with AI assistance: each page image is sent, together with a text prompt, to a model accessed through OpenRouter. Shift from scanned documents and images to editable, searchable text.
```bash
pip install papershift
```
```python
from papershift import convert_pdf_to_markdown

# Basic usage
markdown_content = convert_pdf_to_markdown(
    pdf_path="path/to/your/document.pdf",
    api_key="your-openrouter-api-key",
)
```
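Rather than hard-coding the key in source files, you might keep it in an environment variable. The variable name `OPENROUTER_API_KEY` and the helper `load_openrouter_key` are assumptions for this sketch; the key is still passed explicitly through `api_key`.

```python
import os


def load_openrouter_key(var_name: str = "OPENROUTER_API_KEY") -> str:
    """Return the OpenRouter API key stored in an environment variable.

    Hypothetical helper: the variable name is an assumption, not
    something papershift is documented to read on its own.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Example (requires papershift and a real key):
# markdown_content = convert_pdf_to_markdown(
#     pdf_path="path/to/your/document.pdf",
#     api_key=load_openrouter_key(),
# )
```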
```python
# Advanced usage with options
markdown_content = convert_pdf_to_markdown(
    pdf_path="path/to/your/document.pdf",
    output_dir="output_folder",
    dpi=300,
    target_height_px=2048,
    model="openrouter/google/gemini-2.0-flash-001",
    api_key="your-openrouter-api-key",
    max_workers=4,
    batch_size=5,
    fast_mode=True,
)

# Save the output
with open("output.md", "w", encoding="utf-8") as f:
    f.write(markdown_content)
```
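The `batch_size` and `max_workers` options suggest that pages are grouped before being processed in parallel. How papershift batches internally is not documented here, so the following is only a hypothetical sketch of what grouping a 12-page PDF with the default `batch_size=5` would look like; `batch_pages` is not part of the library.

```python
from typing import List


def batch_pages(num_pages: int, batch_size: int = 5) -> List[List[int]]:
    """Split 0-based page indices into consecutive batches (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pages = list(range(num_pages))
    # Each slice holds at most batch_size pages; the last one may be shorter.
    return [pages[i:i + batch_size] for i in range(0, num_pages, batch_size)]


# A 12-page PDF with the default batch_size of 5:
print(batch_pages(12))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```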
```python
from papershift import convert_image_to_markdown, convert_images_to_markdown

# Convert a single image
markdown_content = convert_image_to_markdown(
    image_path="path/to/your/image.jpg",
    api_key="your-openrouter-api-key",
)
```
```python
# Convert multiple images with combined output
markdown_content = convert_images_to_markdown(
    image_paths=["image1.jpg", "image2.png", "image3.jpg"],
    output_dir="output_folder",
    api_key="your-openrouter-api-key",
    combined_output=True,
)

# Convert multiple images with separate outputs
markdown_files = convert_images_to_markdown(
    image_paths=["image1.jpg", "image2.png", "image3.jpg"],
    output_dir="output_folder",
    api_key="your-openrouter-api-key",
    combined_output=False,
)
```
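When the images are page scans in a folder, you might build `image_paths` programmatically instead of listing files by hand. This sketch uses only the standard library; `collect_image_paths` and its extension set are illustrative assumptions. Sorting matters if, as seems likely, combined output follows the order of `image_paths`.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions treated as images in this sketch (an assumption, adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_image_paths(folder: str, suffixes: Iterable[str] = IMAGE_SUFFIXES) -> List[str]:
    """Return image files directly inside `folder`, sorted by file name."""
    wanted = {s.lower() for s in suffixes}
    files = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    ]
    return [str(p) for p in sorted(files, key=lambda p: p.name)]


# Example (hypothetical folder name):
# markdown_content = convert_images_to_markdown(
#     image_paths=collect_image_paths("scans"),
#     api_key="your-openrouter-api-key",
# )
```

Names sort as text, so `page10.jpg` comes before `page2.jpg`; zero-padded names such as `page02.jpg` keep pages in order.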
Parameters for `convert_pdf_to_markdown`:

| Parameter | Description | Default |
|---|---|---|
| `pdf_path` | Path to the PDF file | (required) |
| `output_dir` | Directory in which to save the output Markdown files | `None` |
| `dpi` | Resolution (DPI) used when rendering pages to images | `300` |
| `target_height_px` | Target image height in pixels | `2048` |
| `aspect_threshold` | Aspect-ratio threshold for height adjustment | `1.5` |
| `prompt` | Text prompt sent with each page image | `"Convert this document to markdown"` |
| `model` | Model used for processing | `"openrouter/google/gemini-2.0-flash-001"` |
| `api_key` | OpenRouter API key | `None` |
| `site_url` | Optional site URL for OpenRouter | `None` |
| `app_name` | Optional app name for OpenRouter | `None` |
| `combined_output` | If `True`, returns a single string with all pages combined | `True` |
| `verbose` | If `True`, prints progress information | `False` |
| `max_workers` | Maximum number of worker processes for PDF conversion | `4` |
| `batch_size` | Number of pages processed in a single batch | `5` |
| `quality` | JPEG compression quality (1-100) used in fast mode | `95` |
| `fast_mode` | If `True`, uses reduced resolution and JPEG images for faster processing | `False` |
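As a rough guide to how `dpi` and `target_height_px` relate: PDF page sizes are measured in points (1/72 inch), so rendering a US Letter page (792 pt tall) at 300 DPI yields an image 3300 px tall, well above the 2048 px default target. The arithmetic below is a sketch; exactly how papershift resizes toward `target_height_px`, and how `aspect_threshold` affects that, is not specified in the table, and the helper names are hypothetical.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 inch


def rendered_height_px(page_height_pt: float, dpi: int = 300) -> int:
    """Pixel height of a page rendered at the given DPI."""
    return round(page_height_pt / POINTS_PER_INCH * dpi)


def scale_to_target(height_px: int, target_height_px: int = 2048) -> float:
    """Factor that would bring height_px to target_height_px."""
    return target_height_px / height_px


letter = rendered_height_px(792)  # US Letter, 11 in tall
a4 = rendered_height_px(842)      # A4, about 11.69 in tall
print(letter, a4)                 # 3300 3508
print(round(scale_to_target(letter), 3))  # 0.621
```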
Parameters for `convert_image_to_markdown` and `convert_images_to_markdown`:

| Parameter | Description | Default |
|---|---|---|
| `image_path` / `image_paths` | Path to the image file, or list of image paths | (required) |
| `output_dir` | Directory in which to save the output Markdown files | `None` |
| `target_height_px` | Target image height in pixels | `2048` |
| `aspect_threshold` | Aspect-ratio threshold for height adjustment | `1.5` |
| `prompt` | Text prompt sent with each image | `"Convert this image to markdown"` |
| `model` | Model used for processing | `"openrouter/google/gemini-2.0-flash-001"` |
| `api_key` | OpenRouter API key | `None` |
| `site_url` | Optional site URL for OpenRouter | `None` |
| `app_name` | Optional app name for OpenRouter | `None` |
| `combined_output` | If `True`, returns a single string with all images combined | `True` |
| `verbose` | If `True`, prints progress information | `False` |
| `max_workers` | Maximum number of worker processes for parallel processing | `4` |
| `quality` | JPEG compression quality (1-100) used in fast mode | `95` |
| `fast_mode` | If `True`, uses reduced resolution and JPEG images for faster processing | `False` |
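Comparing the two tables, `dpi` and `batch_size` are listed only for `convert_pdf_to_markdown`. If you keep one dictionary of settings for both kinds of input, a small filter such as the hypothetical `image_options` below drops those PDF-only keys before calling the image functions. The key names come from the tables; the helper itself is not part of papershift.

```python
from typing import Any, Dict

# Options that appear only in the PDF parameter table.
PDF_ONLY_OPTIONS = {"dpi", "batch_size"}


def image_options(shared: Dict[str, Any]) -> Dict[str, Any]:
    """Copy `shared` without the options the image functions do not list."""
    return {k: v for k, v in shared.items() if k not in PDF_ONLY_OPTIONS}


settings = {
    "api_key": "your-openrouter-api-key",
    "model": "openrouter/google/gemini-2.0-flash-001",
    "dpi": 300,
    "batch_size": 5,
    "fast_mode": True,
}

# convert_pdf_to_markdown(pdf_path="doc.pdf", **settings)
# convert_images_to_markdown(image_paths=["a.jpg"], **image_options(settings))
print(sorted(image_options(settings)))
# ['api_key', 'fast_mode', 'model']
```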
We found that papershift demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.