🚀 Big News: Socket Acquires Coana to Bring Reachability Analysis to Every Appsec Team.Learn more
Socket
Sign inDemoInstall
Socket

pdf-to-markdown-llm

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

pdf-to-markdown-llm

This project contains a command line tool to convert PDF to markdown. It uses image conversion and a LLM to convert the images to markdown.

0.1.12
PyPI
Maintainers
1

PDF to Markdown

This project contains a command line tool to convert PDF and Word documents to markdown. It uses image conversion and an LLM to convert the images to markdown.

Install

Execute these commands in the base directory of this project.

On Windows download the poppler library (e.g. poppler-24.08.0) from here and then do this using PowerShell:

$env:PKG_CONFIG_PATH="<download_folder>\poppler-24.08.0\Library\lib\pkgconfig"
# conda remove -n pdf_to_markdown --all
uv venv
# .venv\Scripts\activate
source .venv/bin/activate
uv sync
# Windows
pip install cmake
# End Windows
# Linux
sudo apt update
sudo apt install g++ -y
sudo apt install pkg-config -y
sudo apt-get install poppler-utils libpoppler-cpp-dev
# End Linux

There is an installation script for Linux in this repository.

Configuration

The application is configured used environment variables which you can set in an .env file. Check the .env_local file for the names of the variables that you will need.

You will need an Open AI key to run the PDF conversion.

You will also need a Gemini API key.

So you will need two environment variables:

OPENAI_API_KEY GEMINI_API_KEY

Usage of the command line application

Example: how to convert multiple pdf files with the OpenAI engine:

python ./pdf_to_markdown_llm/main/cli.py convert-files -f ./pdfs/oecd/002b3a39-en.pdf -f ./pdfs/oecd/ee6587fd-en.pdf

Example: how to convert a Word file to markdown with the OpenAI engine:

python ./pdf_to_markdown_llm/main/cli.py convert-files -f "./docs/Explainability March 2025.docx"

Example: how to convert a Word file to html with the OpenAI engine:

python ./pdf_to_markdown_llm/main/cli.py convert-files -f "./docs/bk/Pour INSCRIPTION en ligne MARCORIGNAN .docx" -t html

Example: how to convert a single file with Gemini model:

python ./pdf_to_markdown_llm/main/cli.py convert-files -f ./pdfs/oecd/002b3a39-en.pdf -e gemini

Example: how to convert all pdf files in a folder:

python ./pdf_to_markdown_llm/main/cli.py convert-in-dir --dirs ./pdfs/oecd

FAQs

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts