
This project contains a command line tool to convert PDF and Word documents to markdown. It uses image conversion and an LLM to convert the images to markdown.
Execute these commands in the base directory of this project.
On Windows, download the poppler library (e.g. poppler-24.08.0), then point pkg-config at it in PowerShell:
$env:PKG_CONFIG_PATH="<download_folder>\poppler-24.08.0\Library\lib\pkgconfig"
# conda remove -n pdf_to_markdown --all
uv venv
# .venv\Scripts\activate
source .venv/bin/activate
uv sync
# Windows
pip install cmake
# End Windows
# Linux
sudo apt update
sudo apt install g++ -y
sudo apt install pkg-config -y
sudo apt install poppler-utils libpoppler-cpp-dev -y
# End Linux
There is an installation script for Linux in this repository.
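Before running the tool on Linux, it can help to confirm that the system packages above actually landed on the PATH. The following is a minimal sketch, not part of this project: the helper name `missing_tools` and the tool list are assumptions, with `pdftoppm` taken as a representative binary from `poppler-utils`.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executables assumed to come from the Linux packages listed above
# (g++, pkg-config, and pdftoppm from poppler-utils).
REQUIRED_TOOLS = ("g++", "pkg-config", "pdftoppm")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.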
The application is configured using environment variables, which you can set in a .env
file. Check the .env_local file for the names of the variables that you will need.
You will need an OpenAI API key to run the PDF conversion.
You will also need a Gemini API key.
So you will need two environment variables:
OPENAI_API_KEY
GEMINI_API_KEY
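A quick way to catch a missing key before starting a long conversion is to check the process environment up front. This is a hedged sketch rather than the project's own startup code: `missing_keys` is a hypothetical helper, and it only inspects variables that are already exported; it does not read the .env file itself.

```python
import os
from typing import List, Mapping

# The two variable names given above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Please set: " + ", ".join(missing))
    else:
        print("Both API keys are set.")
```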
Example: how to convert multiple PDF files with the OpenAI engine:
python ./pdf_to_markdown_llm/main/cli.py convert-files -f ./pdfs/oecd/002b3a39-en.pdf -f ./pdfs/oecd/ee6587fd-en.pdf
Example: how to convert a Word file to markdown with the OpenAI engine:
python ./pdf_to_markdown_llm/main/cli.py convert-files -f "./docs/Explainability March 2025.docx"
Example: how to convert a Word file to HTML with the OpenAI engine:
python ./pdf_to_markdown_llm/main/cli.py convert-files -f "./docs/bk/Pour INSCRIPTION en ligne MARCORIGNAN .docx" -t html
Example: how to convert a single file with the Gemini model:
python ./pdf_to_markdown_llm/main/cli.py convert-files -f ./pdfs/oecd/002b3a39-en.pdf -e gemini
Example: how to convert all PDF files in a folder:
python ./pdf_to_markdown_llm/main/cli.py convert-in-dir --dirs ./pdfs/oecd
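When the conversion is driven from another Python script, the `convert-files` invocations above can be assembled programmatically. The sketch below only builds the argument list; `build_convert_command` is a hypothetical helper, and it uses just the `-f`, `-e`, and `-t` flags that appear in the examples above, making no assumptions about other options.

```python
import sys
from typing import List, Optional, Sequence

# Path to the CLI script, as used in the examples above.
CLI_PATH = "./pdf_to_markdown_llm/main/cli.py"


def build_convert_command(
    files: Sequence[str],
    engine: Optional[str] = None,
    target: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a convert-files call."""
    if not files:
        raise ValueError("At least one input file is required.")
    command = [sys.executable, CLI_PATH, "convert-files"]
    for path in files:
        command += ["-f", path]  # one -f flag per input file
    if engine:
        command += ["-e", engine]  # e.g. "gemini"
    if target:
        command += ["-t", target]  # e.g. "html"
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the base directory of the project. Passing a list rather than a shell string means file names containing spaces need no extra quoting.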