datascav-switch

datascav-switch is a Python package for converting documents between formats using generative AI (OpenAI), built on a scalable architecture. It is part of a suite of tools for automation, data extraction, and transformation.
Main Features
- PDF to Markdown conversion with layout preservation
- Support for multiple input formats (file, URL, base64, bytes)
- Parallel processing and dynamic logging
- Detailed token tracking
- Native integration with LangChain and tracing via LangSmith
Installation
pip install datascav-switch
Requirements:
- Python 3.10+
- OpenAI API key (OPENAI_API_KEY)
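Since a conversion needs the OpenAI API key, it can help to fail early with a clear message when the key is missing. The helper below is a minimal sketch using only the standard library; `require_api_key` is a hypothetical name and not part of datascav-switch.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of datascav-switch.
    """
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running a conversion.")
    return key
```

Calling `require_api_key()` at the top of a script, before creating a converter, turns a missing key into an immediate error instead of a failure partway through a run.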
Quick Start
from scav_switch.converters.pdf import ScavToMarkdown
scav = ScavToMarkdown(model='gpt-4.1', verbose=True)
markdown = scav.dig('/path/to/file.pdf')
print(markdown)
For complete examples and detailed documentation, see the docs/ folder and the notebooks for each module.
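The input formats listed under Main Features (file, URL, base64, bytes) can all be derived from a single PDF on disk. The sketch below prepares three of them with the standard library. `build_inputs` is a hypothetical helper, and whether `dig` accepts each value directly, without an extra argument naming the input type, is an assumption this README does not confirm.

```python
import base64
from pathlib import Path


def build_inputs(pdf_path):
    """Return the same PDF as a path string, raw bytes, and a base64 string.

    Hypothetical helper for illustration; not part of datascav-switch.
    """
    raw = Path(pdf_path).read_bytes()
    return {
        "file": str(pdf_path),                            # path on disk
        "bytes": raw,                                     # raw file content
        "base64": base64.b64encode(raw).decode("ascii"),  # text-safe encoding
    }
```

Each value could then be handed to `scav.dig(...)` in place of the file path used in the Quick Start, provided the converter detects the input type as assumed here.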
Documentation
License
MIT