SDK and CLI for parsing PDF, DOCX, HTML and more into a unified document representation that powers downstream workflows such as generative AI applications.

Version 1.0.17 on PyPI

Docowling


Docowling is a fork of Docling, an IBM project, developed to extend its functionality and add new document-processing capabilities.

Why Docowling?

Like an owl watching for every kind of prey, Docowling is a fork built to take on every type of document.


Features

  • 📄 Converts popular formats (CSV, PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc & Markdown) to HTML, Markdown and JSON, with embedded or referenced images
  • 🧩 Unified DoclingDocument format for standardized representation
  • 🤖 Ready-to-use integrations with LangChain, LlamaIndex, Crew AI & Haystack
  • 💻 Intuitive CLI for efficient batch processing with customizable export parameters
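The CLI's flags are not documented in this README, so as a rough Python counterpart to batch processing, here is a minimal sketch that loops over several sources using only the `convert()` and `export_to_markdown()` calls from the Getting started example. `convert_batch` and its output-file naming scheme are hypothetical illustrations, not part of Docowling.

```python
from pathlib import Path


def convert_batch(converter, sources, out_dir):
    """Convert every source and write its Markdown export into out_dir.

    `converter` can be any object whose convert(source) result exposes
    .document.export_to_markdown(), as in the Getting started example.
    Returns the paths of the files that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sources:
        result = converter.convert(source)
        markdown = result.document.export_to_markdown()
        # Hypothetical naming scheme: last path/URL segment plus ".md",
        # e.g. "TESLA.csv" -> "TESLA.csv.md"; sources sharing a name overwrite each other
        target = out / (Path(str(source)).name + ".md")
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written
```

With Docowling installed, this would be called as `convert_batch(DocumentConverter(), ["report.pdf", "slides.pptx"], "out")`. Error handling for sources that fail to convert is deliberately left out to keep the sketch short.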

Coming Soon

  • 📄 Compatibility with more formats
  • 🤖 Optimized integrations with LangChain, Crew AI & Weaviate

Installation

To use Docowling, install the docowling package with your package manager, e.g. pip or uv:

pip install docowling
uv pip install docowling

Works on macOS, Linux and Windows, on both x86_64 and arm64 architectures.
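Because pip and uv can target different environments, it may help to confirm that the package is visible to the interpreter you will actually run. The following is a minimal stdlib-only sketch; `installed_version` is a hypothetical helper, not part of Docowling, and it only reads the installed distribution metadata.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "docowling") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"docowling {version}" if version else "docowling is not installed")
```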

Getting started

To convert individual documents, use convert(), for example:

from docowling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"

The same call also handles local files, for example a CSV:

from docowling.document_converter import DocumentConverter

source = "/content/drive/MyDrive/TESLA.csv"  # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
# output: "| Date     |      Open |      High [...]"
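To illustrate the shape of the Markdown produced in the CSV example, here is a minimal stdlib sketch that renders CSV rows as a pipe-delimited table. It is not Docowling's implementation: judging by the sample output above, the real exporter also pads cells to align columns. `csv_to_markdown` is a hypothetical helper, and the sample values are made-up placeholders.

```python
import csv
import io


def _escape(cell: str) -> str:
    # A literal "|" would be read as a column separator, so escape it
    return cell.replace("|", "\\|")


def csv_to_markdown(text: str) -> str:
    """Render CSV text as a Markdown table, treating the first row as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header = [_escape(c) for c in rows[0]]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(" --- " for _ in header) + "|",
    ]
    for row in rows[1:]:
        # Pad short rows so every line has at least as many columns as the header
        cells = [_escape(c) for c in row] + [""] * (len(header) - len(row))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Placeholder values, not real market data
sample = "Date,Open,High\n2024-01-02,1.0,2.0\n"
print(csv_to_markdown(sample))
```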

License

The Docowling codebase is released under the MIT license. For individual models, refer to the model licenses in their original packages.

IBM ❤️ Thanks

Thank you, IBM, for creating Docling, the foundation of Docowling.
