🚨 Shai-Hulud Strikes Again:834 Packages Compromised.Technical Analysis →

Book a Demo Install Sign in

layoutscribe

Package Overview

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

layoutscribe

LLM-only, agentic layout & text extraction to Markdown/Text/Layout JSON

PyPI

Version: 0.1.0a3

Maintainers: 1

LayoutScribe

LLM-powered layout & text extraction for PDFs, slides, and Word docs

LLM-only, agentic parser that converts PDF / PPTX / DOCX into clean Markdown, plain text, and layout JSON (with normalized bounding boxes).
Built with LangGraph (agent orchestration), LiteLLM (provider-agnostic multimodal calls), and MLflow (tracing).

No OCR engines, no heuristic parsers. Rendering to images is allowed; all structure and text understanding is done by a multimodal LLM.

Features (0.1)

Inputs: PDF, PPTX, DOCX (rendered pages/slides as images)
Outputs:
- Markdown (headings, lists, tables, captions)
- Plain text
- Layout JSON (blocks with type, bbox[0..1], text, conf)
Agentic pipeline: planner → page_vision (async) → reviewer (validate/re-ask) → composer
Robustness:
- Re-ask on schema/geometry violations (IoU/coverage checks)
- Fallback injection when LLM returns empty content so Markdown is never blank
Provider-agnostic via LiteLLM (OpenAI, Azure OpenAI, Claude, Gemini)
MLflow tracing for params, metrics, artifacts

Status

0.1 (alpha) released — see CHANGELOG.md and docs/ROADMAP.md.

Quick Links

docs/ARCHITECTURE.md – modules & flow
docs/PROMPTS_AND_SCHEMA.md – prompt rules and schema notes
docs/schema/layout_page.schema.json – formal JSON Schema (Draft 2020-12)
docs/CONFIGURATION.md – env vars, provider-specific setup, .env example
docs/API_SPEC.md / docs/CLI_SPEC.md – contracts & examples
docs/BENCHMARKS.md – datasets & metrics
docs/TESTING_STRATEGY.md – testing plan & commands
docs/PROVIDERS.md – model matrix & concurrency guidance
docs/SECURITY.md – keys, artifacts, and vulnerability reporting
docs/ROADMAP.md – milestones
CONTRIBUTING.md – how to help
CHANGELOG.md – notable changes

Installation

Requires Python 3.10+.

pip install layoutscribe

Optional extras:

# Office file support (PPTX/DOCX rendering via python-pptx / python-docx)
pip install "layoutscribe[office]"

# Development tools (ruff, black, pytest)
pip install "layoutscribe[dev]"

Runtime notes:

PDF rendering: PyMuPDF (included)
PPTX/DOCX support: python-pptx, python-docx (install with [office])

Getting Started

Set provider keys as environment variables (see CONFIGURATION.md). Example .env:

OPENAI_API_KEY=sk-...
LAYOUTSCRIBE_DPI=180

Quickstart

CLI

layoutscribe parse ./samples/report.pdf \
  --llm openai/gpt-4o \
  --outputs markdown text layout_json \
  --output-dir ./artifacts/report \
  --dpi 180 --parallel-pages 6 --budget-usd 0.50

Python API

import asyncio
from layoutscribe.api import parse as ls_parse


async def main() -> None:
  doc = await ls_parse(
    path="samples/report.pdf",
    outputs=["markdown", "text", "layout_json"],
    llm="openai/gpt-4o",
    dpi=180,
    parallel_pages=6,
    budget_usd=0.50,
    save_intermediate=True,
  )
  print(doc.metadata)
  print(doc.markdown[:1000])


if __name__ == "__main__":
  asyncio.run(main())

Outputs & Artifacts

./artifacts/report/
  document.md
  document.txt
  layout.json
  overlays/
    page-0001.png
    page-0002.png
  intermediate/
    page-0001.json

Configuration

See docs/CONFIGURATION.md for provider-specific env vars, defaults, and precedence. MLflow tracing is opt-in via --trace-mlflow.

LiteLLM provider setup

LiteLLM reads provider keys from environment variables. Set only those you need:

# OpenAI
OPENAI_API_KEY=sk-...

# Azure OpenAI
AZURE_OPENAI_API_KEY=...  
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/  
AZURE_OPENAI_API_VERSION=2024-02-15-preview

# Anthropic
ANTHROPIC_API_KEY=...

# Google (Gemini)
GOOGLE_API_KEY=...

Use --llm to pick a model via LiteLLM:

--llm openai/gpt-4o
--llm azure/<deployment_name>
--llm anthropic/claude-3.5-sonnet
--llm google/gemini-1.5-pro

Notes:

For Azure, ensure the deployment name references a vision-capable model and that your endpoint/API version are set.
Keep temperature low (0–0.2) for consistent JSON.
Respect provider rate limits; we use retries with exponential backoff.

Limitations (0.1)

No OCR engines; relies entirely on a multimodal LLM
Basic tables only (CSV-like); no complex rowspan/colspan recovery
No handwriting support; language translation out of scope
Confidence scores (if present) are heuristic and not calibrated

Community & Support

Open issues and discussions on GitHub
For security concerns, follow SECURITY.md (use private advisories)

License

Apache-2.0 (see LICENSE).

Keywords

FAQs

What is layoutscribe?

Is layoutscribe well maintained?

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

layoutscribe

LayoutScribe

Features (0.1)

Status

Quick Links

Installation

Getting Started

Quickstart

CLI

Python API

Outputs & Artifacts

Configuration

LiteLLM provider setup

Limitations (0.1)

Community & Support

License

Keywords

Related posts

New React Server Components Vulnerabilities: DoS and Source Code Exposure

Software Engineering Daily Podcast: Feross on AI, Open Source, and Supply Chain Risk