llm-extractinator

A framework that enables efficient extraction of structured data from unstructured text using large language models (LLMs).

0.5.4

PyPI

Maintainers: 1

LLM Extractinator

⚠️ This tool is a prototype in active development and may change significantly. Always verify results!

LLM Extractinator enables efficient extraction of structured data from unstructured text using large language models (LLMs). It supports configurable task definitions, CLI or Python usage, a point‑and‑click GUI Studio, and flexible data input/output formats.

📘 Full documentation: https://DIAGNijmegen.github.io/llm_extractinator/

🔧 Installation

1. Install Ollama

On Linux

curl -fsSL https://ollama.com/install.sh | sh

On Windows or macOS

Download the installer from: https://ollama.com/download

2. Install the Package

Create a fresh conda environment:

conda create -n llm_extractinator python=3.11
conda activate llm_extractinator

Install the package via pip:

pip install llm_extractinator

Or from source:

git clone https://github.com/DIAGNijmegen/llm_extractinator.git
cd llm_extractinator
pip install -e .

Tip: to run the latest models, keep the Ollama Python client libraries up to date:
pip install --upgrade ollama langchain-ollama
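
Before the first run it can help to confirm that the pieces above are in place. The following is a minimal sketch, not part of the package: `check_environment` is a hypothetical helper, and it assumes that the Ollama installer puts an `ollama` executable on the PATH and that the package is importable as `llm_extractinator`.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Report which installation prerequisites appear to be satisfied."""
    return {
        # Other versions may also work; 3.11 is simply what the conda
        # command above uses.
        "python_is_3_11": sys.version_info[:2] == (3, 11),
        # True when an `ollama` executable is found on the PATH.
        "ollama_on_path": shutil.which("ollama") is not None,
        # True when the package can be located (it is not imported here).
        "package_installed": importlib.util.find_spec("llm_extractinator") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

A `MISSING` line only means the check did not find that item in the current environment, for example because the conda environment is not activated.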

🖥️ Interactive Studio GUI (beta)

Starting with v0.4, Extractinator ships with a Streamlit‑based Studio for designing, running, and monitoring extraction tasks without writing code:

launch-extractinator  # opens http://localhost:8501 in your browser

Features


🗂️ Project Manager	Create / select datasets, parsers and tasks with file previews
🔧 Parser Builder	Visual Pydantic schema designer (nested models supported)
🚀 One‑click Runs	Configure model, sampling & advanced flags, then watch live logs
🛠️ Task JSON Wizard	Step‑by‑step helper to generate valid TaskXXX_name.json files
🆘 Help bubbles everywhere	Inline docs so you never lose context

The Studio is fully optional: anything you configure here can still be executed from the CLI or Python API.

🚀 Quick Usage

GUI

launch-extractinator  # recommended for new users

CLI

extractinate --task_id 001 --model_name "phi4"

Python

from llm_extractinator import extractinate

extractinate(task_id=1, model_name="phi4")
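
To try the same task with several models, the CLI call can be scripted. This is a sketch under stated assumptions: `build_command` and `run_all` are hypothetical helpers, only the two flags shown above are used, and the three-digit zero-padding simply mirrors the `001` in the CLI example.

```python
import subprocess


def build_command(task_id: int, model_name: str) -> list[str]:
    """Build the argument list for one `extractinate` CLI call."""
    return [
        "extractinate",
        "--task_id", f"{task_id:03d}",  # 1 -> "001", as in the CLI example
        "--model_name", model_name,
    ]


def run_all(task_id: int, model_names: list[str]) -> None:
    """Run the task once per model, stopping at the first failure."""
    for model_name in model_names:
        subprocess.run(build_command(task_id, model_name), check=True)


if __name__ == "__main__":
    # Only print the command here; run_all() executes it.
    print(" ".join(build_command(1, "phi4")))
```

Calling `run_all(1, ["phi4"])` executes the commands one after another. It needs the package installed and the listed models available in the local Ollama installation.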

📁 Task Files

Each task is defined by a JSON file stored in tasks/.

Filename format:

TaskXXX_name.json

Example:

{
  "Description": "Extract product data from text.",
  "Data_Path": "products.csv",
  "Input_Field": "text",
  "Parser_Format": "product_parser.py"
}

Parser_Format points to a .py file in tasks/parsers/. That file defines a Pydantic model named OutputParser, which is used to structure the LLM output.
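
Task files can also be generated from a script. This is a minimal sketch, assuming the four keys from the example above are sufficient and that `XXX` in the filename is a zero-padded three-digit task number (the format line does not spell this out). `write_task_file` is a hypothetical helper, not a function of the package.

```python
import json
from pathlib import Path


def write_task_file(tasks_dir: Path, task_id: int, name: str, task: dict) -> Path:
    """Write `task` as TaskXXX_name.json inside `tasks_dir` and return its path."""
    tasks_dir.mkdir(parents=True, exist_ok=True)
    path = tasks_dir / f"Task{task_id:03d}_{name}.json"
    path.write_text(json.dumps(task, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    task = {
        "Description": "Extract product data from text.",
        "Data_Path": "products.csv",
        "Input_Field": "text",
        "Parser_Format": "product_parser.py",
    }
    print(write_task_file(Path("tasks"), 1, "products", task))
```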

🛠️ Visual Schema Builder (optional)

If you prefer a graphical approach to designing parsers, run:

build-parser

This starts the same builder embedded in the Studio, letting you assemble nested Pydantic models visually. Save the resulting .py file in tasks/parsers/ and reference it via Parser_Format.

👉 Read the parser docs for full details.
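
A parser can also be written by hand. The following is a sketch of what `product_parser.py` might contain, assuming the file needs to define a Pydantic model named `OutputParser` as described under Task Files. The `Product` model and all field names are hypothetical and chosen only to match the product example.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Product(BaseModel):
    """One product mentioned in the text (hypothetical fields)."""

    name: str = Field(description="Product name as written in the text")
    price: Optional[float] = Field(default=None, description="Price, if stated")


class OutputParser(BaseModel):
    """Top-level schema with a nested Product model."""

    products: list[Product] = Field(default_factory=list)


if __name__ == "__main__":
    # Validate a hand-made example to check the schema behaves as intended.
    parsed = OutputParser(products=[{"name": "Desk lamp", "price": 19.99}])
    print(parsed.products[0].name)
```

Field descriptions are worth writing carefully, since they document what each field is meant to hold. Whether and how the package passes them on to the model is covered in the parser docs, not assumed here.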

📄 Citation

If you use this tool, please cite: https://doi.org/10.5281/zenodo.15089764

🤝 Contributing

We welcome pull requests! See the contributing guide for details.
