

Structured generation (in Rust).
Outlines-core
This package provides the core functionality for structured generation, formerly implemented in Outlines,
with a focus on performance and portability. It offers a convenient way to:
- build regular expressions from JSON schemas
- construct an Index object by combining a Vocabulary and a regular expression, to efficiently map tokens from a given vocabulary to state transitions in a finite-state automaton
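To make the idea of mapping tokens to state transitions concrete, here is a minimal, standard-library-only sketch. It is not the crate's implementation: ToyDfa, build_toy_index and ab_plus_dfa are hypothetical names introduced for illustration, and the automaton is hand-written for the regex ab+ rather than compiled from a pattern.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A toy character-level DFA: (state, char) -> next state.
/// Hypothetical type for illustration; not part of outlines-core.
pub struct ToyDfa {
    pub initial: u32,
    pub finals: BTreeSet<u32>,
    pub transitions: BTreeMap<(u32, char), u32>,
}

impl ToyDfa {
    /// Walk a whole token through the DFA; None if any character is rejected.
    pub fn walk(&self, start: u32, token: &str) -> Option<u32> {
        let mut state = start;
        for c in token.chars() {
            state = *self.transitions.get(&(state, c))?;
        }
        Some(state)
    }
}

/// For each listed state, precompute which token ids are allowed
/// and which state each of them leads to.
pub fn build_toy_index(
    dfa: &ToyDfa,
    states: &[u32],
    vocabulary: &[(&str, u32)],
) -> BTreeMap<u32, BTreeMap<u32, u32>> {
    let mut index = BTreeMap::new();
    for &state in states {
        let mut allowed = BTreeMap::new();
        for &(token, token_id) in vocabulary {
            // Empty tokens would trivially "match"; skip them.
            if token.is_empty() {
                continue;
            }
            if let Some(next) = dfa.walk(state, token) {
                allowed.insert(token_id, next);
            }
        }
        index.insert(state, allowed);
    }
    index
}

/// Hand-written DFA for the regex `ab+`: 0 -a-> 1 -b-> 2 -b-> 2, with 2 final.
pub fn ab_plus_dfa() -> ToyDfa {
    ToyDfa {
        initial: 0,
        finals: BTreeSet::from([2]),
        transitions: BTreeMap::from([((0, 'a'), 1), ((1, 'b'), 2), ((2, 'b'), 2)]),
    }
}
```

With a vocabulary such as [("a", 0), ("b", 1), ("ab", 2), ("bb", 3), ("c", 4)], state 0 allows only token ids 0 and 2, because every other token is rejected by the automaton from that state. Precomputing this table once is what makes the per-step lookup cheap at generation time.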
Example
Basic example of how it all fits together.
use outlines_core::prelude::*;
let schema = r#"{
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" }
},
"required": ["name", "age"]
}"#;
let regex = json_schema::regex_from_str(&schema, None)?;
let vocabulary = Vocabulary::from_pretrained("openai-community/gpt2", None)?;
let index = Index::new(&regex, &vocabulary)?;
let initial_state = index.initial_state();
let allowed_tokens = index.allowed_tokens(&initial_state).expect("Some allowed token ids");
let token_id = allowed_tokens.first().expect("First token id");
let next_state = index.next_state(&initial_state, token_id);
let final_states = index.final_states();
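In a generation loop, the allowed token ids for the current state are commonly used to mask the model's logits before choosing the next token. The sketch below shows one way to do that with the standard library only. It assumes logits arrive as a slice of f32 indexed by token id; mask_logits and argmax are hypothetical helpers and are not provided by the crate.

```rust
/// Keep logits only for allowed token ids; set all others to negative infinity.
/// Hypothetical helper for illustration; not part of outlines-core.
pub fn mask_logits(logits: &mut [f32], allowed_tokens: &[u32]) {
    let mut allowed = vec![false; logits.len()];
    for &id in allowed_tokens {
        // Ignore ids that fall outside the logits slice.
        if let Some(slot) = allowed.get_mut(id as usize) {
            *slot = true;
        }
    }
    for (logit, keep) in logits.iter_mut().zip(allowed) {
        if !keep {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Greedy choice: the id of the largest unmasked logit, or None if there is none.
pub fn argmax(logits: &[f32]) -> Option<u32> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &value) in logits.iter().enumerate() {
        if value == f32::NEG_INFINITY || value.is_nan() {
            continue;
        }
        match best {
            Some((_, current)) if current >= value => {}
            _ => best = Some((i, value)),
        }
    }
    best.map(|(i, _)| i as u32)
}
```

The chosen token id would then be passed to index.next_state(...) to obtain the state for the following step, and generation would stop once a final state is reached.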
Vocabulary
You can create a Vocabulary in three ways:
- Vocabulary::from_pretrained(model, parameters) - Loads from a pretrained model (as in the example above)
- Manual creation - You can create a vocabulary from token mappings:
  - Vocabulary::new(eos_token_id) - Creates an empty vocabulary, then add tokens with try_insert():
let mut vocabulary = Vocabulary::new(50256);
vocabulary.try_insert("hello", 0)?;
vocabulary.try_insert(vec![32], 1)?;
  - Vocabulary::try_from((eos_token_id, tokens)) - Creates a vocabulary by directly providing the token mappings.
    - It can be done either with the tokens as strings:
use rustc_hash::FxHashMap as HashMap;
let eos_token_id: u32 = 50256;
let mut tokens: HashMap<String, Vec<u32>> = HashMap::default();
tokens.insert("hello".to_string(), vec![0]);
tokens.insert("world".to_string(), vec![1]);
let vocabulary = Vocabulary::try_from((eos_token_id, tokens))?;
    - Or with the tokens as byte vector keys:
use rustc_hash::FxHashMap as HashMap;
let eos_token_id: u32 = 50256;
let mut tokens: HashMap<Vec<u8>, Vec<u32>> = HashMap::default();
tokens.insert(b"hello".to_vec(), vec![0]);
tokens.insert(b"world".to_vec(), vec![1]);
let vocabulary = Vocabulary::try_from((eos_token_id, tokens))?;
Important: When creating a Vocabulary manually from tokenizer data, make sure each token is converted to its plain string representation first, so that tokenizer-specific special characters, which the DFA would not recognize, are replaced.
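As an illustration of that conversion, the sketch below handles two markers used by GPT-2-style byte-level tokenizers, where Ġ stands for a space and Ċ for a newline. This is an assumption about the tokenizer in use, and the mapping is deliberately partial: a real tokenizer needs its full byte-level decoding. normalize_token is a hypothetical helper, not a crate function.

```rust
/// Replace two GPT-2-style byte-level markers with the characters they encode.
/// Hypothetical helper for illustration; the mapping is intentionally incomplete.
pub fn normalize_token(raw: &str) -> String {
    raw.chars()
        .map(|c| match c {
            'Ġ' => ' ',  // U+0120 encodes a space
            'Ċ' => '\n', // U+010A encodes a newline
            other => other,
        })
        .collect()
}
```

The normalized string, rather than the raw tokenizer entry, is what would be passed to try_insert() or used as a key in the token map.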
Python Bindings
The project also provides Python bindings that expose the crate's functionality.
import json
from outlines_core.json_schema import build_regex_from_schema
from outlines_core.guide import Guide, Index, Vocabulary
schema = {
"title": "Foo",
"type": "object",
"properties": {"date": {"type": "string", "format": "date"}}
}
regex = build_regex_from_schema(json.dumps(schema))
vocabulary = Vocabulary.from_pretrained("openai-community/gpt2")
index = Index(regex, vocabulary)
guide = Guide(index)
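# Current state of the Guide
current_state = guide.get_state()
# Tokens allowed in the current state
allowed_tokens = guide.get_tokens()
# Advance the Guide with a token id; returns the tokens allowed in the new state
next_allowed_tokens = guide.advance(allowed_tokens[-1])
# Once the Guide is finished, only the EOS token is allowed
if guide.is_finished():
    assert guide.get_tokens() == [vocabulary.get_eos_token_id()]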
How to contribute?
Setup
Fork the repository on GitHub and clone the fork locally:
git clone git@github.com:YourUserName/outlines-core.git
cd outlines-core
Create a new virtual environment and install the dependencies in editable mode:
python -m venv .venv
source .venv/bin/activate
pip install -e ".[test]"
pre-commit install
Before pushing your code
If you are working with the Python bindings, don't forget to build the Rust extension before testing, for example in debug mode:
make build-extension-debug
Run Python tests:
pytest
Run Rust tests:
cargo test
Alternatively, use the Makefile to run both:
make test
Finally, run the code style checks:
pre-commit run --all-files
Or, using the Makefile:
make pcc
If necessary, you can run the benchmarks locally:
make pybench
Join us
- 💡 Have an idea? Come chat with us on Discord
- Found a bug? Open an issue