
Research
/Security News
Mini Shai-Hulud Campaign Hits Red Hat Cloud Services npm Packages
A mini Shai-Hulud campaign compromised Red Hat Cloud Services npm packages to steal developer and CI/CD secrets during installation.
cruxial
Advanced tools
The reliability layer for LLM tool calls. Intercepts every call, validates arguments against your schema, auto-repairs hallucinated args before execution.
The reliability layer for LLM tool calls.
Your agent said it sent the email. It didn't.
Cruxial intercepts every LLM tool call before it executes, validates arguments against your schema, and auto-repairs hallucinated args via a structured retry. Drop-in for OpenAI and Anthropic. <1ms p99 added overhead — validation is local, no extra network hop. Fail-open by default — if Cruxial itself errors, the tool still executes.
pip install cruxial
Then watch it catch every failure category live — offline, no API key:
cruxial demo

from cruxial import guard
from openai import OpenAI
client = OpenAI()
# Standard OpenAI tool definitions
schemas = {
"send_email": {
"type": "object",
"properties": {
"to": {"type": "string", "format": "email"},
"subject": {"type": "string", "maxLength": 200},
"body": {"type": "string"},
},
"required": ["to", "subject", "body"],
}
}
# Your actual executors
def send_email(to, subject, body):
return mailer.send(to=to, subject=subject, body=body)
executors = {"send_email": send_email}
# Wrap once
cruxial = guard(schemas=schemas, executors=executors)
# In your agent loop:
for tool_call in llm_response.tool_calls:
result = cruxial.execute(tool_call.name, tool_call.arguments)
if not result.ok:
# result.failure.category, .message, .repair_prompt
# See "Auto-repair" below for the one-line fix
raise result.failure.as_exception()
use(result.value)
guard() gives you full control. If you write the usual raw-SDK loop, cruxial.run()
does the entire tool step in one call — call the model, validate every tool call,
execute the valid ones, auto-repair the bad ones in one round-trip, log, and append
the results — for OpenAI / Azure / Anthropic / LiteLLM. You keep your loop:
import cruxial
result = cruxial.run(
client, # your OpenAI / AzureOpenAI / Anthropic client, or litellm.completion
model="gpt-4o",
messages=messages,
tools=tools, # the same tool defs you already pass the LLM
executors=executors, # {tool_name: your_function}
)
while not result.finished: # your loop stays yours — one model call per turn
result = cruxial.run(client, model="gpt-4o", messages=result.messages,
tools=tools, executors=executors)
print(result.text) # the model's final answer
It reuses your already-configured client (Azure endpoint/version, base_url, timeouts —
all preserved), derives schemas from tools, and fails open. It is deliberately one
turn, not a framework: no streaming, no multi-turn ownership — you decide when to stop.
Need to own execution yourself? Drop to guard().check() / .execute().
Eight failure categories. Every interception is logged with the failure category — never the raw argument values.
| Category | What it catches |
|---|---|
missing_required | Required field not in args |
type_mismatch | Wrong type (int instead of str, etc.) |
enum_violation | Value not in allowed enum |
format_violation | Bad email / uri / date format |
constraint_violation | maxLength / minimum / pattern / etc. |
extra_field | Model invented a field that doesn't exist |
unknown_tool | Tool name not in registry |
tool_bypass | Model claimed it did something but emitted no call — "your agent said it sent the email. It didn't." |
The first seven are schema-derivable. tool_bypass is the one validators
structurally can't catch — there's no call to validate. See below.
A model says "I've sent the email" and emits no send_email call. No
error, no log, HTTP 200 — the silent failure. cruxial.run() catches it:
result = cruxial.run(client, model="gpt-4o", messages=messages,
tools=tools, executors=executors) # bypass check is on by default
if result.bypass: # the model claimed an action and, when re-prompted, confirmed it
print("caught a bypass:", result.bypass.tool)
How it works (zero cost on normal turns): a final text turn is flagged
only when it claims a completed action ("sent", not "send"), attributed
to the assistant (not "you" / "the scheduler" / "automatically"), for a
side-effecting tool that was never called. A flagged turn gets one neutral
re-prompt — the model either re-emits the call (corrected + executed) or
declines (we do nothing). We only ever act on a model-confirmed re-emission, so
we never fabricate an action.
Benchmarked on a 132-scenario adversarial set (see BENCHMARKS.md):
100% correction recall, acted-on precision 100% (Claude sonnet-4-6) / 98.4%
(gpt-4o). Set bypass="off" to disable; bypass="strict" for a 2-call variant.
from cruxial.adapters.openai import auto_repair
cruxial = guard(schemas=schemas, executors=executors)
result = cruxial.execute(name, args)
if not result.ok:
# 1-attempt structured retry with the failure injected into context
new_args = auto_repair(client, model, messages, schemas, result.failure)
result = cruxial.execute(name, new_args)
Roughly 90% of intercepted calls are fixed in a single repair round-trip on the pooled live-MCP benchmark (66–92% across models and schema complexity). Every number is sourced in BENCHMARKS.md.
Every interception is written to a local SQLite file (no data leaves your machine). To see your real rate:
cruxial stats
Stats are project-local automatically. When you run cruxial stats from
inside a project (any directory with .git/, pyproject.toml, setup.py,
or .cruxial/), the database lives at ./.cruxial/telemetry.sqlite —
keeping each app's stats separate. Outside a project it falls back to
~/.cruxial/telemetry.sqlite. Override either with the
CRUXIAL_DB_PATH=/some/path env var (respected by both the SDK and the
CLI). Run cruxial diagnostic to see which path is in effect.
Output:
cruxial · last 24h
─────────────────────────────────────────
total calls 1,247
intercepted 184 (14.8%)
auto-repaired 167 (90.8% of intercepted)
passed through 1,063
top failing tools rate
send_email 23.1%
create_calendar_event 18.4%
search_web 9.2%
top failure categories
type_mismatch 62
missing_required 44
enum_violation 38
format_violation 24
constraint_violation 16
sequenceDiagram
participant M as LLM model
participant C as Cruxial guard
participant T as Your tool / executor
participant DB as Local SQLite telemetry
M->>C: tool call (name, args)
C->>C: validate args vs JSON Schema
alt args valid
C->>DB: record PASSED (hashes only)
C->>T: run tool
T-->>M: result
else args invalid
C->>DB: record INTERCEPTED + failure category
opt auto-repair enabled
C-->>M: repair prompt (1-shot retry)
M->>C: corrected tool call
end
C-->>M: typed failure (caller decides what to do)
end
Cruxial wraps the tool registry, not the LLM client. No monkey-patching, no proxies, no framework lock-in.
If your code maintains TWO views of each tool schema — the canonical full one (used internally for execution) and a trimmed view sent to the LLM (the L1 / model-visible schema) — register the trimmed one with Cruxial, not the canonical.
Why: the LLM can only satisfy the schema it was shown. If the canonical
has fields the LLM never saw, missing_required and extra_field
interceptions become false positives — the model didn't fail, you just
validated against the wrong contract.
# Right
cruxial = guard(schemas=tool_definitions_sent_to_llm)
# If you must register the canonical schema, opt in explicitly:
cruxial = guard(
schemas=canonical_definitions,
config=GuardConfig(schema_origin="canonical"), # warns + tags every row
)
schema_origin="canonical" does NOT change validation logic — it just
emits a warning at construction, tags every telemetry row, and surfaces
a notice in cruxial stats so you can later filter out the false
positives if you decide to.
cruxial stats shows the registry independently of traffic:
registry 6 registered · 3 fired · 1 intercepted
Useful for the "is it even on?" moment after first install. If you see
traffic but 0 interceptions, that's usually a well-behaved model on a
simple schema — cruxial.testing.violation_payloads(schema) lets you
fire a synthetic violation per category to verify end-to-end.
By default Cruxial stores: tool name, schema fingerprint, failure category, timing. Never the argument values themselves. Hashes only.
The interceptor runs in your process. Your data never leaves your infrastructure unless you opt into Cruxial Cloud (coming soon).
cruxial stats CLIcruxial.run() — one managed turn (OpenAI / Azure / Anthropic / LiteLLM)tool_bypass detection — the claimed-but-never-called catchComing:
MIT. The SDK runs entirely in your process. The hosted dashboard (Cruxial Cloud) will be a separate paid product. The interceptor itself stays MIT forever.
FAQs
The reliability layer for LLM tool calls. Intercepts every call, validates arguments against your schema, auto-repairs hallucinated args before execution.
We found that cruxial demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Research
/Security News
A mini Shai-Hulud campaign compromised Red Hat Cloud Services npm packages to steal developer and CI/CD secrets during installation.

Research
/Security News
The North Korean malware loader hides in a Packagist-listed package and its GitHub branch to fetch and execute remote code in a likely Contagious Interview-style lure.

Security News
The Rust project is moving toward formal rules on LLM use in contributions after months of internal debate over maintainer burden, code quality, and contributor experience.