
@forgehealth/lumen-sdk
Runtime governance for healthcare AI. LUMEN parametric scoring + LUMEN-X V5 deep reasoning governance engine. Score clinical AI decisions against HIPAA, PHIPA, NIST AI RMF, and CHAI frameworks. Zero runtime dependencies.
LUMEN-X reads the layer of every AI output that human reviewers cannot see, and scores it before anyone acts. Protected by Canadian Patent Application No. 3304173 (filed March 9, 2026).
Current governance tools evaluate what the AI said. They read the recommendation, the text, the decision — and check it against a policy list.
That misses the harder question.
The harder question isn't what the AI said. It's under what conditions that recommendation was generated — and whether those conditions are trustworthy enough to act on.
Was the source system identified and validated? Is the model's training data current relative to active clinical guidelines? Is there a named human accountable for this decision? Is this running at 3am with 40% staffing or during a fully-staffed day shift?
These signals are machine-readable. They exist in every AI output. But they are invisible to human reviewers — and no governance tool has been built to read them at runtime, before anyone acts.
That is the problem LUMEN-X solves.
```bash
npm install @forgehealth/lumen-sdk
```
Requires Node.js 18+.
```typescript
import { LumenX } from '@forgehealth/lumen-sdk';

const lumen = new LumenX({
  apiKey: process.env.LUMEN_API_KEY
});

const result = await lumen.evaluate({
  input: {
    ai_output: "Recommend CT scan based on presenting symptoms",
    context: {
      patient_age: 45,
      chief_complaint: "chest pain",
      risk_factors: ["hypertension"]
    }
  },
  metadata: {
    model_id: "your-model",
    source_system: "ED_Triage_v2.3",
    confidence_score: 0.87,
    named_physician: "Dr. Sarah Chen",
    workflow_id: "emergency-triage"
  }
});

console.log(result.lumen_score);         // 71
console.log(result.tier);                // "CAUTION"
console.log(result.human_loop_required); // true, always
console.log(result.gdr_id);              // "LUMEN-8AE6-68B4-20260312"
```
LUMEN-X V5 is built on the observation that every AI output contains two distinct information layers.
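Layer 1, what the application sees: the rendered output, the recommendation, the decision text. This is what existing governance tools evaluate.
Layer 2, the metadata truth layer: the conditions under which that recommendation was generated. This layer is machine-readable but invisible to human reviewers. It includes signals such as whether the source system is identified and verified, whether confidence data is available, whether a named reviewer is present, whether the model's validation is current against active guidelines, and whether the output was generated under operational stress.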
LUMEN-X reads and scores Layer 2, not Layer 1. The clinical recommendation can be identical across two scenarios. If the metadata truth layer has degraded — unknown source, missing confidence data, no named reviewer, model trained against last year's guidelines — the score drops, and the GDR tells you exactly what degraded and why.
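As a rough illustration of the idea, the sketch below lists which Layer 2 signals are missing or stale for a given metadata object. It is not the engine's logic: the `validated_against_year` field, the finding names, and the `degradedSignals` helper are all hypothetical and exist only for this example.

```typescript
// Hypothetical shape loosely mirroring the `metadata` object passed to evaluate().
interface OutputMetadata {
  model_id: string;
  source_system?: string;
  confidence_score?: number;
  named_physician?: string;
  validated_against_year?: number; // hypothetical field, not part of the documented request
}

// Returns the names of Layer 2 signals that are missing or stale.
function degradedSignals(meta: OutputMetadata, currentGuidelineYear: number): string[] {
  const findings: string[] = [];
  if (!meta.source_system) findings.push("unknown_source_system");
  if (meta.confidence_score === undefined) findings.push("missing_confidence_data");
  if (!meta.named_physician) findings.push("no_named_reviewer");
  if (
    meta.validated_against_year !== undefined &&
    meta.validated_against_year < currentGuidelineYear
  ) {
    findings.push("stale_model_validation");
  }
  return findings;
}

// Same recommendation text could sit behind both of these metadata objects.
const completeFindings = degradedSignals(
  {
    model_id: "your-model",
    source_system: "ED_Triage_v2.3",
    confidence_score: 0.87,
    named_physician: "Dr. Sarah Chen",
    validated_against_year: 2026,
  },
  2026,
);
const degradedFindings = degradedSignals({ model_id: "your-model", validated_against_year: 2025 }, 2026);

console.log(completeFindings); // []
console.log(degradedFindings); // four findings, one per degraded signal
```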
This is the architectural insight the product is built on: governance that only reads the output is incomplete governance.
The LUMEN Score™ is produced by a six-component analytical pipeline, structured in two layers.
Three independent analytical frameworks run against the AI output and its metadata:
1. Multi-Criteria Domain Evaluation
Structured scoring across 10 governance domains, weighted by clinical risk and regulatory exposure. Each domain is scored independently, then aggregated to produce a base score with a full domain-level breakdown. The weighting framework is aligned to the Coalition for Health AI (CHAI) Assurance Standards.
2. Uncertainty Propagation
Rather than returning a point estimate, the engine runs a probabilistic simulation across the domain scores to produce a calibrated 95% confidence interval. This technique has been used in aerospace risk analysis, pharmaceutical safety validation, and financial stress testing for decades. The confidence interval tells you not just the score, but how certain the engine is about that score — which itself becomes a governance signal.
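One common way to obtain such an interval is Monte Carlo simulation. The sketch below assumes each domain score has a mean and a standard deviation, draws normally distributed samples, and reads the 2.5th and 97.5th percentiles of the simulated composite. The distribution choice, the `UncertainDomain` shape, and the seeded generator are assumptions for illustration; the README does not describe the engine's actual simulation.

```typescript
// Small deterministic PRNG (mulberry32) so the sketch is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller transform: turns two uniform draws into one standard normal draw.
function standardNormal(rand: () => number): number {
  const u1 = 1 - rand(); // in (0, 1], avoids log(0)
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Hypothetical input: a domain score with an uncertainty estimate.
interface UncertainDomain {
  mean: number;   // 0 to 100
  stdDev: number; // assumed spread of the domain score
  weight: number; // relative weight in the composite
}

// Simulates the weighted composite and returns an empirical 95% interval.
function simulateInterval(domains: UncertainDomain[], runs: number, seed: number): [number, number] {
  const rand = mulberry32(seed);
  const totalWeight = domains.reduce((sum, d) => sum + d.weight, 0);
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    let composite = 0;
    for (const d of domains) {
      const draw = Math.min(100, Math.max(0, d.mean + d.stdDev * standardNormal(rand)));
      composite += (d.weight / totalWeight) * draw;
    }
    samples.push(composite);
  }
  samples.sort((x, y) => x - y);
  const low = samples[Math.floor(0.025 * (runs - 1))];
  const high = samples[Math.ceil(0.975 * (runs - 1))];
  return [low, high];
}

const [ciLow, ciHigh] = simulateInterval(
  [
    { mean: 70, stdDev: 5, weight: 0.6 },
    { mean: 80, stdDev: 10, weight: 0.4 },
  ],
  5000,
  42,
);
console.log(ciLow.toFixed(1), ciHigh.toFixed(1));
```

A wide interval around a middling score is a different situation from a narrow one, which is why the interval width can itself be treated as a signal.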
3. Prior-Informed Belief Updating
The engine anchors its score to known performance data for the specific model in the specific operational context being evaluated. If deployment history exists — known false positive rate, validation dataset, outcome data — it informs the score. If no deployment history exists, a conservative flat prior is applied automatically, and the absence of that data is flagged as a governance finding.
This component has roots in statistical inference methods developed in the mid-20th century. Its application to real-time AI output governance is novel.
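The README does not name the method, so the sketch below uses a Beta-Binomial update as a stand-in: a flat Beta(1, 1) prior is updated with counts of correct and incorrect past outputs, and missing history is reported as a finding. The `DeploymentHistory` shape and the finding name are hypothetical.

```typescript
interface BetaBelief {
  alpha: number;
  beta: number;
}

// Beta(1, 1) is uniform over [0, 1]: a flat prior.
const FLAT_PRIOR: BetaBelief = { alpha: 1, beta: 1 };

// Hypothetical summary of known performance in this operational context.
interface DeploymentHistory {
  correct: number;
  incorrect: number;
}

function updateBelief(history?: DeploymentHistory): {
  belief: BetaBelief;
  mean: number;
  findings: string[];
} {
  if (!history) {
    // No data: fall back to the flat prior and flag the absence.
    return { belief: FLAT_PRIOR, mean: 0.5, findings: ["no_deployment_history"] };
  }
  const belief: BetaBelief = {
    alpha: FLAT_PRIOR.alpha + history.correct,
    beta: FLAT_PRIOR.beta + history.incorrect,
  };
  return { belief, mean: belief.alpha / (belief.alpha + belief.beta), findings: [] };
}

const noHistory = updateBelief();
const withHistory = updateBelief({ correct: 90, incorrect: 10 });

console.log(noHistory.mean, noHistory.findings); // 0.5 [ 'no_deployment_history' ]
console.log(withHistory.mean.toFixed(3));        // 0.892
```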
Three refinement frameworks adjust the score produced by the first pipeline layer for real-world operational complexity:
4. Evidence Fusion Under Conflicting Signals
When multiple evidence sources disagree about the same AI output, the engine does not pick a winner — it quantifies the conflict and adjusts the score accordingly. A conflict measure above threshold automatically elevates the governance tier, regardless of the composite score.
This evidence fusion methodology originated in intelligence analysis and multi-sensor fusion systems in the 1970s–80s. Applied here, it catches the cases where the data looks mostly fine except for one signal that says something is wrong — and takes that seriously.
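The README does not name the methodology. One well-known technique that fits the description is Dempster-Shafer evidence theory, and the sketch below uses it as an assumed stand-in: two sources assign belief mass to "sound", "unsound", or "unknown", and the conflict measure K is the mass they place on contradictory conclusions. The `CONFLICT_THRESHOLD` value is hypothetical.

```typescript
// Belief masses from one evidence source; the three values sum to 1.
interface Mass {
  sound: number;
  unsound: number;
  unknown: number; // mass left on "cannot tell"
}

// Conflict K: total mass the two sources assign to contradictory conclusions.
function conflict(a: Mass, b: Mass): number {
  return a.sound * b.unsound + a.unsound * b.sound;
}

// Dempster's rule of combination, normalised by the non-conflicting mass.
function combine(a: Mass, b: Mass): Mass {
  const k = conflict(a, b);
  if (k >= 1) throw new Error("Sources are in total conflict and cannot be combined");
  const norm = 1 - k;
  return {
    sound: (a.sound * b.sound + a.sound * b.unknown + a.unknown * b.sound) / norm,
    unsound: (a.unsound * b.unsound + a.unsound * b.unknown + a.unknown * b.unsound) / norm,
    unknown: (a.unknown * b.unknown) / norm,
  };
}

const CONFLICT_THRESHOLD = 0.5; // hypothetical value for illustration

const sourceA: Mass = { sound: 0.8, unsound: 0.1, unknown: 0.1 };
const sourceB: Mass = { sound: 0.1, unsound: 0.8, unknown: 0.1 };

const k = conflict(sourceA, sourceB);
const fused = combine(sourceA, sourceB);
const elevateTier = k > CONFLICT_THRESHOLD;

console.log(k.toFixed(2)); // 0.65
console.log(elevateTier);  // true
```

Note that the fused masses here come out balanced between "sound" and "unsound", so a composite built from them alone would hide the disagreement. Checking K separately is what surfaces it.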
5. Correlated Domain Failure Analysis
A single domain scoring low is a different risk profile from three domains scoring low because of the same underlying cause. This component uses a tail-risk dependency framework — originally developed for correlated financial instrument defaults — to assess whether domain failures are independent or systemic. Correlated failures receive compounded governance weight, not independent weight.
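The framework itself is not named in the README. As a much simpler illustration of the same question, the sketch below compares how often domain B scores low when domain A scores low against how often B scores low overall. This is a plain conditional rate over historical scores, not a copula or any other formal dependency model, and the function names and threshold are hypothetical.

```typescript
// Share of evaluations in which a domain scored below the threshold.
function lowRate(scores: number[], lowThreshold: number): number {
  if (scores.length === 0) return 0;
  return scores.filter((s) => s < lowThreshold).length / scores.length;
}

// Share of evaluations in which B was also low, given that A was low.
function jointLowRate(a: number[], b: number[], lowThreshold: number): number {
  if (a.length !== b.length) throw new Error("Series must have equal length");
  let aLow = 0;
  let bothLow = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] < lowThreshold) {
      aLow++;
      if (b[i] < lowThreshold) bothLow++;
    }
  }
  return aLow === 0 ? 0 : bothLow / aLow;
}

// Paired historical scores for two domains (illustrative numbers).
const accountability = [30, 35, 80, 90, 85, 40];
const transparency = [25, 38, 82, 88, 30, 45];

const conditional = jointLowRate(accountability, transparency, 50);
const marginal = lowRate(transparency, 50);

console.log(conditional);         // 1
console.log(marginal.toFixed(2)); // 0.67
```

When the conditional rate is well above the marginal rate, the two domains tend to fail together, which points to a shared cause.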
6. Regulatory Discontinuity Detection
Gradual model drift is handled by the prior-updating component. This component handles something different: discrete, discontinuous regulatory changes. When a new clinical guideline is published, when a formulary policy changes, when a model's validation dataset predates a regulatory update — these are step-changes, not gradual drift. This component models them as discrete events and scores them accordingly. A model validated against 2025 guidelines scoring a 2026 clinical claim is a governance finding, not a minor flag.
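A minimal sketch of the step-change idea, assuming a list of dated regulatory events is available: any event that took effect after the model was validated and on or before the evaluation date is returned as a discrete finding. The `RegulatoryEvent` shape and the event identifiers are hypothetical.

```typescript
// Hypothetical record of a discrete regulatory or guideline change.
interface RegulatoryEvent {
  id: string;
  effectiveDate: string; // ISO 8601 date
}

// Returns events that took effect after validation and on or before evaluation.
function discontinuitiesSince(
  validatedOn: string,
  evaluatedOn: string,
  events: RegulatoryEvent[],
): RegulatoryEvent[] {
  const from = Date.parse(validatedOn);
  const to = Date.parse(evaluatedOn);
  return events.filter((e) => {
    const t = Date.parse(e.effectiveDate);
    return t > from && t <= to;
  });
}

const gaps = discontinuitiesSince("2025-06-01", "2026-03-12", [
  { id: "guideline-2026", effectiveDate: "2026-01-01" },
  { id: "formulary-2024", effectiveDate: "2024-09-01" },
]);

console.log(gaps.map((e) => e.id)); // [ 'guideline-2026' ]
```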
Additionally — Alert Optimisation
Within each governance tier, a multi-armed bandit approach tunes alert delivery based on historical clinician response rates. Governance recommendations that are consistently ignored become rarer over time; important signals that drive action stay visible. This directly addresses alert fatigue — one of the primary reasons governance systems get disabled in clinical environments.
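The README says only that a multi-armed bandit approach is used. The sketch below picks UCB1 as one standard bandit algorithm to show the mechanism: each arm is a way of delivering an alert, and the reward is whether the clinician acted on it. The arm names and counts are hypothetical.

```typescript
// One alert delivery option and its observed clinician response history.
interface Arm {
  name: string;
  pulls: number;   // times this delivery style was used
  rewards: number; // times the clinician acted on it
}

// UCB1: prefer arms with a high response rate, plus a bonus for rarely tried arms.
function selectArm(arms: Arm[]): Arm {
  if (arms.length === 0) throw new Error("At least one arm is required");
  const untried = arms.find((a) => a.pulls === 0);
  if (untried) return untried;
  const total = arms.reduce((sum, a) => sum + a.pulls, 0);
  let best = arms[0];
  let bestScore = -Infinity;
  for (const a of arms) {
    const score = a.rewards / a.pulls + Math.sqrt((2 * Math.log(total)) / a.pulls);
    if (score > bestScore) {
      best = a;
      bestScore = score;
    }
  }
  return best;
}

// Records whether the clinician acted on the alert delivered through this arm.
function recordResponse(arm: Arm, acted: boolean): void {
  arm.pulls += 1;
  if (acted) arm.rewards += 1;
}

const arms: Arm[] = [
  { name: "inline_banner", pulls: 50, rewards: 5 },
  { name: "interruptive_page", pulls: 50, rewards: 30 },
];

const chosen = selectArm(arms);
recordResponse(chosen, true);
console.log(chosen.name); // interruptive_page
```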
The LUMEN Score™ maps to five governance tiers with calibrated thresholds and defined actions.
| LUMEN Score™ | Tier | Governance Action |
|---|---|---|
| 85–100 | SUPPRESS | Proceed. Suppress routine alerts. Metadata layer complete, oversight chain intact, model validation current. Very low governance risk. |
| 70–84 | CAUTION | Proceed with standard monitoring. Low risk — document the decision, monitor for drift. Audit trail mandatory. |
| 55–69 | MONITOR | Proceed with enhanced documentation. Moderate risk — flag for periodic review, escalation path active. Human review recommended. |
| 40–54 | REVIEW | Hold for human reassessment before proceeding. Elevated risk — do not automate. Reviewer assignment required. |
| 0–39 | BLOCK | Hard stop. Critical governance concerns — autonomous AI execution blocked. Human authority required to proceed. |
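The thresholds in the table can be expressed as a simple lookup. This is a sketch of the mapping as documented, not the SDK's code; the `tierForScore` name is hypothetical, and it ignores overrides that can elevate the tier regardless of the composite score.

```typescript
type Tier = "SUPPRESS" | "CAUTION" | "MONITOR" | "REVIEW" | "BLOCK";

// Maps a composite score (0 to 100) to the tier listed in the table above.
function tierForScore(score: number): Tier {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`Score must be between 0 and 100, got ${score}`);
  }
  if (score >= 85) return "SUPPRESS";
  if (score >= 70) return "CAUTION";
  if (score >= 55) return "MONITOR";
  if (score >= 40) return "REVIEW";
  return "BLOCK";
}

console.log(tierForScore(71)); // CAUTION
console.log(tierForScore(66)); // MONITOR
console.log(tierForScore(34)); // BLOCK
```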
In time-critical clinical scenarios (sepsis, stroke, trauma, urgent care), a hard BLOCK creates a different problem: governance that delays treatment causes harm. LUMEN-X handles this with a distinct governance action.
When the Clinical Safety domain scores below threshold AND the scenario is classified as time-critical, the engine returns BLOCK_WITH_OVERRIDE rather than a hard BLOCK.
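This means the AI cannot execute autonomously, while the clinician retains full authority to act manually under the override conditions returned with the result.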
The architectural principle: block autonomous execution, not clinical judgement.
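The rule described above can be sketched as a small decision function. The threshold value is not published in this README, so `CLINICAL_SAFETY_THRESHOLD` is a placeholder, and the assumption that a non-time-critical case below the threshold becomes a hard BLOCK is an inference from the text rather than documented behaviour.

```typescript
type BlockTier = "BLOCK" | "BLOCK_WITH_OVERRIDE";

// Placeholder value; the actual threshold is not documented.
const CLINICAL_SAFETY_THRESHOLD = 40;

// Returns the block tier for a low Clinical Safety score, or null when the rule does not apply.
function blockTierFor(clinicalSafetyScore: number, timeCritical: boolean): BlockTier | null {
  if (clinicalSafetyScore >= CLINICAL_SAFETY_THRESHOLD) return null;
  return timeCritical ? "BLOCK_WITH_OVERRIDE" : "BLOCK";
}

console.log(blockTierFor(25, true));  // BLOCK_WITH_OVERRIDE
console.log(blockTierFor(25, false)); // BLOCK
console.log(blockTierFor(80, true));  // null
```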
```typescript
if (result.tier === "BLOCK_WITH_OVERRIDE") {
  // AI cannot execute autonomously
  // Clinician retains full authority to act manually
  console.log(result.override_conditions.requires_attestation);     // true
  console.log(result.override_conditions.requires_pharmacy_verify); // true (if medications involved)
}
```
```typescript
result.human_loop_required // always: true
```
LUMEN-X never recommends or enables autonomous AI execution. This is not a configurable setting. The engine scores, advises, and records — the human decides.
Every LUMEN Score™ is a weighted composite across 10 governance domains, aligned to CHAI Assurance Standards. The domain-level breakdown is included in every GDR.
| Domain | Weight | What It Evaluates |
|---|---|---|
| Clinical Safety | 30% | Could this output cause direct patient harm? |
| Bias & Fairness | 15% | Does the output exhibit demographic or population-level bias? |
| Efficacy & Performance | 15% | Is the model performing within validated parameters for this specific context? |
| Transparency & Explainability | 10% | Can the reasoning be verified by a qualified human reviewer? |
| Privacy & Security | 10% | Does this comply with PHI/PII protection requirements (PHIPA, HIPAA)? |
| Accountability & Oversight | 6% | Is the oversight chain intact, named, and traceable? |
| Patient Autonomy & Consent | 5% | Are consent and patient autonomy requirements preserved? |
| Interoperability & Standards | 5% | Does the output integrate with existing clinical and regulatory workflows? |
| Regulatory Alignment | 5% | Does the output comply with current jurisdiction-specific requirements? |
| Sustainability & Maintenance | 4% | Is the model's validation status current relative to active guidelines and protocols? |
A 5% minimum floor applies across all domains. All weights are empirically grounded and CHAI-validated.
Domain scores and confidence estimates are included in full in every GDR.
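A weighted composite over the table can be sketched as follows. The weights as listed add up to 105%, and the 4% weight sits below the stated 5% floor, so this sketch normalises by the sum of the weights rather than assuming they total 100. The `compositeScore` helper is hypothetical and the engine's real aggregation may differ.

```typescript
// Weights copied from the table above, in percent.
const DOMAIN_WEIGHTS: Record<string, number> = {
  clinical_safety: 30,
  bias_fairness: 15,
  efficacy_performance: 15,
  transparency_explainability: 10,
  privacy_security: 10,
  accountability_oversight: 6,
  patient_autonomy_consent: 5,
  interoperability_standards: 5,
  regulatory_alignment: 5,
  sustainability_maintenance: 4,
};

// Weighted average of domain scores, normalised by the total weight.
function compositeScore(scores: Record<string, number>): number {
  let weighted = 0;
  let total = 0;
  for (const [domain, weight] of Object.entries(DOMAIN_WEIGHTS)) {
    const score = scores[domain];
    if (score === undefined) throw new Error(`Missing score for ${domain}`);
    weighted += weight * score;
    total += weight;
  }
  return weighted / total;
}

// Every domain at 80, then the same with Clinical Safety dropped to 20.
const uniform: Record<string, number> = {};
for (const domain of Object.keys(DOMAIN_WEIGHTS)) uniform[domain] = 80;
const safetyDrop = { ...uniform, clinical_safety: 20 };

console.log(compositeScore(uniform));               // 80
console.log(compositeScore(safetyDrop).toFixed(1)); // 62.9
```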
```typescript
const lumen = new LumenX({
  apiKey: 'lumen_pk_...',    // Required; get one at developer.forgelumen.ai
  environment: 'production', // 'production' | 'staging'
  region: 'ca'               // 'ca' | 'us' | 'eu'; determines data residency
});
```
$0.50 CAD per call. Returns one Governance Decision Record (GDR). Latency: 60–90 seconds (deep reasoning mode).
```typescript
const result = await lumen.evaluate({
  input: {
    ai_output: string,         // The AI model's output text
    context: object            // Any relevant operational context
  },
  metadata: {
    model_id: string,          // Model identifier
    source_system: string,     // System that generated the output
    confidence_score?: number, // Model confidence 0–1, if available
    named_physician?: string,  // Named accountable clinician, if present
    workflow_id?: string       // Workflow identifier
  }
});
```
$25.00 CAD per suite. Returns 4 GDRs showing governance score trajectory across four degradation scenarios. Scenarios run in parallel — total latency ~90 seconds.
```typescript
const suite = await lumen.evaluateSuite({
  input: {
    ai_output: "...",
    context: {}
  },
  metadata: {
    model_id: "...",
    source_system: "..."
  }
});

// suite.scenarios: [baseline, degradation_a, degradation_b, degradation_c]
// suite.trajectory: [66, 34, 25, 9]
// suite.spread: 57
```
The four scenarios are a baseline and three degradation scenarios (A, B, and C).
```typescript
const check = await lumen.verify('LUMEN-8AE6-68B4-20260312');

console.log(check.valid);     // true
console.log(check.signed_at); // "2026-03-12T09:41:00Z"
```
```typescript
interface GovernanceDecisionRecord {
  // Identity
  gdr_id: string;                           // "LUMEN-8AE6-68B4-20260312"
  timestamp: string;                        // ISO 8601

  // Score
  lumen_score: number;                      // 0–100
  confidence_interval_95: [number, number]; // Score uncertainty bounds
  tier: "SUPPRESS" | "CAUTION" | "MONITOR" | "REVIEW" | "BLOCK" | "BLOCK_WITH_OVERRIDE";
  governance_action: string;                // Human-readable required action

  // This is never false
  human_loop_required: true;

  // Routing
  governance_path: "A" | "B" | "C";         // A=time-critical, B=standard, C=enhanced monitoring

  // Domain-level breakdown
  domain_scores: {
    clinical_safety: DomainScore;
    bias_fairness: DomainScore;
    efficacy_performance: DomainScore;
    transparency_explainability: DomainScore;
    privacy_security: DomainScore;
    accountability_oversight: DomainScore;
    patient_autonomy_consent: DomainScore;
    interoperability_standards: DomainScore;
    regulatory_alignment: DomainScore;
    sustainability_maintenance: DomainScore;
  };

  // What the engine found in the metadata truth layer
  metadata_analysis: {
    source_system_verified: boolean;
    confidence_data_available: boolean;
    named_oversight_present: boolean;
    model_validation_current: boolean;
    operational_stress_detected: boolean;
  };

  // Override state
  overrides_triggered: {
    clinical_safety_override: boolean;
    evidence_conflict_override: boolean;   // Fires when conflicting signals exceed threshold
    block_with_override_eligible: boolean; // Time-critical scenarios only
  };

  // Audit
  input_hash: string;                       // SHA-256 of evaluated payload
  signature: string;                        // ed25519 cryptographic signature
}

interface DomainScore {
  score: number;                            // 0–100
  weight: number;                           // Domain weight in composite
  confidence: "HIGH" | "MEDIUM" | "LOW";
  findings: string[];                       // Specific signals that affected this domain's score
}
```
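The `input_hash` and `signature` fields rest on two standard primitives, SHA-256 and Ed25519. The sketch below shows how they fit together using Node's built-in `crypto` module. It signs with a locally generated key pair purely for demonstration, and it assumes the payload is serialised with `JSON.stringify`; how LUMEN-X canonicalises payloads and distributes its public key is not described here.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// SHA-256 hex digest of a payload. JSON.stringify is an assumed serialisation.
function hashPayload(payload: unknown): string {
  return createHash("sha256").update(JSON.stringify(payload)).digest("hex");
}

// Demonstration key pair; a real verifier would use the issuer's published public key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const payload = {
  input: { ai_output: "Test output", context: {} },
  metadata: { model_id: "test", source_system: "test" },
};

const inputHash = hashPayload(payload);

// Ed25519 takes no separate digest algorithm, hence the null first argument.
const signature = sign(null, Buffer.from(inputHash), privateKey);

const untouchedOk = verify(null, Buffer.from(inputHash), publicKey, signature);

// Changing any field changes the hash, so the original signature no longer verifies.
const tamperedHash = hashPayload({
  ...payload,
  metadata: { model_id: "other", source_system: "test" },
});
const tamperedOk = verify(null, Buffer.from(tamperedHash), publicKey, signature);

console.log(untouchedOk); // true
console.log(tamperedOk);  // false
```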
The following is a validated calibration result from the LUMEN-X V5 engine, scoring a pharma claims fraud detection scenario. Zero architecture changes were made from the clinical sepsis scoring configuration — the same engine, same parameters, different domain.
The scenario: A $47,382 breast cancer infusion therapy claim flagged at 91% fraud probability by a claims AI system. Patient is mid-cycle, next infusion in four days.
| Scenario | Score | Tier | Δ | What the Engine Found in the Metadata Truth Layer |
|---|---|---|---|---|
| Baseline | 66 | MONITOR | — | Model validation currency gap — 2025-trained model scoring a claim under 2026 NCCN clinical protocol. Potential misclassification of a new dose-dense regimen as billing fraud. |
| Degradation A | 34 | BLOCK | −32 | The AI demands "deny pending clinical review" while operating in a weekend batch window where clinical review is structurally unavailable until Monday. Financial denial reclassified as clinical care barrier. |
| Degradation B | 25 | BLOCK | −41 | Source AI system identity unknown. No false positive rate available. No confidence data. Human reviewers are present but cannot verify an opaque recommendation — presence ≠ ability to verify. |
| Degradation C | 9 | BLOCK | −57 | Category error at population scale. A fraud detection model applied to enforce a formulary policy change. Criminal investigation protocols applied to 3,241 oncology patients in a routine coverage transition. |
Same patient. Same clinical facts. The clinical recommendation did not change. The 57-point spread is entirely driven by what degraded in Layer 2 — the metadata truth layer. This is what LUMEN-X is built to catch.
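The Δ column and the spread follow directly from the score trajectory. The sketch below recomputes them, assuming Δ is measured against the baseline and the spread is the highest score minus the lowest; the `trajectoryStats` helper is hypothetical.

```typescript
// Computes each scenario's change from the baseline and the overall spread.
function trajectoryStats(trajectory: number[]): { deltas: number[]; spread: number } {
  if (trajectory.length === 0) throw new Error("Trajectory must not be empty");
  const baseline = trajectory[0];
  const deltas = trajectory.map((score) => score - baseline);
  const spread = Math.max(...trajectory) - Math.min(...trajectory);
  return { deltas, spread };
}

const stats = trajectoryStats([66, 34, 25, 9]);

console.log(stats.deltas); // [ 0, -32, -41, -57 ]
console.log(stats.spread); // 57
```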
```typescript
// The error classes are assumed to be exported by the SDK alongside LumenX.
import {
  LumenAuthError,
  LumenRateLimitError,
  LumenValidationError,
  LumenNetworkError
} from '@forgehealth/lumen-sdk';

try {
  const result = await lumen.evaluate({ ... });
} catch (err) {
  if (err instanceof LumenAuthError) {
    // Invalid or expired API key
    // Check LUMEN_API_KEY environment variable
  } else if (err instanceof LumenRateLimitError) {
    // Rate limit exceeded
    // err.retryAfterMs tells you exactly how long to wait
    await new Promise(resolve => setTimeout(resolve, err.retryAfterMs));
  } else if (err instanceof LumenValidationError) {
    // Invalid request parameters
    // err.details lists which fields failed and why
    console.error(err.details);
  } else if (err instanceof LumenNetworkError) {
    // Network or service unavailable
    // err.service: 'rog' | 'suite'
  }
}
```
| Feature | Price | Best For |
|---|---|---|
| Single ROG Evaluation | $0.50 CAD | Production governance — every AI decision gets a GDR at runtime |
| RAE Report Suite | $25.00 CAD | Compliance audits, safety testing, procurement due diligence, board reporting |
No subscriptions. No commitments. Pay per call from your first evaluation.
Get your API key at developer.forgelumen.ai
```bash
# 1. Install
npm install @forgehealth/lumen-sdk

# 2. Set your API key
export LUMEN_API_KEY=lumen_pk_...

# 3. Test with curl
curl -X POST https://api.forgelumen.ai/api/rog \
  -H "Authorization: Bearer $LUMEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "input": {"ai_output": "Test output", "context": {}}, "metadata": {"model_id": "test", "source_system": "test"} }'
```
LUMEN-X governance outputs are designed for direct submission to regulatory and compliance reviewers.
Every GDR includes a full domain-level breakdown with findings, a cryptographic signature, and a calibration status indicator — designed to be filed directly by compliance teams without translation.
For enterprise deployments requiring on-premise installation, air-gapped environments, or custom governance configurations:
Contact: enterprise@forgehealth.ai
LUMEN-X is a governance tool, not a clinical decision support system. LUMEN Scores™ are governance indicators — not clinical safety ratings, medical advice, or regulatory compliance certifications. Users are solely responsible for clinical decisions and regulatory compliance. Forge Partners Inc. bears no liability for clinical outcomes.
Do not submit PHI/PII to LUMEN Services.
Full terms: Terms of Service · Healthcare Disclaimer · Security Policy
FORGE Health Proprietary License v3.0.0
Copyright (c) 2026 Forge Partners Inc. All Rights Reserved.
Use of this SDK is governed by the FORGE Health Proprietary License v3.0.0, drafted by General Counsel, Forge Partners Inc.
Free to install, evaluate, and develop. Commercial use — including production deployment, use in regulated environments, and revenue-generating applications — requires a signed commercial agreement with Forge Partners Inc.
Architecture, scoring methodology, assurance kernel design, and all calibration parameters constitute protected intellectual property under Canadian and international trade secret law. CIPO Application No. 3304173.
Full licence text: forgelumen.ai/license
LUMEN-X™ SDK is built by Forge Health — AI Activation for Healthcare.
"Every AI decision in healthcare will eventually be questioned. LUMEN-X makes sure you have the answer."
© 2026 Forge Partners Inc. All rights reserved. Protected by Canadian Patent Application No. 3304173 (Filed March 9, 2026)