
# DAGnostics
DAGnostics is an intelligent ETL monitoring system that leverages LLMs to analyze, categorize, and report DAG failures in data pipelines. It provides automated parsing of DAG errors and is designed to generate comprehensive statistics for better observability.
Planned / Future Enhancements:
Using uv (install with `pip install uv`):

```bash
cd dagnostics

# Basic installation (web dashboard only)
uv sync

# With LLM providers for full analysis
uv sync --extra llm

# With web dashboard (minimal)
uv sync --extra web

# With development dependencies
uv sync --extra dev

# With fine-tuning dependencies (heavy ML libraries)
uv sync --extra finetuning

# With all optional dependencies
uv sync --extra all
```
Using pip:

```bash
# Basic installation (web dashboard only)
pip install dagnostics

# With LLM providers for full analysis
pip install dagnostics[llm]

# With web dashboard (minimal)
pip install dagnostics[web]

# With development dependencies
pip install dagnostics[dev]

# With fine-tuning dependencies (heavy ML libraries)
pip install dagnostics[finetuning]

# With all optional dependencies
pip install dagnostics[all]
```
- `pip install dagnostics[web]` - Minimal dependencies, fast installation
- `pip install dagnostics[llm]` - Includes LLM providers for error analysis
- `pip install dagnostics[finetuning]` - ML libraries for local model fine-tuning
- `pip install dagnostics[dev]` - Testing and linting tools

Choose your installation method above, then continue with setup:
Set up pre-commit hooks (if using uv for development):
```bash
uv run pre-commit install
```
```bash
# Pull the default Ollama model
ollama pull mistral

# Copy the example configuration
cp config/config.yaml.example config/config.yaml
# Edit config.yaml with your Airflow credentials and LLM provider settings
```
```bash
# Analyze a specific task failure (replace with actual values)
uv run dagnostics analyze my_dag my_task 2025-08-13T10:00:00 1 --llm ollama

# Start background monitoring daemon
uv run dagnostics daemon start

# Check daemon status
uv run dagnostics daemon status
```
```text
dagnostics/
├── data/
│   ├── clusters/            # Drain3 cluster persistence & baselines
│   ├── raw/                 # Raw log files
│   ├── processed/           # Processed analysis results
│   └── training_data.jsonl  # Generated training datasets
├── src/dagnostics/
│   ├── api/                 # FastAPI REST API
│   ├── cli/                 # Command-line interface
│   ├── core/                # Models, config, database
│   ├── daemon/              # Background monitoring service
│   ├── llm/                 # LLM providers & configurable prompts
│   ├── clustering/          # Drain3 log clustering & baselines
│   ├── heuristics/          # Pattern filtering engines
│   ├── monitoring/          # Airflow integration & collectors
│   ├── reporting/           # Report generation (stub)
│   ├── web/                 # Web dashboard UI
│   └── utils/               # Helpers, logging, SMS
├── config/
│   ├── config.yaml          # Main configuration
│   ├── drain3.ini           # Drain3 clustering settings
│   ├── filter_patterns.yaml # Heuristic filtering patterns
│   └── logging.yaml         # Logging configuration
├── tests/                   # Test suites
├── scripts/                 # Development & deployment scripts
└── docs/                    # Documentation
```
DAGnostics is highly configurable through `config/config.yaml`. Key configuration areas include:
DAGnostics now supports configurable prompts with few-shot learning:
```yaml
prompts:
  # Few-shot examples for better error extraction
  few_shot_examples:
    error_extraction:
      - log_context: |
          [2025-08-13 10:15:25] ERROR: psycopg2.OperationalError: FATAL: database "analytics_db" does not exist
        extracted_response: |
          {
            "error_message": "psycopg2.OperationalError: FATAL: database \"analytics_db\" does not exist",
            "confidence": 0.95,
            "category": "configuration_error",
            "severity": "high",
            "reasoning": "Database connection error due to missing database"
          }

  # Custom prompt templates (override defaults)
  templates:
    error_extraction: |
      You are an expert ETL engineer analyzing Airflow task failure logs...
```
DAGnostics provides comprehensive fine-tuning capabilities to improve error extraction accuracy using your production data.
```bash
# 1. Check training environment status
dagnostics training status

# 2. Prepare datasets from human-reviewed data
dagnostics training prepare-data data/your_training_dataset.json

# 3. Choose your training method:

# Option A: Local fine-tuning (requires GPU/training deps)
dagnostics training train-local --epochs 3 --batch-size 2

# Option B: OpenAI API fine-tuning
export OPENAI_API_KEY="your-key-here"
dagnostics training train-openai --model gpt-3.5-turbo

# Option C: Remote training server
dagnostics training remote-train --server-url http://training-server:8001

# 4. Evaluate your model
dagnostics training evaluate <model_path> --test-dataset data/fine_tuning/validation_dataset.jsonl

# 5. Deploy to Ollama for local inference
dagnostics training deploy-ollama <model_path> --model-name my-error-extractor
```
DAGnostics fine-tuning works best with:
Requirements: `pip install dagnostics[finetuning]`

Features:
```bash
# Install training dependencies
pip install dagnostics[finetuning]

# Fine-tune with custom settings
dagnostics training train-local \
  --model-name "microsoft/DialoGPT-small" \
  --epochs 5 \
  --learning-rate 2e-4 \
  --batch-size 4 \
  --model-output-name "my-error-extractor" \
  --use-quantization true
```
Requirements:
Features:
```bash
# Set API key
export OPENAI_API_KEY="your-key-here"

# Start fine-tuning
dagnostics training train-openai \
  --model "gpt-3.5-turbo" \
  --suffix "my-error-extractor" \
  --wait true
```
Requirements:
Features:
```bash
# Submit training job
dagnostics training remote-train \
  --model-name "microsoft/DialoGPT-small" \
  --epochs 3 \
  --server-url "http://gpu-server:8001" \
  --wait true

# Check job status
dagnostics training remote-status <job_id>

# Download completed model
dagnostics training remote-download <job_id>
```
Evaluate your fine-tuned models with comprehensive metrics:
```bash
# Evaluate local model
dagnostics training evaluate models/my-model \
  --test-dataset data/fine_tuning/validation_dataset.jsonl \
  --model-type local

# Evaluate OpenAI fine-tuned model
dagnostics training evaluate "ft:gpt-3.5-turbo:my-org:model:abc123" \
  --model-type openai

# View detailed evaluation report
cat evaluations/evaluation_20250817_143022.md
```
Evaluation Metrics:
```bash
# Export and deploy fine-tuned model
dagnostics training deploy-ollama models/my-model \
  --model-name "dagnostics-error-extractor" \
  --auto-build true

# Test deployed model
ollama run dagnostics-error-extractor "Analyze this error log..."
```
```yaml
# Update DAGnostics config to use the fine-tuned model
# config/config.yaml:
llm:
  default_provider: "ollama"
  providers:
    ollama:
      base_url: "http://localhost:11434"
      model: "dagnostics-error-extractor"
```
```yaml
# Update config to use the fine-tuned OpenAI model
# config/config.yaml:
llm:
  default_provider: "openai"
  providers:
    openai:
      api_key: "${OPENAI_API_KEY}"
      model: "ft:gpt-3.5-turbo:my-org:model:abc123"
```
```bash
dagnostics training export-feedback --min-rating 3
```

| Command | Description |
|---|---|
| `training status` | Show training environment and dataset status |
| `training prepare-data` | Convert human-reviewed data to training format |
| `training train-local` | Fine-tune local model with LoRA/QLoRA |
| `training train-openai` | Fine-tune using OpenAI API |
| `training train-anthropic` | Prepare data for Anthropic (when available) |
| `training evaluate` | Evaluate model accuracy on test data |
| `training deploy-ollama` | Export model for Ollama deployment |
| `training remote-train` | Submit job to remote training server |
| `training remote-status` | Check remote training job status |
| `training feedback-stats` | Show human feedback statistics |
| `training export-feedback` | Export feedback for training |
Prompt templates use placeholders that are filled in at analysis time:

```text
{few_shot_examples}

Now analyze this log:
{log_context}
```
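To illustrate how a template with the `{few_shot_examples}` and `{log_context}` placeholders might be rendered, here is a minimal sketch; the `render_prompt` function is hypothetical and not part of the DAGnostics API:

```python
# Hypothetical sketch of rendering a prompt template with str.format.
# The placeholder names match the template fragment above; the function
# itself is illustrative, not DAGnostics internals.
TEMPLATE = "{few_shot_examples}\n\nNow analyze this log:\n{log_context}"


def render_prompt(few_shot_examples: str, log_context: str) -> str:
    """Substitute few-shot examples and the log excerpt into the template."""
    return TEMPLATE.format(
        few_shot_examples=few_shot_examples,
        log_context=log_context,
    )


prompt = render_prompt(
    few_shot_examples="Example 1: ...",
    log_context="[2025-08-13 10:15:25] ERROR: ...",
)
```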
### LLM Provider Configuration
```yaml
llm:
  default_provider: "ollama"  # ollama, openai, anthropic, gemini
  providers:
    ollama:
      base_url: "http://localhost:11434"
      model: "mistral"
      temperature: 0.1
    gemini:
      api_key: "YOUR_API_KEY"
      model: "gemini-2.5-flash"
      temperature: 0.0
```
Edit `config/config.yaml` to add domain-specific examples:

```yaml
prompts:
  few_shot_examples:
    error_extraction:
      - log_context: |
          [2025-08-13 15:30:25] ERROR: Your custom error pattern here
          [2025-08-13 15:30:25] ERROR: Additional context
        extracted_response: |
          {
            "error_message": "Extracted error message",
            "confidence": 0.90,
            "category": "configuration_error",
            "severity": "high",
            "reasoning": "Why this is the root cause"
          }
```
Override any default prompt by adding to `config.yaml`:

```yaml
prompts:
  templates:
    sms_error_extraction: |
      Custom SMS prompt template here.
      Extract concise error for: {dag_id}.{task_id}
      Log: {log_context}
```
DAGnostics uses an intelligent baseline approach for error detection:
The system includes curated examples covering common Airflow error patterns:
These examples help LLMs provide more accurate error categorization and confidence scores.
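As a conceptual illustration of the baseline idea, consider this deliberately simplified sketch. The real system uses Drain3 log clustering rather than exact template matching, so treat this only as a picture of the core logic:

```python
# Simplified illustration of baseline-based anomaly detection.
# DAGnostics itself uses Drain3 clustering; exact template matching
# is used here only to show the idea.


def build_baseline(successful_run_templates: list) -> set:
    """Collect the log patterns observed during successful runs."""
    return set(successful_run_templates)


def find_anomalies(failed_run_templates: list, baseline: set) -> list:
    """Return patterns from a failed run that never appeared in successful runs."""
    return [t for t in failed_run_templates if t not in baseline]


baseline = build_baseline([
    "Task started",
    "Fetching data from <source>",
    "Task finished",
])
anomalies = find_anomalies(
    ["Task started", "ERROR: connection refused"],
    baseline,
)
# anomalies contains only the pattern unseen in successful runs
```

Only the anomalous patterns are then handed to the LLM, which keeps prompts small and focused on likely root causes.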
DAGnostics provides a CLI for managing the monitoring and reporting system. Use the following commands:
```bash
# Launch the interactive web dashboard
uv run dagnostics web

# Custom host and port
uv run dagnostics web --host 0.0.0.0 --port 8080

# Enable auto-reload for development
uv run dagnostics web --reload --log-level debug
```
The web dashboard provides:
```bash
uv run dagnostics start
```
Note: The monitoring daemon is not yet implemented. This command is a placeholder.
```bash
uv run dagnostics analyze <dag-id> <task-id> <run-id> <try-number>
```

Options:

- `--llm/-l`: LLM provider (ollama, openai, anthropic, gemini)
- `--format/-f`: Output format (json, yaml, text)
- `--verbose/-v`: Verbose output
- `--baseline`: Use baseline comparison for anomaly detection

```bash
# Start the monitoring daemon
uv run dagnostics daemon start

# Stop the daemon
uv run dagnostics daemon stop

# Check daemon status
uv run dagnostics daemon status
```
```bash
# Create baseline for a specific DAG task
uv run dagnostics baseline create <dag-id> <task-id>

# List existing baselines
uv run dagnostics baseline list

# Refresh stale baselines
uv run dagnostics baseline refresh
```
```bash
uv run dagnostics report
uv run dagnostics report --daily
```
Note: Report generation and export are not yet implemented. These commands are placeholders.
```python
# LLM Engine Usage
from dagnostics.llm.engine import LLMEngine, OllamaProvider
from dagnostics.core.config import load_config
from dagnostics.core.models import LogEntry

# Load configuration with custom prompts
config = load_config()

# Initialize LLM engine with config
provider = OllamaProvider()
engine = LLMEngine(provider, config=config)

# Analyze log entries (few-shot learning applied automatically)
log_entries = [LogEntry(...)]
analysis = engine.extract_error_message(log_entries)

print(f"Error: {analysis.error_message}")
print(f"Category: {analysis.category}")
print(f"Confidence: {analysis.confidence}")
```

```python
# Baseline Management
from dagnostics.clustering.log_clusterer import LogClusterer

clusterer = LogClusterer(config)
baseline_clusters = clusterer.build_baseline_clusters(successful_logs, dag_id, task_id)
anomalous_logs = clusterer.identify_anomalous_patterns(failed_logs, dag_id, task_id)
```
DAGnostics v0.5.0 includes a comprehensive REST API with real-time WebSocket capabilities:
```bash
# Start the API server
uv run dagnostics web --host 0.0.0.0 --port 8000

# API Documentation available at:
# http://localhost:8000/docs (Swagger UI)
# http://localhost:8000/redoc (ReDoc)
```
Key API Routes:
- `/api/v1/analysis/analyze` - Analyze task failures
- `/api/v1/dashboard/stats` - Get dashboard statistics
- `/api/v1/monitor/status` - Monitor service status
- `/api/v1/training/candidates` - Manage training datasets

```javascript
// Connect to WebSocket for live updates
const ws = new WebSocket('ws://localhost:8000/ws');

ws.onmessage = function (event) {
  const update = JSON.parse(event.data);
  switch (update.type) {
    case 'analysis_complete':
      console.log('Analysis completed:', update.data);
      break;
    case 'new_failure':
      console.log('New failure detected:', update.data);
      break;
    case 'status_change':
      console.log('Status changed:', update.data);
      break;
  }
};
```
```bash
# Get training candidates
curl http://localhost:8000/api/v1/training/candidates

# Submit human feedback
curl -X POST http://localhost:8000/api/v1/training/candidates/{id}/feedback \
  -H "Content-Type: application/json" \
  -d '{"action": "approve", "reviewer_name": "analyst"}'

# Export dataset
curl -X POST http://localhost:8000/api/v1/training/export \
  -H "Content-Type: application/json" \
  -d '{"format": "jsonl", "include_rejected": false}'
```
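The same feedback endpoint can be called programmatically. A minimal standard-library sketch follows; the endpoint path and payload mirror the curl examples above, while the `feedback_request` helper is our own illustration (the final `urlopen` call requires a running server, so it is left commented out):

```python
import json
import urllib.request

# Base URL assumed from the examples above (local API server on port 8000)
BASE = "http://localhost:8000/api/v1/training"


def feedback_request(candidate_id: str, action: str, reviewer: str) -> urllib.request.Request:
    """Build the POST request that submits human feedback for a candidate.

    Hypothetical helper for illustration; mirrors the curl example above.
    """
    payload = json.dumps({"action": action, "reviewer_name": reviewer}).encode()
    return urllib.request.Request(
        f"{BASE}/candidates/{candidate_id}/feedback",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = feedback_request("abc123", "approve", "analyst")
# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```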
The `tasks/` folder contains utility scripts for common development tasks, such as setting up the environment, linting, formatting, and running tests. These tasks are powered by Invoke.

Run the following commands from the root of the project:
| Command | Description |
|---|---|
| `invoke dev.setup` | Set up the development environment. |
| `invoke dev.clean` | Clean build artifacts and temporary files. |
| `invoke dev.format` | Format the code using black and isort. |
| `invoke dev.lint` | Lint the code using flake8 and mypy. |
| `invoke dev.test` | Run all tests with pytest. |
```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=dagnostics

# Run specific test file
uv run pytest tests/llm/test_parser.py
```
```bash
git checkout -b feature/amazing-feature
./scripts/test.sh
./scripts/lint.sh
git commit -m "Add amazing feature"
```
A modern web dashboard UI is included in `src/dagnostics/web/`. It provides:
Note: The backend API endpoints for the dashboard may be incomplete or stubbed. Some dashboard features may not display real data yet.
See CONTRIBUTING.md for detailed guidelines on how to help.
This project is licensed under the MIT License - see the LICENSE file for details.
For questions and support, please open an issue in the GitHub repository.