# Data Analysis Framework

Version 2.0.0 - Part of the unified analysis framework suite
## 📈 Purpose

Specialized framework for analyzing structured data (spreadsheets, databases, configuration files) with AI-powered pattern detection and safe agent query capabilities.

**Important:** This framework focuses on structured data access via natural language queries, not document chunking. For document processing, see the complementary frameworks below.
## 📦 Supported Formats

### Spreadsheets & Tables
- Excel: XLSX, XLS with multiple sheets
- CSV/TSV: Delimiter detection and parsing
- Apache Parquet: Columnar data analysis
- JSON: Nested and flat structure analysis
- JSONL: Line-delimited JSON streams
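As an illustration of the delimiter-detection idea for CSV/TSV inputs (a standalone sketch using the standard library's `csv.Sniffer`, not the framework's internal implementation):

```python
import csv
import io

def detect_delimiter(sample: str) -> str:
    """Guess the delimiter of a CSV/TSV sample using csv.Sniffer."""
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return dialect.delimiter

tsv_sample = "id\tname\tregion\n1\tAcme\tEU\n2\tGlobex\tUS\n"
delim = detect_delimiter(tsv_sample)

# Parse with the detected delimiter.
rows = list(csv.reader(io.StringIO(tsv_sample), delimiter=delim))
print(delim == "\t")  # True
print(rows[1])        # ['1', 'Acme', 'EU']
```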
### Configuration Data
- YAML: Configuration files and data serialization
- TOML: Configuration file analysis
- INI: Legacy configuration parsing
- Environment Files: .env variable analysis
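For the configuration formats above, a minimal sketch of INI parsing with the standard `configparser` module, plus a simple hand-rolled `.env` line parser (both illustrative only; the framework's own parsers may differ):

```python
import configparser

ini_text = """
[database]
host = localhost
port = 5432
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)
print(parser["database"]["host"])  # localhost

# A minimal .env-style parser: KEY=VALUE lines, '#' comments ignored.
def parse_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

print(parse_env('API_KEY="abc123"\n# comment\nDEBUG=true'))
# {'API_KEY': 'abc123', 'DEBUG': 'true'}
```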
### Database Exports
- SQL Dumps: Schema and data analysis
- SQLite: Database file inspection
- Database Connection: Live data analysis
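SQLite inspection of the kind listed above can be sketched with the standard `sqlite3` module (an in-memory database stands in for a real file here):

```python
import sqlite3

# Build a throwaway in-memory database to inspect.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, revenue REAL)")
conn.execute("INSERT INTO customers (name, revenue) VALUES ('Acme', 12.5)")

# List tables from sqlite_master, then columns via PRAGMA table_info.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
columns = [r[1] for r in conn.execute("PRAGMA table_info(customers)")]
print(tables)   # ['customers']
print(columns)  # ['id', 'name', 'revenue']
```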
## 🤖 AI Integration Features
- Schema Detection: Automatic column type inference
- Pattern Analysis: Anomaly and trend detection
- Data Quality Assessment: Missing values, duplicates, outliers
- Relationship Discovery: Cross-table dependencies
- Business Logic Extraction: Rules and constraints
- Predictive Insights: Forecasting and recommendations
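As one concrete illustration of the quality-assessment idea (a standalone sketch of missing-value and duplicate counting, not the framework's actual metric):

```python
from collections import Counter

rows = [
    {"id": 1, "name": "Acme", "revenue": 12.5},
    {"id": 2, "name": None,   "revenue": 8.0},
    {"id": 2, "name": None,   "revenue": 8.0},  # exact duplicate row
]

# Missing values per column (None treated as missing).
missing = {
    col: sum(1 for r in rows if r[col] is None)
    for col in rows[0]
}

# Count exact duplicate rows by hashing each row's sorted items.
dup_counts = Counter(tuple(sorted(r.items())) for r in rows)
duplicates = sum(c - 1 for c in dup_counts.values())

print(missing)     # {'id': 0, 'name': 2, 'revenue': 0}
print(duplicates)  # 1
```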
## 🚀 Quick Start

```python
from data_analysis_framework import DataAnalyzer

analyzer = DataAnalyzer()
result = analyzer.analyze("sales_data.xlsx")

print(f"Data Type: {result.document_type.type_name}")
print(f"Schema: {result.analysis.schema_info}")
print(f"Quality Score: {result.analysis.quality_metrics['overall_score']}")
print(f"AI Insights: {result.analysis.ai_insights}")
```
## 🔄 Unified Interface Support

This framework supports the unified interface standard, providing consistent access patterns across all analysis frameworks:

```python
import data_analysis_framework as daf

result = daf.analyze_unified("sales_data.csv")

# The unified result supports dict-style, attribute, and .get() access:
doc_type = result['document_type']
doc_type = result.document_type
doc_type = result.get('document_type')
as_dict = result.to_dict()

print(f"Framework: {result.framework}")
print(f"Type: {result.document_type}")
print(f"Confidence: {result.confidence}")
print(f"AI opportunities: {result.ai_opportunities}")
```

The unified interface ensures compatibility when switching between frameworks or using multiple frameworks together.
## 🏗️ Status

🚧 Active Development - Core functionality implemented; v2.0.0 adopts the unified framework interfaces
## 🌐 Framework Suite

This framework is part of a unified suite of analysis frameworks, each optimized for different data types.
### Document Processing Frameworks (Chunking-Based)

These frameworks chunk documents for RAG/LLM consumption:
### Data Access Framework (Query-Based)

This framework provides safe AI agent access to structured data:
### Shared Foundation
### Key Differences

| Framework | Example Request | Primary Use Cases | Output |
|---|---|---|---|
| Document Frameworks | "Chunk this manual for search" | RAG, semantic search | Text chunks for embeddings |
| Data Framework | "Show customers with revenue > $10M" | Natural language queries | Query results and insights |
### When to Use What

- **Processing documents?** Use the xml/docling/document frameworks to chunk content for vector search
- **Querying databases/spreadsheets?** Use data-analysis-framework for safe AI agent access
- **Both?** Combine them: document frameworks for knowledge, plus the data framework for operational queries
See CHUNKING_DECISION.md for a detailed explanation of this framework's query-based approach.
## 📝 What's New in v2.0.0

- ✅ Adopted `analysis-framework-base` for unified interfaces
- ✅ Inherits from `BaseAnalyzer` for a consistent API across frameworks
- ✅ Implements `UnifiedAnalysisResult` for a standard result format
- ✅ Added `get_supported_formats()` method for format discovery
- ✅ 100% backward compatible - all existing code works unchanged
- ℹ️ Does not implement `BaseChunker` - uses a query-based paradigm instead (see CHUNKING_DECISION.md)