# Data Analysis Framework

Version 2.0.0 - Part of the unified analysis framework suite
## 📈 Purpose

Specialized framework for analyzing structured data (spreadsheets, databases, configuration files) with AI-powered pattern detection and safe agent query capabilities.

**Important:** This framework focuses on structured data access via natural language queries, not document chunking. For document processing, see the complementary frameworks below.
## 📦 Supported Formats

### Spreadsheets & Tables
- Excel: XLSX, XLS with multiple sheets
- CSV/TSV: Delimiter detection and parsing
- Apache Parquet: Columnar data analysis
- JSON: Nested and flat structure analysis
- JSONL: Line-delimited JSON streams
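As an illustration of the delimiter-detection idea for CSV/TSV inputs (a standalone sketch using the standard library's `csv.Sniffer`, not the framework's internal implementation):

```python
import csv
import io

def detect_delimiter(sample: str) -> str:
    """Guess the delimiter of a CSV/TSV sample using csv.Sniffer."""
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return dialect.delimiter

tsv_sample = "id\tname\tregion\n1\tAcme\tEU\n2\tGlobex\tUS\n"
delim = detect_delimiter(tsv_sample)

# Parse with the detected delimiter.
rows = list(csv.reader(io.StringIO(tsv_sample), delimiter=delim))
print(delim == "\t")  # True
print(rows[1])        # ['1', 'Acme', 'EU']
```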
### Configuration Data
- YAML: Configuration files and data serialization
- TOML: Configuration file analysis
- INI: Legacy configuration parsing
- Environment Files: .env variable analysis
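For the configuration formats above, a minimal sketch of INI parsing with the standard `configparser` module, plus a simple hand-rolled `.env` line parser (both illustrative only; the framework's own parsers may differ):

```python
import configparser

ini_text = """
[database]
host = localhost
port = 5432
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)
print(parser["database"]["host"])  # localhost

# A minimal .env-style parser: KEY=VALUE lines, '#' comments ignored.
def parse_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

print(parse_env('API_KEY="abc123"\n# comment\nDEBUG=true'))
# {'API_KEY': 'abc123', 'DEBUG': 'true'}
```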
### Database Exports
- SQL Dumps: Schema and data analysis
- SQLite: Database file inspection
- Database Connection: Live data analysis
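SQLite inspection of the kind listed above can be sketched with the standard `sqlite3` module (an in-memory database stands in for a real file here):

```python
import sqlite3

# Build a throwaway in-memory database to inspect.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, revenue REAL)")
conn.execute("INSERT INTO customers (name, revenue) VALUES ('Acme', 12.5)")

# List tables from sqlite_master, then columns via PRAGMA table_info.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
columns = [r[1] for r in conn.execute("PRAGMA table_info(customers)")]
print(tables)   # ['customers']
print(columns)  # ['id', 'name', 'revenue']
```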
## 🤖 AI Integration Features
- Schema Detection: Automatic column type inference
- Pattern Analysis: Anomaly and trend detection
- Data Quality Assessment: Missing values, duplicates, outliers
- Relationship Discovery: Cross-table dependencies
- Business Logic Extraction: Rules and constraints
- Predictive Insights: Forecasting and recommendations
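As one concrete illustration of the quality-assessment idea (a standalone sketch of missing-value and duplicate counting, not the framework's actual metric):

```python
from collections import Counter

rows = [
    {"id": 1, "name": "Acme", "revenue": 12.5},
    {"id": 2, "name": None,   "revenue": 8.0},
    {"id": 2, "name": None,   "revenue": 8.0},  # exact duplicate row
]

# Missing values per column (None treated as missing).
missing = {
    col: sum(1 for r in rows if r[col] is None)
    for col in rows[0]
}

# Count exact duplicate rows by hashing each row's sorted items.
dup_counts = Counter(tuple(sorted(r.items())) for r in rows)
duplicates = sum(c - 1 for c in dup_counts.values())

print(missing)     # {'id': 0, 'name': 2, 'revenue': 0}
print(duplicates)  # 1
```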
## 🚀 Quick Start

```python
from data_analysis_framework import DataAnalyzer

analyzer = DataAnalyzer()
result = analyzer.analyze("sales_data.xlsx")

print(f"Data Type: {result.document_type.type_name}")
print(f"Schema: {result.analysis.schema_info}")
print(f"Quality Score: {result.analysis.quality_metrics['overall_score']}")
print(f"AI Insights: {result.analysis.ai_insights}")
```
## 🔄 Unified Interface Support

This framework supports the unified interface standard, providing consistent access patterns across all analysis frameworks:

```python
import data_analysis_framework as daf

result = daf.analyze_unified("sales_data.csv")

# The unified result supports dict-style, attribute, and .get() access:
doc_type = result['document_type']
doc_type = result.document_type
doc_type = result.get('document_type')
as_dict = result.to_dict()

print(f"Framework: {result.framework}")
print(f"Type: {result.document_type}")
print(f"Confidence: {result.confidence}")
print(f"AI opportunities: {result.ai_opportunities}")
```

The unified interface ensures compatibility when switching between frameworks or using multiple frameworks together.
## 🏗️ Status

🚧 Active Development - Core functionality implemented; v2.0.0 adopts the unified framework interfaces
## 🌐 Framework Suite

This framework is part of a unified suite of analysis frameworks, each optimized for different data types.
### Document Processing Frameworks (Chunking-Based)

These frameworks chunk documents for RAG/LLM consumption:
### Data Access Framework (Query-Based)

This framework provides safe AI agent access to structured data:
### Shared Foundation
### Key Differences

| Framework | Example Request | Primary Use Cases | Output |
|---|---|---|---|
| Document Frameworks | "Chunk this manual for search" | RAG, semantic search | Text chunks for embeddings |
| Data Framework | "Show customers with revenue > $10M" | Natural language queries | Query results and insights |
### When to Use What

- **Processing documents?** Use the xml/docling/document frameworks to chunk content for vector search
- **Querying databases/spreadsheets?** Use data-analysis-framework for safe AI agent access
- **Both?** Combine them: document frameworks for knowledge, plus the data framework for operational queries
See CHUNKING_DECISION.md for a detailed explanation of this framework's query-based approach.
## 📝 What's New in v2.0.0

- ✅ Adopted `analysis-framework-base` for unified interfaces
- ✅ Inherits from `BaseAnalyzer` for a consistent API across frameworks
- ✅ Implements `UnifiedAnalysisResult` for a standard result format
- ✅ Added `get_supported_formats()` method for format discovery
- ✅ 100% backward compatible - all existing code works unchanged
- ℹ️ Does not implement `BaseChunker` - uses a query-based paradigm instead (see CHUNKING_DECISION.md)