data-analysis-framework

AI-powered analysis framework for structured data files and databases - part of the unified analysis framework suite

PyPI · Version 2.0.0 · Maintainers: 1

Data Analysis Framework

Version 2.0.0 - Part of the unified analysis framework suite

📈 Purpose

Specialized framework for analyzing structured data (spreadsheets, databases, configuration files) with AI-powered pattern detection and safe agent query capabilities.

Important: This framework focuses on structured data access via natural language queries, not document chunking. For document processing, see the complementary frameworks below.

📦 Supported Formats

Spreadsheets & Tables

  • Excel: XLSX, XLS with multiple sheets
  • CSV/TSV: Delimiter detection and parsing
  • Apache Parquet: Columnar data analysis
  • JSON: Nested and flat structure analysis
  • JSONL: Line-delimited JSON streams
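
The CSV/TSV delimiter detection listed above can be approximated with the standard library alone. This is a minimal sketch using `csv.Sniffer`, not the framework's own parser; the helper names `detect_delimiter` and `parse_rows` are hypothetical.

```python
from __future__ import annotations

import csv
import io


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the field delimiter of a CSV/TSV sample (hypothetical helper)."""
    try:
        # Sniffer looks for a character that occurs consistently on every line.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide; fall back to a comma.
        return ","


def parse_rows(text: str) -> list[dict[str, str]]:
    """Parse delimited text into row dicts keyed by the header row."""
    delimiter = detect_delimiter(text[:4096])  # sniff only the first few KB
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


tsv = "region\trevenue\nEMEA\t1200\nAPAC\t950\n"
print(parse_rows(tsv)[0])  # {'region': 'EMEA', 'revenue': '1200'}
```

Sniffing only a prefix keeps detection cheap on large files; the fallback matters because `Sniffer` raises `csv.Error` on samples with no consistent delimiter.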

Configuration Data

  • YAML: Configuration files and data serialization
  • TOML: Configuration file analysis
  • INI: Legacy configuration parsing
  • Environment Files: .env variable analysis
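
For the configuration formats above, a rough sketch of turning `.env` and INI files into plain dictionaries follows. `parse_env` is a simplified, hypothetical parser (no multi-line values, escape sequences, or `${VAR}` interpolation); INI parsing uses the stdlib `configparser`. TOML could be handled the same way with the stdlib `tomllib` module on Python 3.11+. None of this is the framework's actual implementation.

```python
from __future__ import annotations

import configparser


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file (simplified, hypothetical)."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]  # tolerate shell-style exports
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        value = value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def parse_ini(text: str) -> dict[str, dict[str, str]]:
    """Flatten an INI document into {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


print(parse_env('# db\nDB_HOST=localhost\nexport DB_PORT="5432"\n'))
# {'DB_HOST': 'localhost', 'DB_PORT': '5432'}
```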

Database Exports

  • SQL Dumps: Schema and data analysis
  • SQLite: Database file inspection
  • Database Connection: Live data analysis
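
SQL dump and SQLite inspection can be sketched with the stdlib `sqlite3` module: replay the dump into an in-memory database, then read the schema from `sqlite_master` and `PRAGMA table_info`. The helper names are hypothetical, and the framework may do this differently. Dumps produced by other engines (MySQL, PostgreSQL) can contain syntax SQLite will not accept. For an existing database file, `sqlite3.connect("file:sales.db?mode=ro", uri=True)` opens it read-only (`sales.db` is a placeholder name).

```python
from __future__ import annotations

import sqlite3


def load_sql_dump(dump_sql: str) -> sqlite3.Connection:
    """Replay a SQL dump into an in-memory SQLite database for inspection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(dump_sql)
    return conn


def inspect_sqlite(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Return {table: [(column, declared_type), ...]} for every user table."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape the identifier for the PRAGMA
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in cols]
    return schema


dump = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, revenue REAL);
INSERT INTO customers VALUES (1, 'Acme', 12000000.0);
"""
conn = load_sql_dump(dump)
print(inspect_sqlite(conn))
# {'customers': [('id', 'INTEGER'), ('name', 'TEXT'), ('revenue', 'REAL')]}
```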

🤖 AI Integration Features

  • Schema Detection: Automatic column type inference
  • Pattern Analysis: Anomaly and trend detection
  • Data Quality Assessment: Missing values, duplicates, outliers
  • Relationship Discovery: Cross-table dependencies
  • Business Logic Extraction: Rules and constraints
  • Predictive Insights: Forecasting and recommendations
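
The schema detection and data quality checks listed above can be illustrated without any AI component. This sketch assumes rows are dicts of raw string cells (as `csv.DictReader` produces); it infers a column type, counts missing cells and exact duplicate rows, and flags numeric outliers with Tukey's IQR fences. It is an illustrative stand-in, not the framework's scoring logic, and all function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from statistics import quantiles

MISSING = ("", None)  # cell values treated as missing


def infer_type(values: list[str | None]) -> str:
    """Infer 'bool', 'int', 'float', 'str', or 'empty' from raw string cells."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "empty"

    def all_parse(cast) -> bool:
        # True if every present value converts with `cast` without ValueError.
        try:
            for v in present:
                cast(v)
        except ValueError:
            return False
        return True

    if all(v.lower() in ("true", "false") for v in present):
        return "bool"
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "str"


def quality_report(rows: list[dict], columns: list[str]) -> dict:
    """Count missing cells per column and exact duplicate rows."""
    missing = {c: sum(1 for r in rows if r.get(c) in MISSING) for c in columns}
    counts = Counter(tuple(r.get(c) for c in columns) for r in rows)
    duplicate_rows = sum(n - 1 for n in counts.values() if n > 1)
    return {"missing": missing, "duplicate_rows": duplicate_rows}


def iqr_outliers(numbers: list[float], k: float = 1.5) -> list[float]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    if len(numbers) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = quantiles(numbers, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in numbers if x < low or x > high]


print(iqr_outliers([10, 12, 11, 13, 12, 95]))  # [95]
```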

🚀 Quick Start

```python
from data_analysis_framework import DataAnalyzer

# Analyze a spreadsheet, then print the detected type, schema, quality score, and AI insights
analyzer = DataAnalyzer()
result = analyzer.analyze("sales_data.xlsx")

print(f"Data Type: {result.document_type.type_name}")
print(f"Schema: {result.analysis.schema_info}")
print(f"Quality Score: {result.analysis.quality_metrics['overall_score']}")
print(f"AI Insights: {result.analysis.ai_insights}")
```

🔄 Unified Interface Support

This framework now supports the unified interface standard, providing consistent access patterns across all analysis frameworks:

```python
import data_analysis_framework as daf

# Use the unified interface
result = daf.analyze_unified("sales_data.csv")

# All access patterns work consistently
doc_type = result['document_type']        # Dict access ✓
doc_type = result.document_type           # Attribute access ✓
doc_type = result.get('document_type')    # get() method ✓
as_dict = result.to_dict()                # Full dict conversion ✓

# Works the same across all frameworks
print(f"Framework: {result.framework}")   # 'data-analysis-framework'
print(f"Type: {result.document_type}")    # 'CSV Data'
print(f"Confidence: {result.confidence}")  # Quality-based confidence
print(f"AI opportunities: {result.ai_opportunities}")
```

The unified interface ensures compatibility when switching between frameworks or using multiple frameworks together.
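
One way a single result object can honor all four access patterns is a dataclass that also implements `__getitem__`, `get()`, and `to_dict()`. `UnifiedResultSketch` below is a hypothetical stand-in, not the actual `UnifiedAnalysisResult` from analysis-framework-base, and the field values in the example are illustrative.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class UnifiedResultSketch:
    """Hypothetical result type where dict, attribute, and get() access coexist."""

    framework: str
    document_type: str
    confidence: float
    ai_opportunities: list[str] = field(default_factory=list)

    def __getitem__(self, key: str) -> Any:
        # Dict-style access delegates to fields; unknown keys raise KeyError like a dict.
        if key not in self.__dataclass_fields__:
            raise KeyError(key)
        return getattr(self, key)

    def get(self, key: str, default: Any = None) -> Any:
        try:
            return self[key]
        except KeyError:
            return default

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


result = UnifiedResultSketch("data-analysis-framework", "CSV Data", 0.9, ["anomaly detection"])
print(result["document_type"], result.document_type, result.get("document_type"))
```

Backing every access path with the same dataclass fields means the four patterns cannot drift apart.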

🏗️ Status

🚧 Active Development - Core functionality implemented, v2.0.0 adopts unified framework interfaces

🌐 Framework Suite

This framework is part of a unified suite of analysis frameworks, each optimized for different data types:

Document Processing Frameworks (Chunking-Based)

These frameworks chunk documents for RAG/LLM consumption:

  • xml-analysis-framework - XML document analysis with 29+ specialized handlers (SCAP, Maven, Spring, etc.)
  • docling-analysis-framework - Office documents, PDFs, and images using IBM Docling
  • document-analysis-framework - General document processing and analysis

Data Access Framework (Query-Based)

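This framework provides safe AI agent access to structured data:

  • data-analysis-framework - Structured data analysis with AI-powered pattern detection and safe agent query capabilities (this package)

Shared Foundation

  • analysis-framework-base - Shared base classes and unified result format (BaseAnalyzer, UnifiedAnalysisResult) used across the suite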

Key Differences

| Framework Type | Use Case | AI Integration | Output |
|---|---|---|---|
| Document Frameworks | "Chunk this manual for search" | RAG, semantic search | Text chunks for embeddings |
| Data Framework | "Show customers with revenue > $10M" | Natural language queries | Query results and insights |
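
A minimal sketch of what "safe" query access can mean for the revenue example in the table: the agent (or an upstream natural-language step, not shown) supplies a column, an operator, and a value; the code accepts only whitelisted columns and operators, binds the value as a SQL parameter, and switches the connection to SQLite's `query_only` mode so writes are rejected. The `customers` table, the whitelist, and `safe_query` are assumptions for illustration, not the framework's API.

```python
from __future__ import annotations

import sqlite3

# Hypothetical whitelist: only these identifiers and operators may reach the SQL text.
ALLOWED_COLUMNS = {"name", "region", "revenue"}
ALLOWED_OPS = {">", ">=", "<", "<=", "="}


def safe_query(conn: sqlite3.Connection, column: str, op: str, value, limit: int = 100) -> list[tuple]:
    """Filter customers using a whitelisted column/operator and a bound value."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column not allowed: {column!r}")
    if op not in ALLOWED_OPS:
        raise ValueError(f"operator not allowed: {op!r}")
    # Identifiers come only from the whitelist; the value is bound with '?'.
    sql = (f"SELECT name, revenue FROM customers WHERE {column} {op} ? "
           "ORDER BY revenue DESC LIMIT ?")
    return conn.execute(sql, (value, int(limit))).fetchall()


# Illustrative in-memory data standing in for a real database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT, region TEXT, revenue REAL)")
db.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Acme", "EMEA", 12_000_000.0), ("Globex", "APAC", 8_000_000.0),
     ("Initech", "EMEA", 25_000_000.0)],
)
db.commit()
db.execute("PRAGMA query_only = ON")  # from here on, SQLite rejects any write

# "Show customers with revenue > $10M"
print(safe_query(db, "revenue", ">", 10_000_000))
# [('Initech', 25000000.0), ('Acme', 12000000.0)]
```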

When to Use What

  • Processing documents? Use xml/docling/document frameworks to chunk content for vector search
  • Querying databases/spreadsheets? Use data-analysis-framework for safe AI agent access
  • Both? Combine them! Document frameworks for knowledge + data framework for operational queries

See CHUNKING_DECISION.md for a detailed explanation of this framework's query-based approach.

📝 What's New in v2.0.0

  • ✅ Adopted analysis-framework-base for unified interfaces
  • ✅ Inherits from BaseAnalyzer for consistent API across frameworks
  • ✅ Implements UnifiedAnalysisResult for standard result format
  • ✅ Added get_supported_formats() method for format discovery
  • ✅ 100% backward compatible - all existing code works unchanged
  • ℹ️ Does not implement BaseChunker - uses query-based paradigm instead (see CHUNKING_DECISION.md)
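
The inheritance change can be pictured as a shared abstract base class that owns the common entry point while each framework declares its own formats. `BaseAnalyzerSketch` and `CsvAnalyzerSketch` are hypothetical; the real `BaseAnalyzer` in analysis-framework-base may use different method names and signatures.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class BaseAnalyzerSketch(ABC):
    """Hypothetical shared base class; not analysis-framework-base's real API."""

    @abstractmethod
    def get_supported_formats(self) -> list[str]:
        """Return lowercase file extensions this analyzer accepts."""

    @abstractmethod
    def _analyze(self, path: Path) -> dict:
        """Framework-specific analysis, implemented by each subclass."""

    def supports(self, path: str) -> bool:
        return Path(path).suffix.lstrip(".").lower() in self.get_supported_formats()

    def analyze(self, path: str) -> dict:
        # Shared entry point: validate the format once, then delegate.
        if not self.supports(path):
            raise ValueError(f"unsupported format: {path}")
        return self._analyze(Path(path))


class CsvAnalyzerSketch(BaseAnalyzerSketch):
    def get_supported_formats(self) -> list[str]:
        return ["csv", "tsv"]

    def _analyze(self, path: Path) -> dict:
        # Placeholder result; a real analyzer would read and profile the file.
        return {"framework": "data-analysis-framework", "document_type": "CSV Data", "path": str(path)}


analyzer = CsvAnalyzerSketch()
print(analyzer.supports("sales_data.CSV"), analyzer.supports("report.pdf"))  # True False
```

Keeping format discovery in one overridable method is what lets calling code ask any framework in the suite what it handles before dispatching a file.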

Keywords

data-analysis
