
OpenChatBI - Natural language business intelligence powered by LLMs for intuitive data analysis and SQL generation
OpenChatBI is an open source, chat-based intelligent BI tool powered by large language models, designed to help users query, analyze, and visualize data through natural language conversations. Built on LangGraph and LangChain ecosystem, it provides chat agents and workflows that support natural language to SQL conversion and streamlined data analysis.
Join the Slack channel to discuss: https://join.slack.com/t/openchatbicommunity/shared_invite/zt-3jpzpx9mv-Sk88RxpO4Up0L~YTZYf4GQ
Note on Chinese Text Segmentation: For better Chinese text retrieval, jieba is used for word segmentation. However, jieba is not compatible with Python 3.12+. On Python 3.12 and higher, the system automatically falls back to simple punctuation-based segmentation for Chinese text.
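The fallback behaves roughly like the following sketch (a simplified illustration of punctuation-based splitting, not OpenChatBI's exact text_segmenter.py logic):

```python
import re

# Common ASCII and Chinese punctuation plus whitespace, used as split points.
# This is an assumed, simplified character set for illustration.
_SPLIT = re.compile(r"[\s,\.;:!\?，。、；：！？]+")

def fallback_segment(text: str) -> list[str]:
    """Split mixed Chinese/English text on punctuation and whitespace."""
    return [piece for piece in _SPLIT.split(text) if piece]

print(fallback_segment("你好，世界。hello world"))  # ['你好', '世界', 'hello', 'world']
```

Unlike jieba, this does not segment within a run of Chinese characters; it only breaks on punctuation, which is the trade-off the note above describes.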
git clone git@github.com:zhongyu09/openchatbi
uv sync
pip install openchatbi
git clone git@github.com:zhongyu09/openchatbi
uv sync --group dev
Optional: If you want to use pysqlite3 (for newer SQLite builds), you can install it manually. If the build fails, install SQLite first.
On macOS, install SQLite using Homebrew:
brew install sqlite
brew info sqlite
export LDFLAGS="-L/opt/homebrew/opt/sqlite/lib"
export CPPFLAGS="-I/opt/homebrew/opt/sqlite/include"
On Amazon Linux / RHEL / CentOS:
sudo yum install sqlite-devel
On Ubuntu / Debian:
sudo apt-get update
sudo apt-get install libsqlite3-dev
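After installing, you can check which SQLite library Python's stdlib sqlite3 module is linked against; if it is recent enough, pysqlite3 may be unnecessary:

```python
import sqlite3

# Version of the SQLite library the stdlib sqlite3 module was built against
# (not the version of the sqlite3 Python module itself).
print("bundled SQLite:", sqlite3.sqlite_version)
```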
Run the demo using an example dataset from the Spider dataset. You need to provide your OpenAI API key, or change the config to use another LLM provider.
Note: The demo example includes an embedding model configuration. If you want to run without an embedding model, remove the embedding_model section from the config; BM25 retrieval will be used automatically.
cp example/config.yaml openchatbi/config.yaml
sed -i 's/YOUR_API_KEY_HERE/[YOUR OPENAI API KEY]/g' openchatbi/config.yaml
python run_streamlit_ui.py
Copy the configuration template:
cp openchatbi/config.yaml.template openchatbi/config.yaml
Or create an empty YAML file.
# Select which provider to use
default_llm: openai

# Define one or more providers
llm_providers:
  openai:
    default_llm:
      class: langchain_openai.ChatOpenAI
      params:
        api_key: YOUR_API_KEY_HERE
        model: gpt-4.1
        temperature: 0.02
        max_tokens: 8192
    # Optional: Embedding model for vector-based retrieval and memory tools
    # If not configured, BM25-based retrieval will be used, and the memory tools will not work
    embedding_model:
      class: langchain_openai.OpenAIEmbeddings
      params:
        api_key: YOUR_API_KEY_HERE
        model: text-embedding-3-large
        chunk_size: 1024

organization: Your Company
dialect: presto

data_warehouse_config:
  uri: "presto://user@host:8080/catalog/schema"
  include_tables:
    - your_table_name
  database_name: "catalog.schema"
export CONFIG_FILE=YOUR_CONFIG_FILE_PATH
from openchatbi import get_default_graph

graph = get_default_graph()
graph.invoke(
    {"messages": [{"role": "user", "content": "Show me ctr trends for the past 7 days"}]},
    config={"configurable": {"thread_id": "1"}},
)
# System-generated SQL
SELECT date, SUM(clicks)/SUM(impression) AS ctr
FROM ad_performance
WHERE date >= CURRENT_DATE - INTERVAL '7' DAY
GROUP BY date
ORDER BY date;
Run Streamlit based UI:
streamlit run sample_ui/streamlit_ui.py
Run Gradio based UI:
python sample_ui/streaming_ui.py
The configuration template is provided at config.yaml.template. Key configuration sections include:
- organization: Organization name (e.g., "Your Company")
- dialect: Database dialect (e.g., "presto")
- bi_config_file: Path to BI configuration file (e.g., "example/bi.yaml")
- catalog_store: Configuration for data catalog storage
  - store_type: Storage type (e.g., "file_system")
  - data_path: Path to catalog data stored on the file system (e.g., "./example")
- data_warehouse_config: Database connection settings
  - uri: Connection string for your database
  - include_tables: List of tables to include in the catalog; leave empty to include all tables
  - database_name: Database name for the catalog
  - token_service: Token service URL (for data warehouses that need token authentication, such as Presto)
  - user_name / password: Token service credentials

Various LLMs are supported via LangChain; see the LangChain API Reference (https://python.langchain.com/api_reference/reference.html#integrations) for the full list of integrations that support chat_models. You can configure different LLMs for different tasks:

- default_llm: Primary language model for general tasks
- embedding_model: (Optional) Model for embedding generation. If not configured, BM25-based text retrieval will be used as a fallback, and the memory tools will not work
- text2sql_llm: (Optional) Specialized model for SQL generation. If not configured, uses default_llm

Multiple providers (optional):

- Define llm_providers and select one with default_llm: <provider_name>.
- In sample_ui/streamlit_ui.py, a provider dropdown appears when llm_providers is configured.
- In sample_api/async_api.py, pass provider in the /chat/stream request body.

Commonly used LLM providers and their corresponding classes and installation commands:

- langchain_anthropic.ChatAnthropic: pip install langchain-anthropic
- langchain_openai.ChatOpenAI: pip install langchain-openai
- langchain_openai.AzureChatOpenAI: pip install langchain-openai
- langchain_google_vertexai.ChatVertexAI: pip install langchain-google-vertexai
- langchain_aws.ChatBedrock: pip install langchain-aws
- langchain_huggingface.ChatHuggingFace: pip install langchain-huggingface
- langchain_deepseek.ChatDeepSeek: pip install langchain-deepseek
- langchain_ollama.ChatOllama: pip install langchain-ollama

OpenChatBI supports sophisticated customization through prompt engineering and catalog management features:
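As a sketch of the multi-provider setup, an Anthropic provider could be added and selected like this; the nesting mirrors the openai example earlier in this README, and the model name and key are placeholder assumptions (install langchain-anthropic first):

```yaml
default_llm: anthropic

llm_providers:
  anthropic:
    default_llm:
      class: langchain_anthropic.ChatAnthropic
      params:
        api_key: YOUR_API_KEY_HERE
        model: claude-3-5-sonnet-latest
        temperature: 0.0
        max_tokens: 8192
```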
For detailed configuration options and examples, see the Advanced Features section.
OpenChatBI is built using a modular architecture with clear separation of concerns:
openchatbi/
├── README.md # Project documentation
├── pyproject.toml # Modern Python project configuration
├── Dockerfile.python-executor # Docker image for isolated code execution
├── run_tests.py # Test runner script
├── run_streamlit_ui.py # Streamlit UI launcher
├── openchatbi/ # Core application code
│ ├── __init__.py # Package initialization
│ ├── config.yaml.template # Configuration template
│ ├── config_loader.py # Configuration management
│ ├── constants.py # Application constants
│ ├── agent_graph.py # Main LangGraph workflow
│ ├── graph_state.py # State definition for workflows
│ ├── context_config.py # Context management configuration
│ ├── context_manager.py # Context window and token management
│ ├── text_segmenter.py # Text segmentation with jieba support
│ ├── utils.py # Utility functions and SimpleStore (BM25-based retrieval)
│ ├── catalog/ # Data catalog management
│ │ ├── __init__.py # Package initialization
│ │ ├── catalog_loader.py # Catalog loading logic
│ │ ├── catalog_store.py # Catalog storage interface
│ │ ├── factory.py # Catalog factory patterns
│ │ ├── helper.py # Catalog helper functions
│ │ ├── retrival_helper.py # Retrieval helper utilities
│ │ ├── schema_retrival.py # Schema retrieval logic
│ │ ├── token_service.py # Token service integration
│ │ └── store/ # Catalog storage implementations
│ │ └── file_system.py # File system-based catalog storage
│ ├── code/ # Code execution framework
│ │ ├── __init__.py # Package initialization
│ │ ├── executor_base.py # Base executor interface
│ │ ├── local_executor.py # Local Python execution
│ │ ├── restricted_local_executor.py # RestrictedPython execution
│ │ └── docker_executor.py # Docker-based isolated execution
│ ├── llm/ # LLM integration layer
│ │ ├── __init__.py # Package initialization
│ │ └── llm.py # LLM management and retry logic
│ ├── prompts/ # Prompt templates and engineering
│ │ ├── __init__.py # Package initialization
│ │ ├── agent_prompt.md # Main agent prompts
│ │ ├── extraction_prompt.md # Information extraction prompts
│ │ ├── system_prompt.py # System prompt management
│ │ ├── summary_prompt.md # Summary conversation prompts
│ │ ├── table_selection_prompt.md # Table selection prompts
│ │ ├── text2sql_prompt.md # Text-to-SQL prompts
│ │ └── sql_dialect/ # SQL dialect-specific prompts
│ ├── text2sql/ # Text-to-SQL conversion pipeline
│ │ ├── __init__.py # Package initialization
│ │ ├── data.py # Data and retriever for Text-to-SQL
│ │ ├── extraction.py # Information extraction
│ │ ├── generate_sql.py # SQL generation and execution logic
│ │ ├── schema_linking.py # Schema linking process
│ │ ├── sql_graph.py # SQL generation LangGraph workflow
│ │ ├── text2sql_utils.py # Text2SQL utilities
│ │ └── visualization.py # Data visualization functions
│ └── tool/ # LangGraph tools and functions
│ ├── ask_human.py # Human-in-the-loop interactions
│ ├── memory.py # Memory management tool
│ ├── mcp_tools.py # MCP (Model Context Protocol) integration
│ ├── run_python_code.py # Configurable Python code execution
│ ├── save_report.py # Report saving functionality
│ ├── search_knowledge.py # Knowledge base search
│ └── timeseries_forecast.py # Time series forecasting tool
├── sample_api/ # API implementations
│ └── async_api.py # Asynchronous FastAPI example
├── sample_ui/ # Web interface implementations
│ ├── memory_ui.py # Memory-enhanced UI interface
│ ├── plotly_utils.py # Plotly utilities and helpers
│ ├── simple_ui.py # Simple non-streaming Gradio UI
│ ├── streaming_ui.py # Streaming Gradio UI with real-time updates
│ ├── streamlit_ui.py # Streaming Streamlit UI with enhanced features
│ └── style.py # UI styling and CSS
├── example/ # Example configurations and data
│ ├── bi.yaml # BI configuration example
│ ├── config.yaml # Application config example
│ ├── table_info.yaml # Table information
│ ├── table_columns.csv # Table column registry
│ ├── common_columns.csv # Common column definitions
│ ├── sql_example.yaml # SQL examples for retrieval
│ ├── table_selection_example.csv # Table selection examples
│ └── tracking_orders.sqlite # Sample SQLite database
├── timeseries_forecasting/ # Time series forecasting service
│ ├── README.md # Forecasting service documentation
│ └── ... # Forecasting service implementation
├── tests/ # Test suite
│ ├── __init__.py # Package initialization
│ ├── conftest.py # Test configuration
│ ├── test_*.py # Test modules for various components
│ └── README.md # Testing documentation
├── docs/ # Documentation
│ ├── source/ # Sphinx documentation source
│ ├── build/ # Built documentation
│ ├── Makefile # Documentation build scripts
│ └── make.bat # Windows build script
└── .github/ # GitHub workflows and templates
└── workflows/ # CI/CD workflows
You can choose rule-based or LLM-based visualization, or disable visualization entirely.
# Options: "rule" (rule-based), "llm" (LLM-based), or null (skip visualization)
visualization_mode: llm
You can define basic knowledge and glossary in example/bi.yaml, for example:
basic_knowledge_glossary: |
  # Basic Knowledge Introduction
  The basic knowledge about your company and its business, including key concepts, metrics, and processes.

  # Glossary
  Common terms and their definitions used in your business context.
You can provide a brief introduction of your data warehouse in example/bi.yaml, for example:
data_warehouse_introduction: |
  # Data Warehouse Introduction
  This data warehouse is built on Presto and contains various tables related to XXXXX.
  The main fact tables include XXXX metrics, while dimension tables include XXXXX.
  The data is updated hourly and is used for reporting and analysis purposes.
You can configure table selection rules in example/bi.yaml, for example:
table_selection_extra_rule: |
  - All tables with is_valid can support both valid and invalid traffic
You can define your additional SQL Generation rules for tables in example/table_info.yaml, for example:
sql_rule: |
  ### SQL Rules
  - All event_date in the table are stored in **UTC**. If the user specifies a timezone (e.g., CET, PST), convert between timezones accordingly.
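The conversion this rule asks for can be pictured outside SQL with the stdlib zoneinfo module (this sketch assumes system tzdata is available; it is not OpenChatBI code):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# event_date values are stored in UTC; a question phrased in CET terms
# implies converting before filtering or displaying.
utc_event = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
cet_event = utc_event.astimezone(ZoneInfo("Europe/Paris"))
print(cet_event.isoformat())  # 2024-06-01T14:00:00+02:00 (CEST in June)
```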
High-quality catalog data is essential for accurate Text2SQL generation and data analysis. OpenChatBI automatically discovers and indexes data warehouse table structures while providing flexible management for business metadata, column descriptions, and query optimization rules.
The catalog system organizes metadata in a hierarchical structure:
Database Level
Table Level
- description: Business functionality and purpose of the table
- selection_rule: Guidelines for when and how to use this table in queries
- sql_rule: Specific SQL generation rules and constraints for this table

Column Level
- column_name: Technical database column name
- display_name: Human-readable name for business users
- alias: Alternative names or abbreviations
- type: Data type (string, integer, date, etc.)
- category: Business category, dimension or metric
- tag: Additional labels for filtering and organization
- description: Detailed explanation of column purpose and usage

OpenChatBI can automatically discover and load table structures from your data warehouse:
Configure automatic catalog loading in your config.yaml:
catalog_store:
  store_type: file_system
  data_path: ./catalog_data

data_warehouse_config:
  include_tables:
    - your_table_pattern
  # Leave empty to include all accessible tables
The file system catalog store organizes metadata across multiple files for maintainability and version control:
Core Table Information
table_info.yaml: Comprehensive table metadata organized hierarchically (database → table → information)
- type: Table classification (e.g., "fact" for fact tables, "dimension" for dimension tables)
- description: Business functionality and purpose
- selection_rule: Usage guidelines in markdown list format (each line starts with -)
- sql_rule: SQL generation rules in markdown header format (each rule starts with ####)
- derived_metric: Virtual metrics with calculation formulas, organized by groups:
#### Derived Ratio Metrics
Click-through Rate (alias CTR): SUM(clicks) / SUM(impression)
Conversion Rate (alias CVR): SUM(conversions) / SUM(clicks)
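Note that these derived metrics are ratios of sums, not averages of per-row ratios; in Python terms the CTR formula above amounts to:

```python
# Mirrors SUM(clicks) / SUM(impression): aggregate first, then divide.
# The rows are made-up illustrative values.
rows = [
    {"clicks": 10, "impression": 500},
    {"clicks": 30, "impression": 500},
]

ctr = sum(r["clicks"] for r in rows) / sum(r["impression"] for r in rows)
print(ctr)  # 0.04
```

Averaging the two per-row ratios (0.02 and 0.06) happens to give the same 0.04 here only because the impression counts are equal; with unequal denominators the two definitions diverge, which is why the formula is written over aggregates.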
Column Management
table_columns.csv: Basic column registry with schema db_name,table_name,column_name

table_spec_columns.csv: Table-specific column metadata with full schema:
db_name,table_name,column_name,display_name,alias,type,category,tag,description

common_columns.csv: Shared column definitions across tables with schema:
column_name,display_name,alias,type,category,tag,description

Query Examples and Training Data

table_selection_example.csv: Table selection training examples with schema question,selected_tables

sql_example.yaml: Query examples organized by database and table structure:
your_database:
  ad_performance: |
    Q: Show me CTR trends for the past 7 days
    A: SELECT date, SUM(clicks)/SUM(impressions) AS ctr
       FROM ad_performance
       WHERE date >= CURRENT_DATE - INTERVAL '7' DAY
       GROUP BY date
       ORDER BY date;
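The column CSVs described above are plain files with the schemas shown; as a sketch, a table_spec_columns.csv row can be written with the stdlib csv module (the row values are illustrative, not taken from the example dataset):

```python
import csv
import io

# Schema from the table_spec_columns.csv description above.
FIELDS = ["db_name", "table_name", "column_name", "display_name",
          "alias", "type", "category", "tag", "description"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerow({
    "db_name": "your_database",
    "table_name": "ad_performance",
    "column_name": "clicks",
    "display_name": "Clicks",
    "alias": "click_count",
    "type": "integer",
    "category": "metric",
    "tag": "",
    "description": "Number of ad clicks",
})
print(buf.getvalue())
```

DictWriter keeps the header row in sync with the schema list, which matters here because the catalog loader matches columns by name.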
OpenChatBI can integrate with a time series forecasting service for advanced predictive analytics. Follow these steps to set up the service:
See detailed instructions in timeseries_forecasting/README.md
Quick start:
cd timeseries_forecasting
./build_and_run.sh
In your bi.yaml, add constraints for the timeseries_forecast tool, e.g. if you are using timer-base-84m model:
extra_tool_use_rule: |
  - timeseries_forecast tool requires at least 96 time points in input data. If there is not enough input data, set input_len to 96 to pad with zeros.
In your config.yaml:
# Time Series Forecasting Service Configuration
timeseries_forecasting_service_url: "http://localhost:8765"
Important: Adjust the URL based on your deployment scenario:
- Local deployment: http://localhost:8765
- Remote deployment: http://your-service-host:8765

Test that the service is accessible:
curl http://localhost:8765/health
Expected response:
{
  "status": "healthy",
  "model_initialized": true,
  "uptime_seconds": 123.45
}
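A programmatic readiness check mirroring the curl call can validate that response shape with the stdlib json module (this sketch parses the documented sample body; a real check would fetch the /health URL with urllib.request instead):

```python
import json

# The response body documented above, inlined so the check runs offline.
body = '{"status": "healthy", "model_initialized": true, "uptime_seconds": 123.45}'

health = json.loads(body)
ready = health["status"] == "healthy" and health["model_initialized"]
print(ready)  # True
```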
OpenChatBI supports multiple execution environments for running Python code with different security and performance characteristics:
# Python Code Execution Configuration
python_executor: local # Options: "local", "restricted_local", "docker"
- local (default)
- restricted_local
- docker
For production deployments or when running untrusted code, the Docker executor provides complete isolation; enable it by setting python_executor: docker in your config.
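The mode switch can be pictured as a small registry keyed by the config value; the class names below are illustrative stand-ins, not OpenChatBI's actual executor classes (those live in openchatbi/code/):

```python
# Hypothetical stand-ins for the executors; names and interfaces are
# illustrative only, not OpenChatBI's real API.
class LocalExecutor: ...
class RestrictedLocalExecutor: ...
class DockerExecutor: ...

EXECUTORS = {
    "local": LocalExecutor,
    "restricted_local": RestrictedLocalExecutor,
    "docker": DockerExecutor,
}

def make_executor(config: dict):
    # Fall back to "local", mirroring the documented default.
    mode = config.get("python_executor", "local")
    return EXECUTORS[mode]()

print(type(make_executor({"python_executor": "docker"})).__name__)  # DockerExecutor
```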
The project uses modern Python tooling for code quality:
# Format code
uv run black .
# Lint code
uv run ruff check .
# Type checking
uv run mypy openchatbi/
# Security scanning
uv run bandit -r openchatbi/
Run the test suite:
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=openchatbi --cov-report=html
# Run specific test files
uv run pytest tests/test_generate_sql.py
uv run pytest tests/test_agent_graph.py
Install pre-commit hooks for automatic code quality checks:
uv run pre-commit install
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)

This project is licensed under the MIT License - see the LICENSE file for details.