
intugle
Transform Fragmented Data into Connected Semantic Data Model
Intugle’s GenAI-powered open-source Python library builds a semantic data model over your existing data systems. At its core, it discovers meaningful links and relationships across data assets — enriching them with profiles, classifications, and business glossaries. With this connected knowledge layer, you can enable semantic search and auto-generate queries to create unified data products, making data integration and exploration faster, more accurate, and far less manual.
| Category | Integrations |
|---|---|
| Data Warehouses | Snowflake, Databricks |
| Databases | SQLite, PostgreSQL, SQL Server, MySQL |
| Local | Pandas, DuckDB (CSV, Parquet, Excel) |
The intugle library includes a Streamlit application that provides an interactive web interface for building and visualizing semantic data models.
https://github.com/user-attachments/assets/402c3f3d-baf3-4ece-ba55-4e06437defc5
To use the Streamlit app, install intugle with the streamlit extra:
pip install intugle[streamlit]
You can launch the Streamlit application using the intugle-streamlit command or uvx:
intugle-streamlit
# Or using uvx
uvx --from intugle[streamlit] intugle-streamlit
Open the URL provided in your terminal (usually http://localhost:8501) to access the application. For more details, refer to the Streamlit App documentation.
To run the app in a cloud environment like Google Colab, please refer to our Streamlit quickstart notebook.
For Windows and Linux, you can follow these steps. For macOS, please see the additional steps in the macOS section below.
Before installing, it is recommended to create a virtual environment:
python -m venv .venv
source .venv/bin/activate
Then, install the package:
pip install intugle
For macOS users, you may need to install the libomp library:
brew install libomp
If you installed Python using the official installer from python.org, you may also need to install SSL certificates by running the following command in your terminal. Please replace 3.XX with your specific Python version. This step is not necessary if you installed Python using Homebrew.
/Applications/Python\ 3.XX/Install\ Certificates.command
Before running the project, you need to configure an LLM. It is used for tasks like generating business glossaries and predicting links between tables.
You can configure the LLM by setting the following environment variables:
- LLM_PROVIDER: The LLM provider and model to use (e.g., openai:gpt-3.5-turbo), following LangChain's conventions
- API_KEY: Your API key for the LLM provider. The exact name of the variable varies from provider to provider.

Here's an example of how to set these variables in your environment:
export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
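The LLM_PROVIDER value follows LangChain's provider:model convention, which splits on the first colon. A minimal sketch of reading and splitting such a value (illustrative only, not intugle's internal parsing):

```python
import os

# Set as in the export commands above (values are examples).
os.environ["LLM_PROVIDER"] = "openai:gpt-3.5-turbo"

# LangChain-style "provider:model" strings split on the first colon.
provider, _, model = os.environ["LLM_PROVIDER"].partition(":")
print(provider, model)  # openai gpt-3.5-turbo
```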
For a detailed, hands-on introduction to the project, please see our quickstart notebooks:
| Domain | Notebook | Open in Colab |
|---|---|---|
| Healthcare | quickstart_healthcare.ipynb | |
| Tech Manufacturing | quickstart_tech_manufacturing.ipynb | |
| FMCG | quickstart_fmcg.ipynb | |
| Sports Media | quickstart_sports_media.ipynb | |
| Databricks Unity Catalog [Health Care] | quickstart_healthcare_databricks.ipynb | Databricks Notebook Only |
| Snowflake Horizon Catalog [FMCG] | quickstart_fmcg_snowflake.ipynb | Snowflake Notebook Only |
| Native Snowflake with Cortex Analyst [Tech Manufacturing] | quickstart_native_snowflake.ipynb | |
| Native Databricks with AI/BI Genie [Tech Manufacturing] | quickstart_native_databricks.ipynb | |
| Streamlit App | quickstart_streamlit.ipynb | |
| Conceptual Search | quickstart_conceptual_search.ipynb | |
| Composite Relationships Prediction | quickstart_basketball_composite_links.ipynb | |
These notebooks will take you through the core workflow step by step.
For more detailed information, advanced usage, and tutorials, please refer to our full documentation site.
The core workflow of the project involves using the SemanticModel to build a semantic layer, and then using the DataProduct to generate data products from that layer.
from intugle import SemanticModel
# Define your datasets
datasets = {
"allergies": {"path": "path/to/allergies.csv", "type": "csv"},
"patients": {"path": "path/to/patients.csv", "type": "csv"},
"claims": {"path": "path/to/claims.csv", "type": "csv"},
# ... add other datasets
}
# Build the semantic model
sm = SemanticModel(datasets, domain="Healthcare")
sm.build()
# Access the profiling results
print(sm.profiling_df.head())
# Access the discovered links
print(sm.links_df)
For detailed code examples and a complete walkthrough, please see our quickstart notebooks.
Once the semantic model is built, you can use the DataProduct class to generate unified data products from the semantic layer.
from intugle import DataProduct
# Define an ETL model
etl = {
"name": "top_patients_by_claim_count",
"fields": [
{
"id": "patients.first",
"name": "first_name",
},
{
"id": "patients.last",
"name": "last_name",
},
{
"id": "claims.id",
"name": "number_of_claims",
"category": "measure",
"measure_func": "count"
}
],
"filter": {
"sort_by": [
{
"id": "claims.id",
"alias": "number_of_claims",
"direction": "desc"
}
],
"limit": 10
}
}
# Create a DataProduct and build it
dp = DataProduct()
data_product = dp.build(etl)
# View the data product as a DataFrame
print(data_product.to_df())
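To make the ETL model concrete, here is a plain-Python sketch of the aggregation it describes: count claims per patient (measure_func: "count"), sort descending (direction: "desc"), and keep at most 10 rows (limit: 10). The toy data is hypothetical, and this is the equivalent logic, not intugle's generated query:

```python
from collections import Counter

# Toy stand-ins for the patients and claims tables (hypothetical data).
patients = {
    1: {"first": "Ada", "last": "Lovelace"},
    2: {"first": "Grace", "last": "Hopper"},
    3: {"first": "Alan", "last": "Turing"},
}
claims = [
    {"id": 10, "patient_id": 1},
    {"id": 11, "patient_id": 1},
    {"id": 12, "patient_id": 1},
    {"id": 13, "patient_id": 2},
    {"id": 14, "patient_id": 2},
    {"id": 15, "patient_id": 3},
]

# Count claims per patient, sort descending, keep the top 10.
counts = Counter(c["patient_id"] for c in claims)
rows = [
    {
        "first_name": patients[pid]["first"],
        "last_name": patients[pid]["last"],
        "number_of_claims": n,
    }
    for pid, n in counts.most_common(10)
]
print(rows[0])  # {'first_name': 'Ada', 'last_name': 'Lovelace', 'number_of_claims': 3}
```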
The semantic search feature allows you to search for columns in your datasets using natural language. It is built on top of the Qdrant vector database.
For full setup instructions (including Docker commands and environment variables), please refer to the Semantic Search Documentation.
Once you have built the semantic model, you can use the search method to perform a semantic search. The search function returns a pandas DataFrame containing the search results, including the column's profiling metrics, category, table name, and table glossary.
from intugle import SemanticModel
# Define your datasets
datasets = {
"allergies": {"path": "path/to/allergies.csv", "type": "csv"},
"patients": {"path": "path/to/patients.csv", "type": "csv"},
"claims": {"path": "path/to/claims.csv", "type": "csv"},
# ... add other datasets
}
# Build the semantic model
sm = SemanticModel(datasets, domain="Healthcare")
sm.build()
# Perform a semantic search
search_results = sm.search("reason for hospital visit")
# View the search results
print(search_results)
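Under the hood, semantic search ranks column embeddings by their similarity to the query embedding stored in Qdrant. A toy sketch of that ranking idea using cosine similarity (the vectors here are made up; intugle's actual embedding model and Qdrant integration are more involved):

```python
import math

# Toy embeddings standing in for column metadata (purely illustrative).
column_vectors = {
    "allergies.reaction": [0.9, 0.1, 0.2],
    "claims.diagnosis": [0.6, 0.5, 0.1],
    "patients.birthdate": [0.1, 0.9, 0.4],
}
query_vector = [0.85, 0.2, 0.15]  # pretend embedding of "reason for hospital visit"

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Rank columns by similarity to the query, most relevant first.
ranked = sorted(
    column_vectors,
    key=lambda col: cosine(query_vector, column_vectors[col]),
    reverse=True,
)
print(ranked)
```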
For detailed code examples and a complete walkthrough, please see our quickstart notebooks.
Intugle includes a built-in MCP (Model Context Protocol) server that exposes your semantic layer to AI assistants and LLM-powered clients. Its main purpose is to allow agents to understand your data's structure by using tools like get_tables and get_schema.
Once your semantic model is built, you can start the server with a simple command:
intugle-mcp
This lets AI agents interact with your data context programmatically, and it also enables "vibe coding" against the library.
For detailed instructions on setting up the server and connecting your favorite client, please see our full documentation.
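Many MCP clients are configured with a JSON file that maps a server name to its launch command. A hypothetical entry for intugle might look like the following; the exact file location and schema depend on your client, so check its documentation:

```json
{
  "mcpServers": {
    "intugle": {
      "command": "intugle-mcp"
    }
  }
}
```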
Join our community to ask questions, share your projects, and connect with other users.
Contributions are welcome! Please see the CONTRIBUTING.md file for guidelines.
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
Third-party software notices are available in the NOTICE file.