DLT-META
Documentation | Release Notes | Examples
Project Overview
DLT-META is a metadata-driven framework designed to work with Delta Live Tables. This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.
In practice, a single generic DLT pipeline reads the Dataflowspec and uses it to orchestrate and run the necessary data processing workloads. This approach streamlines the development and management of data pipelines, allowing for a more efficient and scalable data processing workflow.
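To make the idea concrete, here is a minimal sketch of how an onboarding file could be expanded into one specification per layer. The field names (`data_flow_id`, `source_format`, `bronze_table`, `silver_table`), the `DataflowSpec` class, and the `load_dataflow_specs` helper are illustrative assumptions for this example, not the actual DLT-META schema or API. Refer to the documentation for the real onboarding format.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical onboarding content; a real onboarding file carries more fields.
ONBOARDING_JSON = """
[
  {"data_flow_id": "100", "source_format": "cloudFiles",
   "bronze_table": "customers_raw", "silver_table": "customers"},
  {"data_flow_id": "101", "source_format": "kafka",
   "bronze_table": "orders_raw", "silver_table": "orders"}
]
"""


@dataclass(frozen=True)
class DataflowSpec:
    """One source-to-target step for a single layer (illustrative only)."""
    data_flow_id: str
    layer: str         # "bronze" or "silver"
    source: str        # source format for bronze, upstream table for silver
    target_table: str


def load_dataflow_specs(raw: str) -> List[DataflowSpec]:
    """Expand each onboarding entry into a bronze spec and a silver spec."""
    specs = []
    for entry in json.loads(raw):
        flow_id = entry["data_flow_id"]
        # Bronze reads from the external source described in the metadata.
        specs.append(DataflowSpec(flow_id, "bronze",
                                  entry["source_format"], entry["bronze_table"]))
        # Silver reads from the bronze table produced above.
        specs.append(DataflowSpec(flow_id, "silver",
                                  entry["bronze_table"], entry["silver_table"]))
    return specs


specs = load_dataflow_specs(ONBOARDING_JSON)
for spec in specs:
    print(spec.layer, spec.source, "->", spec.target_table)
```

Keeping the specification as plain data is what lets a single generic pipeline serve many tables: adding a new flow means adding an entry to the file rather than writing new pipeline code.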
Components:
Metadata Interface
Generic DLT pipeline
- Apply appropriate readers based on input metadata
- Apply data quality rules with DLT expectations
- Apply CDC (apply changes) if specified in metadata
- Build the DLT graph based on input/output metadata
- Launch the DLT pipeline
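The responsibilities listed for the generic pipeline can be sketched in plain Python. This is a toy model under stated assumptions, not DLT-META's implementation: the `READERS` registry, the `inline` source format, the spec keys, and `run_pipeline` are all hypothetical, and Python callables stand in for DLT expectations, which in a real pipeline are SQL expressions evaluated by Delta Live Tables.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def read_inline(details: dict) -> List[Row]:
    """Stand-in reader that returns rows embedded in the metadata."""
    return list(details["rows"])


# Reader registry keyed by source format (hypothetical formats only).
READERS: Dict[str, Callable[[dict], List[Row]]] = {"inline": read_inline}


def apply_expectations(rows: List[Row],
                       expectations: Dict[str, Callable[[Row], bool]]) -> List[Row]:
    """Keep only rows that pass every rule, similar to an expect-or-drop policy."""
    return [row for row in rows if all(rule(row) for rule in expectations.values())]


def run_pipeline(specs: List[dict]) -> Tuple[Dict[str, List[Row]], List[Tuple[str, str]]]:
    """Pick a reader per spec, apply quality rules, and record graph edges."""
    tables: Dict[str, List[Row]] = {}
    graph: List[Tuple[str, str]] = []
    for spec in specs:
        reader = READERS[spec["source_format"]]          # reader chosen from metadata
        rows = reader(spec["source_details"])
        tables[spec["target"]] = apply_expectations(rows, spec.get("expectations", {}))
        graph.append((spec["source_format"], spec["target"]))  # source -> target edge
    return tables, graph


specs = [{
    "source_format": "inline",
    "source_details": {"rows": [{"id": 1, "amount": 10}, {"id": None, "amount": 5}]},
    "target": "bronze_orders",
    "expectations": {"valid_id": lambda row: row["id"] is not None},
}]

tables, graph = run_pipeline(specs)
print(tables["bronze_orders"])
print(graph)
```

The point of the sketch is the shape of the loop: nothing in `run_pipeline` is specific to one table, so behavior changes only when the metadata changes.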
High-Level Process Flow:
Steps
Getting Started
Refer to the Getting Started guide.
The Databricks Labs DLT-META CLI lets you run the onboard and deploy commands from an interactive Python terminal.
Prerequisites:
- Python 3.8.0 +
- Databricks CLI v0.213 or later. See the Databricks CLI installation instructions for macOS and Windows.
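Before installing, it can help to confirm that both minimum versions are met. The sketch below is one way to do that check. It assumes the CLI reports its version as text containing a dotted number such as `v0.218.0`; the helper names `parse_version` and `meets_minimum` and the sample version string are made up for this example.

```python
import re
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 8, 0)
MIN_CLI = (0, 213, 0)


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number found in the text."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    # A missing patch component is treated as 0, so "v0.213" becomes (0, 213, 0).
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(text: str, minimum: Tuple[int, ...]) -> bool:
    """Return True when the version found in the text is at least the minimum."""
    found = parse_version(text)
    return found is not None and found >= minimum


python_ok = sys.version_info[:3] >= MIN_PYTHON
print("Python OK:", python_ok)

# Sample string for illustration; in practice, pass in the text printed by
# the CLI's version command.
print("CLI OK:", meets_minimum("Databricks CLI v0.218.0", MIN_CLI))
```

Comparing versions as integer tuples avoids the string-comparison trap where "0.99" would sort after "0.213".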
Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace:
databricks auth login --host WORKSPACE_HOST
To enable debug logs, add the `--debug` flag to any command.
Installing dlt-meta:
- Install dlt-meta via Databricks CLI:
databricks labs install dlt-meta
Onboard using dlt-meta CLI:
If you want to run the existing demo files, follow these steps before running the onboard command:
git clone https://github.com/databrickslabs/dlt-meta.git
cd dlt-meta
python -m venv .venv
source .venv/bin/activate
pip install databricks-sdk
dlt_meta_home=$(pwd)
export PYTHONPATH=$dlt_meta_home
databricks labs dlt-meta onboard
The onboard command will prompt you to provide onboarding details. If you have cloned the dlt-meta Git repo, accept the defaults, which will launch the config from the demo folder.
- Go to your Databricks workspace and locate the onboarding job under: Workflows -> Job runs
Deploy using dlt-meta CLI:
- Silver DLT
- The deploy command will prompt you to provide DLT details. Provide the details for the schema you specified in the steps above.
More questions
Refer to the FAQ and the DLT-META documentation.
Project Support
Please note that all projects released under Databricks Labs are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.
Any issues discovered through the use of this project should be filed as issues on the GitHub repo. They will be reviewed as time permits, but there are no formal SLAs for support.