dbt-autodoc

Automated documentation generator for dbt projects using Google Gemini AI

PyPI (pip) · Version: 1.0.21 · Maintainers: 1

DBT Autodoc Documentation

dbt-autodoc automates documentation and logging for your dbt projects. It combines Google Gemini AI with a database logging system so that your documentation stays up to date, accurate, and auditable.

🌟 Why dbt-autodoc?

  • 🤖 Automatic AI Documentation: Generate comprehensive descriptions for your tables and columns automatically.
  • 💾 Database Logging & History: Every description is stored in a database (duckdb or postgres). This acts as a "Source of Truth" and provides a full history of changes.
  • 🔄 Full Synchronization: Seamlessly integrates with dbt-osmosis to keep your YAML files in sync with your SQL models.
  • 🔒 Protect Manual Work: Respects human-written documentation. If you write it, we lock it.
  • 👥 Team Ready: Use Postgres to share documentation cache across your entire team.

🛠️ Setup

  • Install:

    pip install dbt-autodoc
    
  • Configuration: Run dbt-autodoc --help to generate dbt-autodoc.yml. Important: Edit company_context in this file to give the AI knowledge about your business logic.

  • Environment Variables:

    GEMINI_API_KEY=your_api_key_here
    # Optional: only needed when using Postgres
    POSTGRES_URL=postgresql://user:pass@host:port/db
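
Before running any AI command, it can help to confirm that the variables above are visible to the process. The following is a minimal sketch, not part of dbt-autodoc: `check_autodoc_env` is a hypothetical helper, and the DuckDB fallback when `POSTGRES_URL` is unset is an assumption (the README only says the variable is optional).

```python
import os


def check_autodoc_env(env=None):
    """Return (backend, problems) for a mapping of environment variables.

    Assumption: when POSTGRES_URL is unset the tool uses its local DuckDB
    store; the README only states that the variable is optional.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set (needed for the AI commands)")
    url = env.get("POSTGRES_URL", "")
    if url and not url.startswith(("postgresql://", "postgres://")):
        problems.append("POSTGRES_URL does not look like a postgresql:// URL")
    backend = "postgres" if url else "duckdb"
    return backend, problems


if __name__ == "__main__":
    backend, problems = check_autodoc_env()
    print(f"database backend: {backend}")
    for problem in problems:
        print(f"warning: {problem}")
```

The check only inspects the variables; it never contacts Gemini or the database.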
    

For the best results, follow this step-by-step workflow to ensure accuracy and control:

  1. Preparation: Update your dbt project, generate the manifest, and set your company context.

    dbt run && dbt docs generate
    # Edit dbt-autodoc.yml with company_context
    
  2. Sync Structure (No AI): Regenerate YAML files to match the SQL models. This ensures all new columns are present.

    dbt-autodoc --regenerate-yml
    
  3. Generate Model Descriptions (YAML): Generate AI descriptions for your models (tables/views).

    dbt-autodoc --generate-docs-model-ai --model-path models/staging
    
  4. Manual Review (Important): Open your YAML files. Review the structure and any existing descriptions. If you manually update a description here, it will be protected from AI overwrites in the next step.

  5. Generate Model Column Descriptions (YAML): Use AI to fill in the missing column descriptions.

    dbt-autodoc --generate-docs-model-columns-ai --model-path models/staging
    
  6. Propagate & Save: Run inheritance rules on the entire dbt project, then run the tool again to save the final state (including inherited descriptions) to the database.

    dbt-autodoc --regenerate-yml-with-inheritance
    dbt-autodoc --generate-docs-model-columns-ai --model-path models/staging
    
  7. Next Layer: Repeat steps 2-6 for models/intermediate, models/marts, etc.
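
The per-layer loop above (steps 2-6, repeated for each layer) could be scripted roughly as follows. This is a sketch under assumptions: `layer_commands` and `run_layer` are hypothetical helper names, the commands are copied from the steps above, and the script pauses at the manual review instead of skipping it. Run as written, it only prints the commands; call `run_layer` to execute them.

```python
import shlex
import subprocess


def layer_commands(model_path):
    """Return the commands for steps 2-6, split around the manual review."""
    before_review = [
        ["dbt-autodoc", "--regenerate-yml"],
        ["dbt-autodoc", "--generate-docs-model-ai", "--model-path", model_path],
    ]
    after_review = [
        ["dbt-autodoc", "--generate-docs-model-columns-ai", "--model-path", model_path],
        ["dbt-autodoc", "--regenerate-yml-with-inheritance"],
        # Second run saves the final state, including inherited descriptions.
        ["dbt-autodoc", "--generate-docs-model-columns-ai", "--model-path", model_path],
    ]
    return before_review, after_review


def run_layer(model_path, runner=subprocess.run):
    """Execute one layer, stopping for the manual review (step 4)."""
    before_review, after_review = layer_commands(model_path)
    for cmd in before_review:
        runner(cmd, check=True)
    input(f"Review the YAML files under {model_path}, then press Enter...")
    for cmd in after_review:
        runner(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands for each layer instead of executing them.
    for layer in ["models/staging", "models/intermediate", "models/marts"]:
        before_review, after_review = layer_commands(layer)
        for cmd in before_review + after_review:
            print(shlex.join(cmd))
```

Passing `check=True` makes the script stop at the first failing command, so a later layer never runs on top of a broken earlier one.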

🚀 Quick Start (Automated)

If you trust the process and just want to run everything at once:

dbt-autodoc --generate-docs-ai

🧠 How the AI Works

When generating a description for a column or table, the AI considers multiple inputs to produce the most accurate result:

  • Company Context: The high-level business logic defined in your config.
  • Model SQL: The actual code of the model being documented.
  • Existing Descriptions: Any existing documentation or comments in the file.
  • Upstream Logic: Context from upstream models, picked up implicitly through dbt-osmosis inheritance.

It synthesizes all these inputs to write a concise, technical description.
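
To make the combination of inputs concrete, here is a purely illustrative sketch of how the four inputs listed above might be assembled into a single prompt. It is not the prompt dbt-autodoc actually sends: the function name, the section headings, and the layout are all assumptions.

```python
def build_description_prompt(target, company_context, model_sql,
                             existing_description="", upstream_descriptions=None):
    """Combine the four inputs into one prompt string (hypothetical layout)."""
    upstream_descriptions = upstream_descriptions or {}
    sections = [
        f"Write a concise, technical description for: {target}",
        "## Company context\n" + company_context.strip(),
        "## Model SQL\n" + model_sql.strip(),
    ]
    # Optional inputs are included only when they carry information.
    if existing_description.strip():
        sections.append("## Existing description\n" + existing_description.strip())
    if upstream_descriptions:
        lines = [f"- {name}: {text}"
                 for name, text in sorted(upstream_descriptions.items())]
        sections.append("## Inherited from upstream\n" + "\n".join(lines))
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_description_prompt(
        target="column stg_orders.order_id",
        company_context="We sell subscriptions; an order is one billing event.",
        model_sql="select id as order_id from raw.orders",
        upstream_descriptions={"raw.orders.id": "Primary key of the order."},
    ))
```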

📖 Arguments Reference

| Argument | Description |
| --- | --- |
| --regenerate-yml | Structure Only. Regenerate YAML files from dbt models. Does not sync to DB or call AI. |
| --regenerate-yml-with-inheritance | Structure + Inheritance. Regenerate YAML files with inheritance enabled. Use this to propagate descriptions from upstream models. |
| --model-path | Restrict processing to a specific directory (e.g. models/staging). |
| --generate-docs-model-ai | Generate model descriptions in .yml files using AI. |
| --generate-docs-model-columns-ai | Generate column descriptions in .yml files using AI. |
| --generate-docs-model | Sync model descriptions in .yml files from cache (no AI). |
| --generate-docs-model-columns | Sync column descriptions in .yml files from cache (no AI). |
| --generate-docs-ai | 🔥 Full Auto. Runs the complete workflow: Model generation, YAML sync, and Column generation using AI. |
| --generate-docs | 🔄 Full Sync. Runs the complete workflow using only the database cache (no AI). |
| --cleanup-db | Reset Database. Wipes the description cache and history. |
| --concurrency | Max threads for AI/DB requests (default: 10). |
| --sort-yml | Sort keys in YAML files (name, description, columns for models; name, description for columns). |
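
The key order described for --sort-yml can be illustrated on plain dictionaries. This is a sketch of the ordering only, not the tool's implementation: the helper names are hypothetical, and placing any other keys after the listed ones in their original order is an assumption.

```python
MODEL_KEY_ORDER = ["name", "description", "columns"]
COLUMN_KEY_ORDER = ["name", "description"]


def order_keys(mapping, preferred):
    """Return a new dict with preferred keys first, the rest in original order."""
    ordered = {key: mapping[key] for key in preferred if key in mapping}
    ordered.update((k, v) for k, v in mapping.items() if k not in ordered)
    return ordered


def sort_model(model):
    """Apply the model key order, then the column key order to each column."""
    sorted_model = order_keys(model, MODEL_KEY_ORDER)
    if "columns" in sorted_model:
        sorted_model["columns"] = [
            order_keys(column, COLUMN_KEY_ORDER)
            for column in sorted_model["columns"]
        ]
    return sorted_model


if __name__ == "__main__":
    model = {
        "columns": [{"description": "Primary key.", "name": "order_id"}],
        "description": "One row per order.",
        "name": "stg_orders",
    }
    print(sort_model(model))
```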

📄 License

MIT License - see LICENSE for details.

🙏 Attribution

Brought to you by JustDataPlease.
