rae-cli

CLI tool for scaffolding analytics engineering data stacks.

Version: 0.2.10 (PyPI)
RAE - Rapid Analytics Engineering

Perfectly imperfect, for she was a Wildflower

RAE is the first opinionated framework purpose-built for the Analytics Engineering community, inspired by backend web frameworks such as Django, Flask and NestJS, but with a Data Engineering twist!

RAE:

  • Empowers teams, solo devs, students and individual engineers to rapidly scaffold a modern analytics engineering stack with nothing more than a few responses to CLI prompts. From zero to fully containerized infrastructure in minutes.
    • Users can also opt to scaffold just one or two tools rather than an entire stack.
  • Abstracts away the infrastructure, container and server knowledge required to set up most of these tools.

All so you can focus on what matters most: modeling, orchestrating, and delivering data.

What RAE Does

Scaffold Tool Docker Configurations

Spin up a project with plug-and-play support for essential data tools:

  • Data Storage:
    • PostgreSQL
    • MySQL
  • Data Modeling:
    • dbt
    • SQL Mesh
  • Orchestration:
    • Airflow
    • Dagster

Auto-Generate settings.py

A clean and extensible settings file inspired by Django — making it easy to pass environment-specific values (ports, credentials, container names, etc.) to every component of your stack.
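The generated file's contents depend on the tools you select at the prompts. Purely as an illustration of the Django-style pattern (the field names and defaults below are hypothetical, not RAE's actual output), such a settings file might look like:

```python
# settings.py -- illustrative sketch only; check the file RAE
# generates for the real keys and structure.

data_storage = {
    "tool": "postgresql",
    "host": "postgres",       # container name on the shared Docker network
    "port": 5432,
    "user": "analytics",
    "password": "change-me",  # replace before any real deployment
    "database": "warehouse",
}

data_modeling = {
    "tool": "dbt",
    "project_dir": "src/dbt",
}

data_orchestration = {
    "tool": "airflow",
    "webserver_port": 8080,
}
```

Because settings live in plain Python, every component of the stack can import the same module rather than duplicating credentials across configs.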

Auto-Generate docker-compose.yml

  • Connect all services via a shared Docker network.
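As a rough sketch of the shape such a file takes (service names, image tags, and the network name here are hypothetical, not RAE's actual output), a two-service compose file sharing one network might be:

```yaml
# Illustrative only -- not RAE's generated output.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me
    networks:
      - rae-network

  dbt:
    build: ./dbt
    depends_on:
      - postgres
    networks:
      - rae-network

networks:
  rae-network:
    driver: bridge
```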

Frameworks aren't just for web and mobile engineers anymore. RAE gives Analytics Engineers the tools to build, connect and orchestrate their data stack with ease.

Build like a developer. Deploy like an engineer. Let RAE compose your analytics stack.

Who Is RAE For?

  • Analytics Engineers who want to quickly scaffold their required infrastructure.
  • Data Engineers who need to tie various tools together.
  • Data Scientists who have a need for a data tool stack.
  • Individual developers and anyone learning to use analytics/data engineering tools.
  • Teams that want standardization and clarity across their data stack.

How to use RAE

System Dependencies

| Tool | Required Version | Notes |
| --- | --- | --- |
| Python | 3.8+ | Required for the RAE CLI tool |
| Docker Desktop | Latest | Docker Desktop (macOS/Windows) or Docker Engine (Linux) |
| Shell | bash / zsh / PowerShell | Used to run CLI and Docker commands |
| Web Browser | Any modern browser | Google Chrome recommended for container-based UIs (e.g. Airflow) |

CLI Setup Steps

1. Create a Virtual Environment

| OS | Command |
| --- | --- |
| macOS/Linux | `python3 -m venv local-env` |
| Windows | `py -m venv local-env` |

2. Activate the Virtual Environment

| OS | Command |
| --- | --- |
| macOS/Linux (bash/zsh) | `source local-env/bin/activate` |
| Windows (CMD) | `local-env\Scripts\activate.bat` |
| Windows (PowerShell) | `local-env\Scripts\Activate.ps1` |

3. Install RAE CLI:

```
pip install rae-cli
```

4. Initialize your project:

```
rae init
```

This will take you through a series of prompts and then generate the project_config.json and settings.py files.
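The schema of project_config.json is not documented here; purely as a hypothetical illustration, a file of this kind might record the selections made at the prompts, for example:

```json
{
  "project_name": "my_project",
  "data_storage": "postgresql",
  "data_modeling": "dbt",
  "data_orchestration": "airflow"
}
```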

After this command completes you will be left with the following project structure:

```
├── rae
│   └── src
│       ├── airflow
│       │   └── airflow-init.sh
│       ├── dbt
│       │   ├── analyses
│       │   ├── macros
│       │   ├── models
│       │   ├── seeds
│       │   ├── snapshots
│       │   ├── tests
│       │   ├── dbt-init.sh
│       │   ├── dbt.sh
│       │   ├── dbt_project.yml
│       │   └── Dockerfile
│       ├── docker-compose.yml
│       ├── postgres
│       │   └── postgres-init.sh
│       └── settings
│           ├── project_config.json
│           └── settings.py
```

The above is just an example; it assumes you selected Postgres for data storage, dbt for data modeling, and Airflow for orchestration, with Postgres as the metastore.

5. Open your settings file - {project_name}/src/settings/settings.py

- Populate this file with your specific credentials for:
    - `data_storage` (PostgreSQL or MySQL)
    - `data_modeling` (dbt or SQL Mesh)
    - `data_orchestration` (Airflow or Dagster)
- If you skip this, the project will still be usable, but its containers will be built with default values and will NOT be production-ready or secure.

You are responsible for ensuring your project is secure, set up properly, and ready for deployment!

6. Generate your docker compose file:

cd into your project directory and generate the compose file:

```
cd {project_name}
rae generate-compose-file
```

Or generate it without changing directories:

```
rae generate-compose-file --project-name {project_name}
```

7. Run your project's Docker containers:

Docker must be installed AND running on your host machine, or this command will fail. Make sure Docker Desktop is installed and actively running before continuing!

```
cd {project_name}/src
```

Then simply start the containers:

```
docker-compose up -d
```

This will run the Docker containers for each service and link them via a Docker network, allowing the containers to communicate with one another while still ensuring each tool operates in isolation.
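On a shared network, each container can reach the others by container name. A simple way to confirm a service is up before your modeling or orchestration code talks to it is to poll its port; a minimal sketch (the hostname and port are examples, use whatever names your compose file assigns):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP port accepts connections, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection resolves the hostname (e.g. a container
            # name on the Docker network) and attempts a TCP connect.
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)
    return False

# Example: wait for a Postgres container named "postgres"
# wait_for_port("postgres", 5432)
```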

Current State of Project

Future Implementations

1. Add secondary test coverage to the project:
  - src/cli.py
  - src/data_modeling/dbt_modeling
  - src/data_modeling/sql_mesh_modeling
  - src/data_orchestration/airflow_orchestration.py
  - src/data_orchestration/dagster_orchestration.py
  - src/data_storage/mysql_storage.py
  - src/data_storage/postgresql_storage.py
  - src/generators/docker_compose_generator.py
  
2. Continue iterating on test coverage
  - src/managers/data_modeling_manager.py
  - src/managers/data_orchestration_manager.py
  - src/managers/data_storage_manager.py
  - src/managers/settings_manager.py
  - src/utility/base_manager.py
  - src/utility/base_tool.py
  - src/utility/dockerfile_writer.py
  - src/utility/indented_dumper.py
  - src/utility/shell_script_writer.py
  - src/utility/supported_tools.py
  - src/main.py

3. Add support for additional data storage tools:
  - Snowflake
  - DuckDB?
  - SQL Server?
  - Databricks
    - AWS S3
    - Google Cloud Storage
    - Azure Blob Storage

4. Add support to allow users to scaffold single applications or custom combinations of tool stacks
  - Scenarios:
    - user only needs a data modeling tool
    - user only needs a data modeling tool and a data storage tool
    - user only needs an orchestration tool
    - etc
  - Intent:
    - To allow greater flexibility and support a wider range of use cases for the CLI
