This package helps you generate base models and transform them in bulk. For sources with 10+ models, it saves a lot of time by generating the base models in bulk and applying the same transforms to common fields. It is a good way to start your modeling or to onboard new sources.
To use this package, you need dbt installed with a profile configured. You also need to install the codegen package from dbt Hub. Add the following to the packages.yml file in your dbt repo and run dbt deps to install dependencies.
packages:
  - package: dbt-labs/codegen
    version: 0.4.0
Install the package in the same environment as your dbt installation by running:
pip install dbt-generator
This package should be executed inside your dbt repo.
To generate base models, use the dbt-generator generate command. This is a wrapper around the codegen command that will generate the base models. This is especially useful when you have a lot of models and you want to generate them all at once.
Usage: dbt-generator generate [OPTIONS]

  Generate base models based on a .yml source

Options:
  -s, --source-yml PATH    Source .yml file to be used
  -o, --output-path PATH   Path to write generated models
  -m, --model STRING       Model name
  -c, --custom_prefix      Enter a custom string prefix for the model filename
  --model-prefix BOOLEAN   Optional prefix of source_name + "_" added to the
                           resulting modelname.sql to avoid model name
                           collisions across sources
  --source-index INTEGER   Index of the source to generate base models for
  --help                   Show this message and exit.
dbt-generator generate -s ./models/source.yml -o ./models/staging/source_name/
This will read in the source.yml file and generate the base models in the staging/source_name folder. If you have multiple sources defined in your yml file, use the --source-index flag to specify which source you want to generate base models for.
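To make the --source-index flag concrete, here is a minimal Python sketch of picking one source by position. It is not the package's implementation: the select_source helper and the sample source names are hypothetical, the dict mirrors the standard dbt sources layout after YAML parsing, and zero-based indexing is an assumption.

```python
# Hypothetical illustration; not taken from dbt-generator's code base.
# The dict mirrors what a dbt source .yml looks like once parsed.
parsed_yml = {
    "version": 2,
    "sources": [
        {"name": "facebook_ads", "tables": [{"name": "ads"}, {"name": "campaigns"}]},
        {"name": "stripe", "tables": [{"name": "charges"}]},
    ],
}


def select_source(parsed, source_index=0):
    """Return (source_name, table_names) for the source at source_index.

    Assumes zero-based indexing, which the README does not confirm.
    """
    sources = parsed.get("sources", [])
    if not 0 <= source_index < len(sources):
        raise IndexError(
            f"source index {source_index} out of range ({len(sources)} sources found)"
        )
    source = sources[source_index]
    return source["name"], [table["name"] for table in source.get("tables", [])]


name, tables = select_source(parsed_yml, source_index=1)
print(name, tables)
```

Each table name returned for the chosen source would correspond to one base model to generate.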
Within the same source, tables often share consistent naming conventions. For example, the created_at and modified_at fields are often named the same in every table. Renaming these fields to common names across different sources is a best practice. However, doing that for all the date columns in 10+ tables is a pain.
With this package, you can write a transforms.yml file that will be read in (the .yml file can be named anything). This file contains the transforms that you want to apply to all the base models. You can simply rename fields in the base models, or apply a custom SQL select to the transformed fields.
Usage: dbt-generator transform [OPTIONS]

  Transform base models in a directory using a transforms.yml file

Options:
  -m, --model-path PATH        The path to models
  -t, --transforms-path PATH   Path to a .yml file containing transformations
  -o, --output-path PATH       Path to write transformed models to
  --drop-metadata BOOLEAN      (default=False) optionally drop source columns
                               prefixed with "_" if that designates metadata
                               columns not needed in target
  --case-sensitive BOOLEAN     (default=False) treat column names as
                               case-sensitive - otherwise force all to lower
  --help                       Show this message and exit.
Supported data warehouse:
Usage: dbt-generator bq-transform/sf-transform [OPTIONS]

  Transform base models in a directory for BigQuery source

Options:
  -m, --model-path PATH         The path to models
  -o, --output-path PATH        Path to write transformed models to
  --drop-metadata BOOLEAN       (default=False) optionally drop source columns
                                prefixed with "_" if that designates metadata
                                columns not needed in target
  --case-sensitive BOOLEAN      (default=False) treat column names as
                                case-sensitive - otherwise force all to lower
  --split-columns BOOLEAN       Split column names. E.g. currencycode =>
                                currency_code
  --id-as-int BOOLEAN           Convert id to int
  --convert-timestamp BOOLEAN   Convert timestamp to datetime
  --help                        Show this message and exit.
ID:
  name: ID
  sql: CAST(ID as INT64)
CREATED_TIME:
  name: CREATED_AT
UPDATED_TIME:
  name: MODIFIED_AT
DATE_START:
  name: START_AT
DATE_STOP:
  name: STOP_AT
When applied to all models in the staging/source_name folder, this .yml file casts every ID field to INT64 and renames each date column to the value of its name key. For example, CREATED_TIME is renamed to CREATED_AT and DATE_START is renamed to START_AT. If no sql key is provided, the package just renames the field. If a sql key is provided, the package applies that SQL expression and names the result using the name key.
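The rename versus SQL-then-rename rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not dbt-generator's implementation: the transform_column helper is hypothetical, the transforms dict stands in for the parsed .yml file, and columns without a matching entry are assumed to pass through unchanged.

```python
# Hypothetical sketch of the rule described above; not the package's code.
# The dict stands in for a parsed transforms .yml file.
transforms = {
    "ID": {"name": "ID", "sql": "CAST(ID as INT64)"},
    "CREATED_TIME": {"name": "CREATED_AT"},
    "DATE_START": {"name": "START_AT"},
}


def transform_column(column, transforms):
    """Return the SELECT expression for a single column."""
    rule = transforms.get(column)
    if rule is None:
        # Assumption: columns without a rule are kept unchanged.
        return column
    # Use the custom SQL when given, otherwise the bare column name.
    expression = rule.get("sql", column)
    return f"{expression} AS {rule['name']}"


columns = ["ID", "CREATED_TIME", "DATE_START", "SPEND"]
select_list = [transform_column(column, transforms) for column in columns]
print(",\n".join(select_list))
```

Keeping the rule per column means the same transforms file can be reused across every model in a folder, which is the point of the bulk workflow.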
dbt-generator transform -m ./models/staging/source_name/ -t ./transforms.yml
This will transform all models in the staging/source_name folder using the transforms.yml file. You can also drop metadata columns (those whose names start with _) by setting the --drop-metadata flag to true. The --case-sensitive flag determines whether the transforms treat column names as case-sensitive; by default, all names are forced to lowercase.
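As a rough picture of what the two flags do to a list of column names, consider the following Python sketch. The prepare_columns helper and the sample column names are hypothetical, and the order in which the package applies the two steps is an assumption.

```python
# Hypothetical sketch of the --drop-metadata and --case-sensitive behavior
# as described in the help text; not dbt-generator's actual code.
def prepare_columns(columns, drop_metadata=False, case_sensitive=False):
    """Apply the two flags to a list of column names."""
    if drop_metadata:
        # Metadata columns are those prefixed with "_".
        columns = [c for c in columns if not c.startswith("_")]
    if not case_sensitive:
        # Default behavior: force all names to lowercase.
        columns = [c.lower() for c in columns]
    return columns


raw = ["_fivetran_synced", "ID", "Created_Time"]
print(prepare_columns(raw, drop_metadata=True))
print(prepare_columns(raw, case_sensitive=True))
```

With both flags left at their defaults, every column is kept and only lowercased.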
Here are some of the limitations of the current release. If you want to contribute, please open an issue or a pull request.
- transformation logic assumes base model contains just a list of column names with no aliases or SQL logic added
*_id)