@dataform/core
@dataform/core is a powerful tool for managing data workflows and transformations. It allows you to define data models, schedule data transformations, and manage dependencies between different data operations. It is particularly useful for teams working with large-scale data warehouses and ETL processes.
Defining Data Models
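Data models are defined as SQL queries. In a JavaScript file under the project's `definitions/` directory, the `publish` function declares a new table from a name and a configuration object that sets its type and dependencies; the chained `query` call supplies the SQL that defines its content. Dataform provides `publish` as a global when it compiles the project, so the file does not need to `require` the package.
// definitions/my_table.js
// `publish` is a global supplied by Dataform at compile time.
publish('my_table', {
  type: 'table',
  // Dependencies are listed by action name.
  dependencies: ['source_table'],
}).query(ctx => `
  SELECT *
  FROM ${ctx.ref('source_table')}
`);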
Scheduling Data Transformations
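Scheduled runs are described with cron syntax. The example below defines a task named `daily_update` whose cron expression `0 0 * * *` fires once a day at midnight and runs a single SQL operation. The `schedule` function is reproduced as this page presents it; check that your installed version of @dataform/core exports it before relying on it.
const dataform = require('@dataform/core');
// Assumption: `schedule` is exported as shown on this page.
const { schedule } = dataform;

schedule('daily_update', {
  // minute hour day-of-month month day-of-week
  cron: '0 0 * * *',
  actions: [
    { name: 'update_table', type: 'operation', query: 'CALL update_table_procedure();' }
  ]
});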
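To make the cron string in the example easier to read, here is a minimal sketch that labels the five fields of a standard cron expression. `describeCron` is a hypothetical helper written for illustration; it is not part of @dataform/core, and it only splits the fields without validating their ranges.

```javascript
// Hypothetical helper (not part of @dataform/core): name the five fields
// of a standard cron expression so the schedule is easier to read.
function describeCron(expression) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;
  return { minute, hour, dayOfMonth, month, dayOfWeek };
}

// '0 0 * * *' means minute 0 of hour 0, on every day, month, and weekday.
console.log(describeCron('0 0 * * *'));
```

Reading the fields left to right, a schedule such as `30 6 * * 1` would mean 06:30 every Monday.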
Managing Dependencies
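Dependencies between data operations are declared with the `ref` function. Calling `ctx.ref('base_table')` inside a query resolves the referenced table's full name and registers it as a dependency, so `base_table` is built before `dependent_table`. The `dependencies` option lists, by name, any actions that must run first even when the SQL does not reference them.
// definitions/dependent_table.js
// `publish` is a global supplied by Dataform at compile time.
publish('dependent_table', {
  type: 'table',
  // Redundant with ctx.ref below; kept to show the explicit form.
  dependencies: ['base_table'],
}).query(ctx => `
  SELECT *
  FROM ${ctx.ref('base_table')}
`);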
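The ordering that dependency declarations imply can be illustrated with a small, self-contained sketch. It is not Dataform's implementation; `executionOrder` and the `graph` object are hypothetical names, and the graph simply maps each action to the actions it depends on, using the `dependent_table` and `base_table` names from the example above.

```javascript
// Hypothetical sketch (not Dataform's code): order actions so that each
// one appears after everything it depends on, using a depth-first walk.
function executionOrder(graph) {
  const order = [];
  const state = new Map(); // action name -> 'visiting' | 'done'

  function visit(name, path) {
    if (state.get(name) === 'done') return;
    if (state.get(name) === 'visiting') {
      throw new Error(`Circular dependency: ${[...path, name].join(' -> ')}`);
    }
    state.set(name, 'visiting');
    for (const dep of graph[name] ?? []) {
      visit(dep, [...path, name]);
    }
    state.set(name, 'done');
    order.push(name); // all dependencies are already in `order`
  }

  for (const name of Object.keys(graph)) visit(name, []);
  return order;
}

// Each key lists the actions it depends on.
const graph = {
  dependent_table: ['base_table'],
  base_table: [],
};

console.log(executionOrder(graph)); // [ 'base_table', 'dependent_table' ]
```

A cycle, such as two tables that reference each other, has no valid order, which is why the sketch throws instead of returning a partial result.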
dbt (data build tool) is a command-line tool that enables data analysts and engineers to transform data in their warehouse more effectively. It is similar to @dataform/core in that it allows you to define data models and manage dependencies, but it also includes features for testing and documentation.
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. It is more general-purpose than @dataform/core, as it can be used for a wide range of workflow automation tasks beyond just data transformations.
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, and more. Like @dataform/core, it is designed for managing data workflows, but it is more focused on batch processing.
FAQs
Dataform core API.
We found that @dataform/core demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.