astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python. It helps DAG authors to achieve more with less code. It is powered by Apache Airflow and maintained by Astronomer.
:warning: Disclaimer This project's development status is alpha. In other words, it is not production-ready yet. The interfaces may change. We welcome alpha users and brave souls to test it - any feedback is welcome.
Astro is available at PyPI. Use the standard Python installation tools.
To install a cloud-agnostic version of Astro, run:
pip install astro-projects
If using cloud providers, install using the optional dependencies of interest:
pip install astro-projects[amazon,google,snowflake,postgres]
After installing Astro, copy the following example DAG, calculate_popular_movies.py, to a local directory named dags:
from datetime import datetime

from airflow import DAG

from astro import sql as aql
from astro.sql.table import Table


@aql.transform()
def top_five_animations(input_table: Table):
    return """
        SELECT Title, Rating
        FROM {{input_table}}
        WHERE Genre1=='Animation'
        ORDER BY Rating desc
        LIMIT 5;
    """


with DAG(
    "calculate_popular_movies",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
) as dag:
    imdb_movies = aql.load_file(
        path="https://raw.githubusercontent.com/astro-projects/astro/main/tests/data/imdb.csv",
        task_id="load_csv",
        output_table=Table(
            table_name="imdb_movies", database="sqlite", conn_id="sqlite_default"
        ),
    )

    top_five_animations(
        input_table=imdb_movies,
        output_table=Table(
            table_name="top_animation", database="sqlite", conn_id="sqlite_default"
        ),
    )
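To see what the query inside top_five_animations computes without running Airflow, the following minimal sketch runs the same SQL against an in-memory SQLite table using only the Python standard library. The sample rows and the run_top_five helper are hypothetical, and the plain string replacement of {{input_table}} is only a stand-in for whatever astro does internally when it resolves the table.

```python
import sqlite3

# Same query as in top_five_animations. astro resolves {{input_table}} itself;
# the plain string substitution below is an illustrative stand-in only.
QUERY_TEMPLATE = """
SELECT Title, Rating
FROM {{input_table}}
WHERE Genre1=='Animation'
ORDER BY Rating desc
LIMIT 5;
"""

# Hypothetical sample rows; the real DAG loads imdb.csv instead.
SAMPLE_ROWS = [
    ("Movie A", "Animation", 8.3),
    ("Movie B", "Drama", 9.0),
    ("Movie C", "Animation", 7.1),
    ("Movie D", "Animation", 8.0),
]


def run_top_five(rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE imdb_movies (Title TEXT, Genre1 TEXT, Rating REAL)"
        )
        conn.executemany("INSERT INTO imdb_movies VALUES (?, ?, ?)", rows)
        query = QUERY_TEMPLATE.replace("{{input_table}}", "imdb_movies")
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for title, rating in run_top_five(SAMPLE_ROWS):
        print(f"{title}|{rating}")
```

Only the animation rows survive the filter, and they come back ordered by rating from highest to lowest.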
Set up a local instance of Airflow by running:
export AIRFLOW_HOME=`pwd`
export AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True
airflow db init
Create an SQLite database for the example to run with and run the DAG:
# The sqlite_default connection has a different host on macOS vs. Linux
export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`
sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
airflow dags test calculate_popular_movies `date -Iseconds`
Check the top five animations calculated by your first Astro DAG by running:
sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
You should see the following output:
$ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
Toy Story 3 (2010)|8.3
Inside Out (2015)|8.2
How to Train Your Dragon (2010)|8.1
Zootopia (2016)|8.1
How to Train Your Dragon 2 (2014)|7.9
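The same check can be done from Python instead of the sqlite3 command-line shell. This is a small sketch using the standard library only; read_result_table is a hypothetical helper name, and it assumes the database file path is the one stored in SQL_TABLE_NAME above.

```python
import sqlite3


def read_result_table(db_path, table_name="top_animation"):
    """Return every row of the given table as a list of tuples."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as query parameters, so quote explicitly.
        quoted = '"' + table_name.replace('"', '""') + '"'
        return conn.execute(f"SELECT * FROM {quoted}").fetchall()
    finally:
        conn.close()
```

For example, read_result_table(os.environ["SQL_TABLE_NAME"]) would return the rows written by the DAG as (Title, Rating) tuples, provided the environment variable is still set in the calling process.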
Because astro relies on the TaskFlow API, it depends on Apache Airflow >= 2.1.0.
Databases | File types | File locations |
---|---|---|
Google BigQuery | CSV | Amazon S3 |
Postgres | JSON | Filesystem |
Snowflake | NDJSON | Google GCS |
SQLite | Parquet | |
A summary of the currently available operations in astro. More details are available in the reference guide.
- load_file: load a given file into a SQL table
- transform: apply a SQL select statement to a source table and save the result to a destination table
- truncate: remove all records from a SQL table
- run_raw_sql: run any SQL statement without handling its output
- append: insert rows from the source SQL table into the destination SQL table, if there are no conflicts
- merge: insert rows from the source SQL table into the destination SQL table, depending on conflicts
- save_file: export SQL table rows into a destination file
- dataframe: export a given SQL table into an in-memory Pandas DataFrame
- render: given a directory containing SQL statements, dynamically create transform tasks within a DAG

The documentation is a work in progress, and we aim to follow the Diátaxis system.
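To make the idea of conflict handling in the merge operation from the summary above more concrete, here is a rough sketch in plain SQLite, run through Python's standard library. It is not astro's implementation. The two strategies shown (skip conflicting rows, or overwrite them) are common choices and are an assumption here, as are the table names and the demo_merge_strategies helper.

```python
import sqlite3


def demo_merge_strategies():
    """Show two ways of inserting source rows into a target with a key clash."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(
            """
            CREATE TABLE source (id INTEGER PRIMARY KEY, title TEXT);
            CREATE TABLE target (id INTEGER PRIMARY KEY, title TEXT);
            INSERT INTO source VALUES (1, 'new title'), (2, 'second');
            INSERT INTO target VALUES (1, 'old title');
            """
        )
        # Strategy 1 (assumed): skip rows whose key already exists in target.
        conn.execute("INSERT OR IGNORE INTO target SELECT id, title FROM source")
        ignored = conn.execute(
            "SELECT id, title FROM target ORDER BY id"
        ).fetchall()

        # Strategy 2 (assumed): overwrite existing rows with the source rows.
        conn.execute("INSERT OR REPLACE INTO target SELECT id, title FROM source")
        updated = conn.execute(
            "SELECT id, title FROM target ORDER BY id"
        ).fetchall()
        return ignored, updated
    finally:
        conn.close()


if __name__ == "__main__":
    ignored, updated = demo_merge_strategies()
    print("after ignoring conflicts:", ignored)
    print("after updating on conflict:", updated)
```

With the first strategy the row with id 1 keeps its old title; with the second it takes the title from the source table. In both cases the non-conflicting row with id 2 is inserted.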
We follow Semantic Versioning for releases. Check the changelog for the latest changes.
To learn more about our release philosophy and steps, check here.
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
Read the Contribution Guideline for a detailed overview on how to contribute.
As contributors and maintainers to this project, you should abide by the Contributor Code of Conduct.