Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Astro Python SDK is a Python SDK for rapid development of extract, transform, and load workflows in Apache Airflow. It allows you to express your workflows as a set of data dependencies without having to worry about ordering and tasks. The Astro Python SDK is maintained by Astronomer.
The Astro Python SDK is available on PyPI. Use the standard Python installation tools.
To install a cloud-agnostic version of the SDK, run:
pip install astro-sdk-python
You can also install dependencies for using the SDK with popular cloud providers:
pip install astro-sdk-python[amazon,google,snowflake,postgres]
Ensure that your Airflow environment is set up correctly by running the following commands:
export AIRFLOW_HOME=`pwd`
export AIRFLOW__CORE__XCOM_BACKEND=astro.custom_backend.astro_custom_backend.AstroCustomXcomBackend
export AIRFLOW__ASTRO_SDK__STORE_DATA_LOCAL_DEV=true
airflow db init
Note: AIRFLOW__CORE__ENABLE_XCOM_PICKLING no longer needs to be enabled for astro-sdk-python. This functionality is now deprecated, as our custom XCom backend handles serialization.
AIRFLOW__ASTRO_SDK__STORE_DATA_LOCAL_DEV should only be used for local development. The XCom backend docs give further details about how to set this up in non-local environments.
Custom XCom backends are limited to data types that are JSON-serializable. Since DataFrames are not JSON-serializable, XCom pickling was previously required to store them.
The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can't represent pointer sharing); however, it means that non-Python programs may not be able to reconstruct pickled Python objects.
Read more: enable_xcom_pickling and pickle.
Create a SQLite database for the example to run with:
# The sqlite_default connection has a different host on macOS vs. Linux
export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`
sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
Copy the following workflow into a file named calculate_popular_movies.py and add it to the dags directory of your Airflow project:
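The workflow file itself is not reproduced here; as a rough sketch of its shape (the CSV URL, column names, and task ids below are assumptions for illustration, so treat the downloadable file as the canonical version), the DAG loads a movie-ratings CSV into SQLite and selects the five highest-rated animations into a top_animation table:

```python
from datetime import datetime

from airflow.models import DAG
from astro import sql as aql
from astro.files import File
from astro.table import Table

# Assumed location of the example ratings CSV -- check the downloadable
# calculate_popular_movies.py for the exact URL used by the project.
RATINGS_CSV = "https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb_v2.csv"


@aql.transform()
def top_five_animations(input_table: Table):
    # The SDK templates {{ input_table }} into the rendered SQL at runtime.
    return """
        SELECT title, rating
        FROM {{ input_table }}
        WHERE genre1 = 'Animation'
        ORDER BY rating DESC
        LIMIT 5;
    """


with DAG(
    "calculate_popular_movies",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
) as dag:
    # load_file: pull the CSV into a SQL table on the sqlite_default connection.
    imdb_movies = aql.load_file(
        input_file=File(path=RATINGS_CSV),
        output_table=Table(conn_id="sqlite_default"),
    )
    # transform: materialize the SELECT above into the top_animation table.
    top_five_animations(
        input_table=imdb_movies,
        output_table=Table(name="top_animation"),
    )
```

Note how the task order is inferred from the data dependency (the transform consumes the table produced by load_file) rather than declared explicitly.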
Alternatively, you can download calculate_popular_movies.py directly:
curl -O https://raw.githubusercontent.com/astronomer/astro-sdk/main/python-sdk/example_dags/calculate_popular_movies.py
Run the example DAG:
airflow dags test calculate_popular_movies `date -Iseconds`
Check the result of your DAG by running:
sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
You should see the following output:
$ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
Toy Story 3 (2010)|8.3
Inside Out (2015)|8.2
How to Train Your Dragon (2010)|8.1
Zootopia (2016)|8.1
How to Train Your Dragon 2 (2014)|7.9
| Databases |
| --- |
| Databricks Delta |
| Google BigQuery |
| Postgres |
| Snowflake |
| SQLite |
| Amazon Redshift |
| Microsoft SQL |
| DuckDB |
| File types |
| --- |
| CSV |
| JSON |
| NDJSON |
| Parquet |
| File stores |
| --- |
| Amazon S3 |
| Filesystem |
| Google GCS |
| Google Drive |
| SFTP |
| FTP |
| Azure WASB |
| Azure WASBS |
The following are some key functions available in the SDK:

- load_file: Load a given file into a SQL table
- transform: Apply a SQL SELECT statement to a source table and save the result to a destination table
- drop_table: Drop a SQL table
- run_raw_sql: Run any SQL statement without handling its output
- append: Insert rows from the source SQL table into the destination SQL table, if there are no conflicts
- merge: Insert rows from the source SQL table into the destination SQL table, depending on conflicts:
  - ignore: Do not add rows that already exist
  - update: Replace existing rows with new ones
- export_file: Export SQL table rows into a destination file
- dataframe: Export a given SQL table into an in-memory pandas DataFrame

For a full list of available operators, see the SDK reference documentation.
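To illustrate how these functions compose, here is a minimal sketch (the connection ids, file paths, table names, and column names are assumptions invented for this example, not taken from the SDK's documentation):

```python
from datetime import datetime

from airflow.models import DAG
from astro import sql as aql
from astro.files import File
from astro.table import Table


@aql.transform
def large_orders(input_table: Table):
    # transform: the returned SELECT is materialized into a new table.
    return "SELECT * FROM {{ input_table }} WHERE amount > 100"


with DAG(
    "orders_pipeline",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
):
    # load_file: file -> SQL table (hypothetical S3 path and Postgres conn).
    orders = aql.load_file(
        input_file=File(path="s3://my-bucket/orders.csv"),
        output_table=Table(conn_id="postgres_default"),
    )

    filtered = large_orders(input_table=orders)

    # append: insert the filtered rows into an existing history table.
    aql.append(
        source_table=filtered,
        target_table=Table(name="large_orders_history", conn_id="postgres_default"),
    )

    # export_file: SQL table -> file in a destination store.
    aql.export_file(
        input_data=filtered,
        output_file=File(path="s3://my-bucket/large_orders.parquet"),
        if_exists="replace",
    )
```

Each function returns a task whose output feeds the next, so the DAG's execution order again falls out of the data dependencies.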
The documentation is a work in progress; we aim to follow the Diátaxis system.
The Astro Python SDK follows semantic versioning for releases. Check the changelog for the latest changes.
To learn more about our release philosophy and steps, see Managing Releases.
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
Read the Contribution Guideline for a detailed overview of how to contribute.
Contributors and maintainers should abide by the Contributor Code of Conduct.