li-airflow-backfill-plugin
An Airflow backfill plugin, taken from the Airflow deployment in LinkedIn Infra production: a full-fledged backfill feature with manageability and scalability, including a UI and APIs.
Readme
This is an Airflow plugin that provides a full-featured UI and APIs for data backfills in Airflow, with manageability and scalability.
We want users to be able to run backfills in a scheduled, managed, scalable, and robust way.
Because it is an Airflow plugin, these features can easily be added to an existing Airflow instance.
To get started, run Airflow with the backfill plugin in local Docker (Docker is required; refer to the Airflow documentation for details on running Airflow in Docker):
# in project root folder
# start
docker-compose up
# stop
docker-compose down
Access the Airflow web UI at http://localhost:8080 (username/password: airflow/airflow).
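The following is a minimal sketch for checking from a script that the local instance is up. It assumes the Airflow 2.x webserver's /health endpoint, which returns a JSON object with a status for each component (such as metadatabase and scheduler). The helper names are illustrative and are not part of the plugin. Skipping components that report a null status is also an assumption, meant to cover optional components that are not deployed.

```python
import json
import urllib.request

# Base URL of the local Docker Airflow webserver started with docker-compose.
AIRFLOW_BASE_URL = "http://localhost:8080"


def all_components_healthy(health: dict) -> bool:
    """Return True if every component in a /health payload reports "healthy".

    Components whose status is null (assumed: optional components that are
    not running) are skipped. An empty payload counts as not healthy.
    """
    statuses = [
        component.get("status")
        for component in health.values()
        if isinstance(component, dict) and component.get("status") is not None
    ]
    return bool(statuses) and all(status == "healthy" for status in statuses)


def check_airflow_health(base_url: str = AIRFLOW_BASE_URL, timeout: float = 5.0) -> bool:
    """Fetch <base_url>/health from the webserver and evaluate the payload."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
        return all_components_healthy(json.load(response))
```

For example, call check_airflow_health() once docker-compose up reports that the containers have started.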
The supported Airflow version is 2.5.3.
Other versions are expected to work. To quickly test other versions:
volumes:
- ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
- ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
- ${AIRFLOW_PROJ_DIR:-.}/plugins/linkedin/airflow/backfill:/opt/airflow/plugins/linkedin/airflow/backfill
Option 1: Drop files into the Airflow plugins folder
As the Apache Airflow documentation describes, copy all the content of the plugins folder in the project root into the $AIRFLOW_HOME/plugins folder of the Airflow instance.
Depending on the Airflow configuration, Airflow may need to be restarted to enable the backfill plugin.
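The copy step above can also be scripted. This is a sketch that assumes it is called with the project root as the source. When AIRFLOW_HOME is unset, it falls back to ~/airflow, which is Airflow's own default. The function names are illustrative.

```python
import os
import shutil
from pathlib import Path


def airflow_home() -> Path:
    """Resolve AIRFLOW_HOME, falling back to Airflow's default of ~/airflow."""
    return Path(os.environ.get("AIRFLOW_HOME", "~/airflow")).expanduser()


def install_plugin_files(project_root: Path, target_home: Path) -> Path:
    """Copy everything under <project_root>/plugins into <target_home>/plugins.

    Files with the same relative path are overwritten; other files already
    in the target plugins folder are left in place.
    """
    source = Path(project_root) / "plugins"
    target = Path(target_home) / "plugins"
    # dirs_exist_ok (Python 3.8+) merges into an existing plugins folder
    # instead of failing when it already exists.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Usage from the project root: install_plugin_files(Path("."), airflow_home()).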
Option 2: Install from PyPI
Starting from version 1.0.2, the backfill plugin is available on PyPI. Installing the package installs the backfill library and registers the plugin through the entry_points in setup.py.
pip install li-airflow-backfill-plugin==1.0.2
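Airflow discovers pip-installed plugins through the airflow.plugins entry-point group. The sketch below uses only the standard library to list everything registered in that group, which is one way to confirm that the install above took effect. This README does not state the entry-point name the package uses, so the sketch lists all entries rather than looking up a specific one.

```python
from importlib import metadata

# Entry-point group that Airflow scans for plugins installed as packages.
AIRFLOW_PLUGIN_GROUP = "airflow.plugins"


def registered_airflow_plugins() -> dict:
    """Map entry-point name -> "module:attr" target for installed Airflow plugins."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports selecting by group.
        group = eps.select(group=AIRFLOW_PLUGIN_GROUP)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        group = eps.get(AIRFLOW_PLUGIN_GROUP, [])
    return {ep.name: ep.value for ep in group}
```

Running print(registered_airflow_plugins()) in the same Python environment as Airflow shows which plugin packages are registered.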
The backfill feature also requires some Dags. After enabling the backfill plugin, copy all the content of the dags/backfill_dags folder into the configured Airflow Dags folder of the Airflow instance (by default $AIRFLOW_HOME/dags).
Airflow does not need to be restarted for this step.
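This is a sketch of resolving the target Dags folder from environment variables and copying the backfill Dags into it. It checks AIRFLOW__CORE__DAGS_FOLDER first and otherwise falls back to $AIRFLOW_HOME/dags. A dags_folder set only in airflow.cfg is not read here, and the helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def resolve_dags_folder(env=None) -> Path:
    """Best-effort Dags folder lookup from environment variables only.

    Order: AIRFLOW__CORE__DAGS_FOLDER, then $AIRFLOW_HOME/dags, where
    AIRFLOW_HOME defaults to ~/airflow. airflow.cfg is not consulted.
    """
    env = os.environ if env is None else env
    explicit = env.get("AIRFLOW__CORE__DAGS_FOLDER")
    if explicit:
        return Path(explicit).expanduser()
    home = Path(env.get("AIRFLOW_HOME", "~/airflow")).expanduser()
    return home / "dags"


def install_backfill_dags(project_root: Path, dags_folder: Path) -> None:
    """Copy the contents of <project_root>/dags/backfill_dags into dags_folder."""
    source = Path(project_root) / "dags" / "backfill_dags"
    shutil.copytree(source, dags_folder, dirs_exist_ok=True)
```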
Run Airflow
After making changes to the source code, you can test them by running Airflow in local Docker as described in Quick Start. Logs appear in the logs folder in the project root, and you can add test Dags to the dags folder.
Unit Test
Unit tests are run with pytest in Docker. The test source code is in the tests folder, and the pytest configuration is in pytest.ini.
Build the test image once:
# in project root folder
docker build -t airflow-backfill-plugin-tests-1 -f tests.Dockerfile .
Run the unit tests:
# in project root folder
./run_tests.sh
For the detailed design, refer to the Design Doc.
Writing files to the Airflow Dags folder
By default, backfill Dag files are created in the dags/backfill_user_dags folder on workers. This limitation can be lifted through backfill store customization.
Shallow copy
The backfill Dags are shallow copies of the origin Dags: if dependencies outside the Dag definition file change while backfills are running, the actual behavior of the backfills may change accordingly.
A backfill table is automatically created in the default Airflow database and used to store backfill metadata and status information.
No other tables are created or modified by the backfill feature.
Backfill Dag Id Conventions
Backfill Dag Ids are generated by the backfill store. By default, the Id is the origin Dag Id affixed with "backfill" and a timestamp.
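This README does not give the exact Id format used by the default store. The sketch below shows one hypothetical way to affix "backfill" and a UTC timestamp to an origin Dag Id. It stays within Airflow's Dag Id constraints: only alphanumerics, dashes, dots, and underscores, and at most 250 characters. The separator and timestamp format are assumptions.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Airflow Dag Ids may contain only alphanumerics, dashes, dots, and underscores.
_DAG_ID_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+$")
_MAX_DAG_ID_LENGTH = 250


def make_backfill_dag_id(origin_dag_id: str, now: Optional[datetime] = None) -> str:
    """Build a hypothetical backfill Dag Id: <origin>__backfill__<UTC timestamp>."""
    now = now or datetime.now(timezone.utc)
    suffix = f"__backfill__{now.strftime('%Y%m%dT%H%M%S')}"
    # Trim the origin Id rather than the suffix so the timestamp always survives.
    dag_id = origin_dag_id[: _MAX_DAG_ID_LENGTH - len(suffix)] + suffix
    if not _DAG_ID_PATTERN.match(dag_id):
        raise ValueError(f"invalid Dag Id: {dag_id!r}")
    return dag_id
```

Keeping the timestamp intact means two backfills of the same origin Dag started at different seconds get distinct Ids, even when the origin Id is long.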
The backfill Dag Id can be customized by setting the AIRFLOW__LI_BACKFILL__BACKFILL_STORE environment variable to a new store class. For example:
name: AIRFLOW__LI_BACKFILL__BACKFILL_STORE
value: 'airflow.providers.my_provider.backfill.backfill_store.MyBackfillStore'
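The value is a dotted path to a class, and the class path in the example above is a placeholder. This README does not show how the plugin loads it. The sketch below is a generic way to resolve such a path with importlib, which is the usual pattern for settings of this kind. The default class path is passed in as a parameter because the plugin's own default is not named here.

```python
import importlib
import os

# Environment variable from the example above.
STORE_ENV_VAR = "AIRFLOW__LI_BACKFILL__BACKFILL_STORE"


def import_class(dotted_path: str):
    """Import "package.module.ClassName" and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_path} has no attribute {class_name}") from exc


def load_store_class(default_path: str, env=None):
    """Return the class named by the env var, or the default when it is unset."""
    env = os.environ if env is None else env
    return import_class(env.get(STORE_ENV_VAR, default_path))
```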
Backfill Dags Persistence
By default, backfill Dag files are persisted to the dags/backfill_user_dags folder.
Persistence is customizable, for example to store Dags through APIs.
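This README does not document the actual interface of the persistence hook. As a hypothetical illustration of the idea, the sketch below puts "where Dag files go" behind a small protocol. The default implementation writes to a local folder such as dags/backfill_user_dags, and an API-backed implementation would provide the same save method. The class and method names are invented for this example.

```python
from pathlib import Path
from typing import Protocol


class DagPersistence(Protocol):
    """Hypothetical interface: persist a generated backfill Dag file."""

    def save(self, dag_id: str, source: str) -> str:
        """Store the Dag source and return an identifier for its location."""
        ...


class LocalFolderPersistence:
    """Writes each Dag to <folder>/<dag_id>.py, creating the folder if needed."""

    def __init__(self, folder: Path) -> None:
        self.folder = Path(folder)

    def save(self, dag_id: str, source: str) -> str:
        self.folder.mkdir(parents=True, exist_ok=True)
        path = self.folder / f"{dag_id}.py"
        # Write to a temporary non-.py file, then rename it into place, so a
        # scheduler scanning the folder never sees a half-written Dag file.
        tmp = path.parent / (path.name + ".tmp")
        tmp.write_text(source, encoding="utf-8")
        tmp.replace(path)
        return str(path)
```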
Authentication
Backfill, both UI and APIs, is integrated into the existing Airflow authentication model, so it is authenticated in the same way as the other Airflow UI and APIs. By default, and in the local Docker Airflow instance, username and password are used for authentication.
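Because the local Docker instance authenticates with username and password, scripted API calls can send HTTP Basic credentials. This README does not list the backfill plugin's own API paths. The sketch below therefore targets Airflow's stable REST API (GET /api/v1/dags) only to show the authentication header. It assumes basic auth is enabled as an API auth backend, as in Airflow's reference docker-compose setup.

```python
import base64
import json
import urllib.request


def basic_auth_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def list_dag_ids(base_url: str = "http://localhost:8080",
                 username: str = "airflow", password: str = "airflow") -> list:
    """Call the stable REST API and return the Dag Ids on the first page."""
    request = basic_auth_request(f"{base_url}/api/v1/dags", username, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return [dag["dag_id"] for dag in payload.get("dags", [])]
```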
Access Control
Airflow RBAC is supported through the permission module.
By default, all backfill permissions are automatically granted to the "User" and "Op" roles. This can be customized through AIRFLOW__LI_BACKFILL__PERMITTED_ROLES. For example:
name: AIRFLOW__LI_BACKFILL__PERMITTED_ROLES
value: 'first_role,second_role'
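The variable holds a comma-separated list of role names. The sketch below shows one way to parse such a value, falling back to the documented defaults ("User" and "Op") when it is unset or empty. The plugin's actual parsing rules, such as how it handles whitespace and duplicates, are assumptions here.

```python
import os

ROLES_ENV_VAR = "AIRFLOW__LI_BACKFILL__PERMITTED_ROLES"
# Roles that receive backfill permissions by default, per this README.
DEFAULT_ROLES = ("User", "Op")


def permitted_roles(env=None) -> list:
    """Parse the comma-separated role list, dropping blanks and duplicates.

    Falls back to DEFAULT_ROLES when no role names remain after parsing.
    """
    env = os.environ if env is None else env
    raw = env.get(ROLES_ENV_VAR, "")
    roles = []
    for part in raw.split(","):
        role = part.strip()
        if role and role not in roles:
            roles.append(role)
    return roles or list(DEFAULT_ROLES)
```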