airflow-provider-lakefs

A lakeFS provider package built by Treeverse.

  • 0.48.0
  • PyPI

lakeFS airflow provider

The lakeFS Airflow provider integrates lakeFS with Airflow DAGs. Use the provider to create branches, commit objects, wait for files to be written, and more.

For a usage example, check out the example DAG.
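
To illustrate the operations listed above, here is a minimal sketch of a DAG that creates a branch, waits for a file to be written, and then commits. The module paths and parameter names (`repo`, `branch`, `source_branch`, `msg`, `lakefs_conn_id`) follow the provider's example DAG as best recalled and should be checked against the installed version. The repository, branch, connection ID, and file path are hypothetical placeholders. The imports are done lazily so the module still loads where Airflow is not installed.

```python
from datetime import datetime

# Hypothetical names used only for this illustration.
REPO = "example-repo"
SOURCE_BRANCH = "main"
WORK_BRANCH = "etl-run"
CONN_ID = "conn_lakefs"  # Airflow connection holding the lakeFS endpoint and credentials


def build_dag():
    """Build a DAG that creates a branch, waits for a file, then commits."""
    # Imported lazily so this module can be loaded without Airflow installed.
    from airflow import DAG
    from lakefs_provider.operators.create_branch_operator import LakeFSCreateBranchOperator
    from lakefs_provider.operators.commit_operator import LakeFSCommitOperator
    from lakefs_provider.sensors.file_sensor import LakeFSFileSensor

    # `schedule=None` assumes Airflow 2.4 or later; the DAG runs only when triggered.
    with DAG(dag_id="lakefs_sketch", start_date=datetime(2024, 1, 1), schedule=None) as dag:
        create_branch = LakeFSCreateBranchOperator(
            task_id="create_branch",
            lakefs_conn_id=CONN_ID,
            repo=REPO,
            branch=WORK_BRANCH,
            source_branch=SOURCE_BRANCH,
        )
        wait_for_file = LakeFSFileSensor(
            task_id="wait_for_file",
            lakefs_conn_id=CONN_ID,
            repo=REPO,
            branch=WORK_BRANCH,
            path="output/_SUCCESS",  # hypothetical marker file written by an upstream job
        )
        commit = LakeFSCommitOperator(
            task_id="commit",
            lakefs_conn_id=CONN_ID,
            repo=REPO,
            branch=WORK_BRANCH,
            msg="Commit output of the ETL run",
            metadata={"committed_from": "airflow-operator"},
        )
        create_branch >> wait_for_file >> commit
    return dag


try:
    dag = build_dag()
except ImportError:
    dag = None  # Airflow or the lakeFS provider is not installed in this environment
```

Working on a separate branch keeps the uncommitted output isolated from `main` until it has been validated and merged.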

What is lakeFS

lakeFS is an open source layer that delivers resilience and manageability to object-storage based data lakes.

With lakeFS you can build repeatable, atomic and versioned data lake operations - from complex ETL jobs to data science and analytics.

lakeFS supports AWS S3, Azure Blob Storage and Google Cloud Storage as its underlying storage service. It is API compatible with S3 and works seamlessly with modern data frameworks such as Spark, Hive, AWS Athena, and Presto.
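
Because lakeFS is S3 API compatible, S3 clients usually address an object by treating the repository as the bucket and the branch (or another ref) as the first path component. The helper below is a small sketch of that convention. The function name and the example repository and path are hypothetical, and the client must still be pointed at the lakeFS endpoint separately.

```python
def lakefs_s3_uri(repo: str, ref: str, path: str, scheme: str = "s3") -> str:
    """Build an S3-style URI for an object stored in lakeFS.

    repo   -- lakeFS repository name (takes the place of the bucket)
    ref    -- branch name, tag, or commit ID
    path   -- object path inside the repository
    scheme -- "s3" by default; Hadoop-based tools such as Spark use "s3a"
    """
    if not repo or not ref:
        raise ValueError("repo and ref must be non-empty")
    # Strip a leading slash so the URI never contains an empty path segment.
    return f"{scheme}://{repo}/{ref}/{path.lstrip('/')}"


print(lakefs_s3_uri("example-repo", "main", "tables/events.parquet"))
# → s3://example-repo/main/tables/events.parquet
```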

For more information see the official lakeFS documentation.

Capabilities

Development Environment for Data

  • Experimentation - try tools, upgrade versions and evaluate code changes in isolation.
  • Reproducibility - go back to any point in time to a consistent version of your data lake.

Continuous Data Integration

  • Ingest new data safely by enforcing best practices - make sure new data sources adhere to your lake’s best practices, such as format and schema enforcement and naming conventions.
  • Metadata validation - prevent breaking changes from entering the production data environment.

Continuous Data Deployment

  • Instantly revert changes to data - if low quality data is exposed to your consumers, you can revert instantly to a former, consistent and correct snapshot of your data lake.
  • Enforce cross collection consistency - provide consumers with several collections of data that must be synchronized, in one atomic, revertible action.
  • Prevent data quality issues by enabling:
    • Testing of production data before exposing it to users / consumers.
    • Testing of intermediate results in your DAG to avoid cascading quality issues.

Publishing

The repository includes a GitHub workflow that is triggered on the publish event and builds and pushes the package to PyPI.

Use the following steps to release:

  • Update setup.py with the new package version
  • Update CHANGELOG.md with changes for the new release
  • Create a GitHub release, using a semver tag of the form vX.X.X
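
As a sanity check before creating the release, a tag can be validated against the vX.X.X pattern. This is a minimal sketch, not part of the repository's workflow. The function name is hypothetical, and the pattern accepts only the three-number core version, so pre-release or build suffixes are rejected.

```python
import re

# "v" followed by MAJOR.MINOR.PATCH, with no leading zeros in any component.
TAG_RE = re.compile(r"^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse_release_tag(tag: str) -> tuple:
    """Return (major, minor, patch) for a tag like 'v0.48.0', or raise ValueError."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a vX.X.X release tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


print(parse_release_tag("v0.48.0"))
# → (0, 48, 0)
```

The parsed tuple should match the version set in setup.py, which is easy to compare by hand or in a script.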

Community

Stay up to date and get lakeFS support via:

  • Slack (to get help from our team and other users)
  • Twitter (follow for updates and news)
  • YouTube (learn from video tutorials)
  • Contact us (for anything)

More information

Licensing

lakeFS is completely free and open source and licensed under the Apache 2.0 License.
