Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular. You can find out more at kedro.org.
A series of lightweight data connectors used to save and load data across many different file formats and file systems, including local and network file systems, cloud object stores, and HDFS. The Data Catalog also includes data and model versioning for file-based systems.
Pipeline Abstraction
Automatic resolution of dependencies between pure Python functions and data pipeline visualisation using Kedro-Viz.
Coding Standards
Test-driven development using pytest, produce well-documented code using Sphinx, create linted code with support for ruff and make use of the standard Python logging library.
Flexible Deployment
Deployment strategies that include single or distributed-machine deployment as well as additional support for deploying on Argo, Prefect, Kubeflow, AWS Batch, and Databricks.
Kedro is built upon our collective best-practice (and mistakes) trying to deliver real-world ML applications that have vast amounts of raw unvetted data. We developed Kedro to achieve the following:
To address the main shortcomings of Jupyter notebooks, one-off scripts, and glue-code because there is a focus on
creating maintainable data science code
To enhance team collaboration when different team members have varied exposure to software engineering concepts
To increase efficiency, because applied concepts like modularity and separation of concerns inspire the creation of
reusable analytics code
There is a growing community around Kedro. We encourage you to ask and answer technical questions on Slack and bookmark the Linen archive of past discussions.
We keep a list of technical FAQs in the Kedro documentation and you can find a growing list of blog posts, videos and projects that use Kedro over on the awesome-kedro GitHub repository. If you have created anything with Kedro we'd love to include it on the list. Just make a PR to add it!
How can I cite Kedro?
If you're an academic, Kedro can also help you, for example, as a tool to solve the problem of reproducible research. Use the "Cite this repository" button on our repository to generate a citation from the CITATION.cff file.
Python version support policy
The core Kedro Framework supports all Python versions that are actively maintained by the CPython core team. When a Python version reaches end of life, support for that version is dropped from Kedro. This is not considered a breaking change.
The Kedro Datasets package follows the NEP 29 Python version support policy. This means that kedro-datasets generally drops Python version support before kedro. This is because kedro-datasets has a lot of dependencies that follow NEP 29 and the more conservative version support approach of the Kedro Framework makes it hard to manage those dependencies properly.
☕️ Kedro Coffee Chat 🔶
We appreciate our community and want to stay connected. For that, we offer a public Coffee Chat format where we share updates and cool stuff around Kedro once every two weeks and give you time to ask your questions live.
Kedro helps you build production-ready data and analytics pipelines
We found that kedro demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago.It has 3 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
OpenSSF has published OSPS Baseline, an initiative designed to establish a minimum set of security-related best practices for open source software projects.
Michigan TypeScript founder Dimitri Mitropoulos implements WebAssembly runtime in TypeScript types, enabling Doom to run after processing 177 terabytes of type definitions.