What is DagsHub?
DagsHub is a platform where machine learning and data science teams can build, manage, and collaborate on their projects.
With DagsHub you can:
- Version code, data, and models in one place, using the free DagsHub storage or your own connected cloud storage
- Track experiments using Git, DVC, or MLflow to provide a fully reproducible environment
- Visualize pipelines, data, and notebooks in an interactive, diff-able, and dynamic way
- Label your data directly on the platform using Label Studio
- Share your work with your team members
- Stream and upload your data in an intuitive and easy way, while preserving versioning and structure
DagsHub is built firmly around open, standard formats for your project.
Therefore, you can work with DagsHub regardless of your chosen programming language or frameworks.
DagsHub Client API & CLI
This client library is meant to help you get started quickly with DagsHub. It consists of Experiment Tracking and
Direct Data Access (DDA), a component that lets you stream and upload your data.
For more details on the different functions of the client, check out the docs segments:
- Installation & Setup
- Data Streaming
- Data Upload
- Experiment Tracking
- Autologging
- Data Engine
Some functionality is supported only in Python.
To read about some of the awesome use cases for Direct Data Access, check out
the relevant doc page.
Installation
pip install dagshub
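To confirm from Python that the install succeeded, a minimal sketch using only the standard library might look like the following. The helper name `installed_version` is hypothetical and not part of the DagsHub client; it only asks the packaging metadata whether a `dagshub` distribution is present.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "dagshub"):
    """Return the installed version string of `package`, or None if it is missing.

    `installed_version` is an illustrative helper, not a DagsHub API.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"dagshub version: {found}" if found else "dagshub is not installed")
```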
Direct Data Access (DDA) functionality requires authentication, which you can easily do by running the following command
in your terminal:
dagshub login
Quickstart for Data Streaming
The easiest way to start using DagsHub is via the Python Hooks method. To do this:
- Start from your DagsHub project,
- Copy the following two lines into the Python code that accesses your data:
from dagshub.streaming import install_hooks
install_hooks()
- That’s it! You now have streaming access to all your project files.
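As we understand it, the hooks work by intercepting Python's built-in file operations such as `open()`, so the code that reads your data can stay ordinary Python. The sketch below illustrates that split under this assumption. `enable_streaming` and `count_lines` are hypothetical helper names; only `install_hooks` comes from the client.

```python
def enable_streaming():
    """Install the DagsHub streaming hooks (the two lines from the quickstart).

    The import is deferred so this module still loads where dagshub is absent.
    Call this once, from inside your DagsHub project, before reading data.
    """
    from dagshub.streaming import install_hooks

    install_hooks()


def count_lines(path):
    """Count lines in a text file using only built-in file access.

    Nothing here is DagsHub-specific; with the hooks installed, the same
    call is expected to work for project files that are not yet on disk.
    """
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)
```

In a script you would call `enable_streaming()` once at startup and then something like `count_lines("data/train.csv")`, where the path is a placeholder for a file tracked in your project.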
🤩 Check out this Colab notebook to see an example of Data Streaming working end to end.
Next Steps
You can dive into the expanded documentation to learn more about data streaming, data upload, and
experiment tracking with DagsHub.
Analytics
To improve your experience, we collect analytics on client usage. If you want to disable analytics collection,
set the DAGSHUB_DISABLE_ANALYTICS environment variable to any value.
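If you prefer to opt out from within Python rather than from the shell, one option is to set the variable in `os.environ`. This is a minimal sketch that assumes the client reads the variable when it is imported or first used, which is why the assignment comes before any `import dagshub`. The value `"1"` is arbitrary, since any value counts.

```python
import os

# Any value disables analytics; "1" is just a conventional choice.
# Assumption: this must run before the dagshub client is imported.
os.environ["DAGSHUB_DISABLE_ANALYTICS"] = "1"

# import dagshub  # import the client only after the variable is set
```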
Made with 🐶 by DagsHub.