
DISCLAIMER: This package contains research code. APIs may change.
Megatron Energon is the multi-modal data loader of Megatron (you can also use it independently).
It's best at
Try using it together with Megatron Core.
Megatron Energon is a pip-installable Python package that offers
This document is just a quick start. Please also check out the documentation.
To install the latest stable version:
pip install megatron-energon
Or to install the current development version:
pip install git+https://github.com/NVIDIA/Megatron-Energon.git
NOTE: We encourage you to install the package (and not just import a local copy). This will ensure you have all the needed dependencies and that you can use the command line tool.
For more details on installing this package, see here.
After installation, the energon command will be available.
Here are some examples of things you can do:
| Command | Description |
|---|---|
| energon prepare DATASET_ROOT | Take an existing WebDataset and add the required yaml files to turn it into an energon-compatible dataset |
| energon lint DATASET_ROOT | Verify that the dataset complies with the energon dataset format and that all samples are loadable |
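If you want to call these commands from a script, for example as a check before training starts, the following is a minimal sketch. It uses only the two subcommands listed in the table. The helper names `energon_command` and `lint_dataset` are hypothetical, and the sketch assumes that `energon lint` signals failure through a non-zero exit status, which this quick start does not confirm.

```python
import shutil
import subprocess


def energon_command(subcommand: str, dataset_root: str) -> list[str]:
    """Build the argument list for one of the subcommands in the table above."""
    if subcommand not in ("prepare", "lint"):
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    return ["energon", subcommand, dataset_root]


def lint_dataset(dataset_root: str) -> bool:
    """Run `energon lint` and report whether it exited with status 0.

    Assumption: a non-zero exit status means the lint failed.
    """
    if shutil.which("energon") is None:
        raise RuntimeError("energon CLI not found; is megatron-energon installed?")
    result = subprocess.run(energon_command("lint", dataset_root), check=False)
    return result.returncode == 0
```

Only `lint` is wrapped here, because `prepare` starts an interactive assistant and is better run directly in a terminal.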
To get started, pick a WebDataset-compliant dataset and run energon prepare DATASET_ROOT on it. This starts the interactive assistant and creates the .nv-meta folder. As an alternative to WebDataset, Energon also supports the JSONL format, see here.
Once done, try to load it from your Python program:
from megatron.energon import get_train_dataset, get_loader, WorkerConfig

simple_worker_config = WorkerConfig(rank=0, world_size=1, num_workers=2)

train_ds = get_train_dataset(
    '/my/dataset/path',
    batch_size=2,
    shuffle_buffer_size=None,
    max_samples_per_sequence=None,
    worker_config=simple_worker_config,
)

train_loader = get_loader(train_ds)

for batch in train_loader:
    # Do something with batch
    # Infer, gradient step, ...
    pass
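The snippet above hardcodes `rank=0` and `world_size=1`, which fits a single process. In a multi-process run you would typically pass each process's own values. The following is a minimal sketch, assuming the launcher exports the `RANK` and `WORLD_SIZE` environment variables (as `torchrun` does). The helpers `worker_config_kwargs` and `make_worker_config` are hypothetical names for illustration and are not part of the Energon API.

```python
import os
from typing import Mapping, Optional


def worker_config_kwargs(
    num_workers: int = 2,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Derive WorkerConfig keyword arguments from the process environment.

    Falls back to a single-process setup (rank 0 of 1) when RANK and
    WORLD_SIZE are not set.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is out of range for world size {world_size}")
    return {"rank": rank, "world_size": world_size, "num_workers": num_workers}


def make_worker_config(num_workers: int = 2):
    """Build a WorkerConfig with the same keyword arguments as the example above."""
    # Imported lazily so the helper above can be used without megatron-energon installed.
    from megatron.energon import WorkerConfig

    return WorkerConfig(**worker_config_kwargs(num_workers))
```

The result of `make_worker_config()` can then be passed as `worker_config` to `get_train_dataset` in place of `simple_worker_config`.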
For more details, read the documentation.