rapids-dependency-file-generator
rapids-dependency-file-generator
is a Python CLI tool that generates conda environment.yaml
files and requirements.txt
files from a single YAML file, typically named dependencies.yaml
.
When installed, it makes the rapids-dependency-file-generator
CLI command available which is responsible for parsing a dependencies.yaml
configuration file and generating the appropriate conda environment.yaml
and requirements.txt
dependency files.
Table of Contents
Installation
rapids-dependency-file-generator
is available on PyPI. To install, run:
pip install rapids-dependency-file-generator
Usage
When rapids-dependency-file-generator
is invoked, it will read a dependencies.yaml
file from the current directory and generate children dependency files.
The dependencies.yaml
file has the following characteristics:
- it is intended to be committed to the root directory of repositories
- it can define matrices that enable the output dependency files to vary according to any arbitrary specification (or combination of specifications), including CUDA version, machine architecture, Python version, etc.
- it contains bifurcated lists of dependencies based on the dependency's purpose (i.e. build, runtime, test, etc.). The bifurcated dependency lists are merged according to the description in the How Dependency Lists Are Merged section below.
dependencies.yaml
Format
The Examples section below has instructions on where example dependency.yaml
files and their corresponding output can be viewed.
The dependencies.yaml
file has three relevant top-level keys: files
, channels
, and dependencies
. These keys are described in detail below.
files
Key
The top-level files
key is responsible for determining the following:
- which types of dependency files should be generated (i.e. conda
environment.yaml
files and/or requirements.txt
files) - where the generated files should be written to (relative to the
dependencies.yaml
file) - which variant files should be generated (based on the provided matrix)
- which of the dependency lists from the top-level
dependencies
key should be included in the generated files
Here is an example of what the files
key might look like:
files:
all:
output: [conda, requirements]
conda_dir: conda/environments
requirements_dir: python/cudf
pyproject_dir: python/cudf
matrix:
cuda: ["11.5", "11.6"]
arch: [x86_64]
includes:
- build
- test
- runtime
build:
output: requirements
conda_dir: conda/environments
requirements_dir: python/cudf
matrix:
cuda: ["11.5"]
arch: [x86_64]
py: ["3.8"]
includes:
- build
The result of the above configuration is that the following dependency files would be generated:
conda/environments/all_cuda-115_arch-x86_64.yaml
conda/environments/all_cuda-116_arch-x86_64.yaml
python/cudf/requirements_all_cuda-115_arch-x86_64.txt
python/cudf/requirements_all_cuda-116_arch-x86_64.txt
python/cudf/requirements_build_cuda-115_arch-x86_64_py-38.txt
The all*.yaml
and requirements_all*.txt
files would include the contents of the build
, test
, and runtime
dependency lists from the top-level dependency
key. The requirements_build*.txt
file would only include the contents of the build
dependency list from the top-level dependency
key.
The value of output
can also be none
as shown below.
files:
test:
output: none
includes:
- test
When output: none
is used, the conda_dir
, requirements_dir
and matrix
keys can be omitted. The use case for output: none
is described in the Additional CLI Notes section below.
A given file may include an extras
entry that may be used to provide inputs specific to a particular file type
Here is an example:
files:
build:
output: pyproject
includes:
- build
extras:
table: table_name
key: key_name
Currently the supported extras by file type are:
- pyproject.toml
- table: The table in pyproject.toml where the dependencies should be written. Acceptable values are "build-system", "project", and "project.optional-dependencies".
- key: The key corresponding to the dependency list in
table
. This may only be provided for the "project.optional-dependencies" table since the key name is fixed for "build-system" ("requires") and "project" ("dependencies"). Note that this implicitly prohibits including optional dependencies via an inline table under the "project" table.
channels
Key
The top-level channels
key specifies the channels that should be included in any generated conda environment.yaml
files.
It might look like this:
channels:
- rapidsai
- conda-forge
In the absence of a channels
key, some sensible defaults for RAPIDS will be used (see constants.py).
dependencies
Key
The top-level dependencies
key is where the bifurcated dependency lists should be specified.
Underneath the dependencies
key are sets of key-value pairs. For each pair, the key can be arbitarily named, but should match an item from the includes
list of any files
entry.
The value of each key-value pair can have the following children keys:
common
- contains dependency lists that are the same across all matrix variationsspecific
- contains dependency lists that are specific to a particular matrix combination
The values of each of these keys are described in detail below.
common
Key
The common
key contains a list of objects with the following keys:
output_types
- a list of output types (e.g. "conda" for environment.yaml
files or "requirements" for requirements.txt
files) for the packages in the packages
keypackages
- a list of packages to be included in the generated output file
specific
Key
The specific
key contains a list of objects with the following keys:
output_types
- same as output_types
for the common
key abovematrices
- a list of objects (described below) which define packages that are specific to a particular matrix combination
matrices
Key
Each list item under the matrices
key contains a matrix
key and a packages
key.
The matrix
key is used to define which matrix combinations from files.[*].matrix
will use the associated packages.
The packages
key is a list of packages to be included in the generated output file for a matching matrix.
This is elaborated on in How Dependency Lists Are Merged.
An example of the above structure is exemplified below:
dependencies:
build:
common:
- output_types: [conda, requirements]
packages:
- common_build_dep
- output_types: conda
packages:
- cupy
- pip:
- some_random_dep
specific:
- output_types: conda
matrices:
- matrix:
cuda: "11.5"
packages:
- cudatoolkit=11.5
- matrix:
cuda: "11.6"
packages:
- cudatoolkit=11.6
- matrix:
packages:
- cudatoolkit
- output_types: [conda, requirements]
matrices:
- matrix:
cuda: "11.5"
arch: x86_64
packages:
- a_random_x86_115_specific_dep
- matrix:
packages:
- output_types: requirements
matrices:
- matrix:
cuda: "11.5"
packages:
- another_random_dep=11.5.0
- matrix:
cuda: "11.6"
packages:
- another_random_dep=11.6.0
test:
common:
- output_types: [conda, requirements]
packages:
- pytest
How Dependency Lists Are Merged
The information from the top-level files
and dependencies
keys are used to determine which dependencies should be included in the final output of the generated dependency files.
Consider the following top-level files
key configuration:
files:
all:
output: conda
conda_dir: conda/environments
requirements_dir: python/cudf
matrix:
cuda: ["11.5", "11.6"]
arch: [x86_64]
includes:
- build
- test
In this example, rapids-dependency-file-generator
will generate two conda environment files: conda/environments/all_cuda-115_arch-x86_64.yaml
and conda/environments/all_cuda-116_arch-x86_64.yaml
.
Since the output
value is conda
, rapids-dependency-file-generator
will iterate through any dependencies.build.common
and dependencies.test.common
list entries and use the packages
of any entry whose output_types
key is conda
or [conda, ...]
.
Further, for the 11.5
and x86_64
matrix combination, any build.specific
and test.specific
list items whose output includes conda
and whose matrices
list items matches any of the definitions below would also be merged:
specific:
- output_types: conda
matrices:
- matrix:
cuda: "11.5"
packages:
- some_dep1
- some_dep2
specific:
- output_types: conda
matrices:
- matrix:
cuda: "11.5"
arch: "x86_64"
packages:
- some_dep1
- some_dep2
specific:
- output_types: conda
matrices:
- matrix:
arch: "x86_64"
packages:
- some_dep1
- some_dep2
Every matrices
list must have a match for a given input matrix (only the first matching matrix in the list of matrices
will be used).
If no matches are found for a particular matrix combination, an error will be thrown.
In instances where an error should not be thrown, an empty matrix
and packages
list item can be used:
- output_types: conda
matrices:
- matrix:
cuda: "11.5"
arch: x86_64
py: "3.8"
packages:
- a_very_specific_115_x86_38_dep
- matrix:
packages:
Merged dependency lists are sorted and deduped.
Additional CLI Notes
Invoking rapids-dependency-file-generator
without any arguments is meant to be the default behavior for RAPIDS developers. It will generate all of the necessary dependency files as specified in the top-level files
configuration.
However, there are CLI arguments that can augment the files
configuration values before the files are generated.
Consider the example when output: none
is used:
files:
test:
output: none
includes:
- test
The test
file generated by the configuration above is useful for CI, but it might not make sense to necessarily commit those files to a repository. In such a scenario, the following CLI arguments can be used:
ENV_NAME="cudf_test"
rapids-dependency-file-generator \
--file-key "test" \
--output "conda" \
--matrix "cuda=12.5;arch=$(arch)" > env.yaml
mamba env create --file env.yaml
mamba activate "$ENV_NAME"
The --file-key
argument is passed the test
key name from the files
configuration. Additional flags are used to generate a single dependency file. When the CLI is used in this fashion, it will print to stdout
instead of writing the resulting contents to the filesystem.
The --file-key
, --output
, and --matrix
flags must be used together. --matrix
may be an empty string if the file that should be generated does not depend on any specific matrix variations.
Where multiple values for the same key are passed to --matrix
, e.g. cuda_suffixed=true;cuda_suffixed=false
, only the last value will be used.
Where --file-key
is supplied multiple times in the same invocation, the output printed to stdout
will contain a union (without duplicates) of all of the corresponding dependencies. For example:
rapids-dependency-file-generator \
--file-key "test" \
--file-key "test_notebooks" \
--output "conda" \
--matrix "cuda=12.5;arch=$(arch)" > env.yaml
The --prepend-channel
argument accepts additional channels to use, like rapids-dependency-file-generator --prepend-channel my_channel --prepend-channel my_other_channel
.
If both --output
and --prepend-channel
are provided, the output format must be conda.
Prepending channels can be useful for adding local channels with packages to be tested in CI workflows.
Running rapids-dependency-file-generator -h
will show the most up-to-date CLI arguments.
Examples
The tests/examples directory has example dependencies.yaml
files along with their corresponding output files.
To create new example
tests do the following:
- Create a new directory with a
dependencies.yaml
file in tests/examples - Ensure the
output
directories (e.g. conda_dir
, requirements_dir
, etc.) are set to write to output/actual
- Run
rapids-dependency-file-generator --config tests/examples/<new_folder_name>/dependencies.yaml
to generate the initial output files - Manually inspect the generated files for correctness
- Copy the contents of
output/actual
to output/expected
, so it will be committed to the repository and used as a baseline for future changes - Add the new folder name to test_examples.py