###############################################################
cuTENSOR: A High-Performance CUDA Library For Tensor Primitives
###############################################################
`cuTENSOR <https://developer.nvidia.com/cutensor>`_ is a high-performance CUDA library for tensor primitives.
Key Features
============

* Extensive mixed-precision support:

  * FP64 inputs with FP32 compute.
  * FP32 inputs with FP16, BF16, or TF32 compute.
  * Complex-times-real operations.
  * Conjugate (without transpose) support.

* Support for up to 64-dimensional tensors.
* Arbitrary data layouts.
* Trivially serializable data structures.
* Main computational routines:

  * Direct (i.e., transpose-free) tensor contractions:

    * Support for just-in-time compilation of dedicated kernels.

  * Tensor reductions (including partial reductions).
  * Element-wise tensor operations:

    * Support for various activation functions.
    * Support for padding of the output tensor.
    * Arbitrary tensor permutations.
    * Conversion between different data types.
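To make the headline routine concrete, the sketch below illustrates what a direct tensor contraction computes, using NumPy's ``einsum`` purely as a reference implementation of the math; it does not call cuTENSOR, and the mode labels and shapes are arbitrary examples.

.. code-block:: python

    import numpy as np

    # A tensor contraction sums over the modes shared by two operands.
    # Here: C[m, n] = sum_{k, a} A[m, k, a] * B[a, k, n]
    # cuTENSOR performs such contractions directly, i.e. without first
    # transposing the operands into a matrix-multiply-friendly layout.
    A = np.random.rand(4, 5, 6)  # modes m, k, a
    B = np.random.rand(6, 5, 3)  # modes a, k, n

    C = np.einsum("mka,akn->mn", A, B)
    print(C.shape)  # (4, 3)

The same mode-label notation (one letter per tensor dimension) is how contractions are typically specified; the library then chooses or JIT-compiles a kernel for that particular index pattern.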
Documentation
=============

Please refer to https://docs.nvidia.com/cuda/cutensor/index.html for the cuTENSOR documentation.
Installation
============

The cuTENSOR wheel can be installed as follows:

.. code-block:: bash

    pip install cutensor-cuXX

where ``XX`` is the CUDA major version (currently CUDA 11 and 12 are supported).
The package ``cutensor`` (without the ``-cuXX`` suffix) is deprecated. If you have ``cutensor`` installed, please remove it prior to installing ``cutensor-cuXX``.