
nvidia-cusparselt-cu12

###################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
###################################################################################

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

   D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \cdot scale

where :math:`op(A)/op(B)` refers to in-place operations such as transpose/non-transpose, and :math:`\alpha, \beta, scale` are scalars.

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
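
To make the formula above concrete, here is a minimal pure-Python reference sketch of the same arithmetic. It is not the cuSPARSELt API and does not exploit sparsity. The function name ``reference_matmul``, the ReLU activation, the per-row bias vector, and the treatment of :math:`op(C)` as the identity are all assumptions made for illustration.

```python
from typing import Callable, List

Matrix = List[List[float]]


def relu(x: float) -> float:
    """Example activation; any scalar function could be passed instead."""
    return x if x > 0.0 else 0.0


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def reference_matmul(
    a: Matrix,
    b: Matrix,
    c: Matrix,
    bias: List[float],
    alpha: float = 1.0,
    beta: float = 0.0,
    scale: float = 1.0,
    activation: Callable[[float], float] = relu,
    trans_a: bool = False,
    trans_b: bool = False,
) -> Matrix:
    """Dense reference for D = Activation(alpha*op(A)*op(B) + beta*C + bias) * scale.

    Assumptions (hypothetical, for illustration only): op(C) is the identity
    and bias holds one value per output row.
    """
    op_a = transpose(a) if trans_a else a
    op_b = transpose(b) if trans_b else b
    rows, inner, cols = len(op_a), len(op_b), len(op_b[0])
    d: Matrix = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = sum(op_a[i][k] * op_b[k][j] for k in range(inner))
            out_row.append(activation(alpha * acc + beta * c[i][j] + bias[i]) * scale)
        d.append(out_row)
    return d


if __name__ == "__main__":
    a = [[1, 0], [0, 2]]
    b = [[1, 2], [3, 4]]
    c = [[1, 1], [1, 1]]
    print(reference_matmul(a, b, c, bias=[0, -10], alpha=2.0, beta=1.0, scale=0.5))
```

A dense reference like this can be handy for checking the output of a GPU run on small inputs, within the tolerance of the chosen data types.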

Download: `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

Provide Feedback: `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com?subject=cuSPARSELt-Feedback>`_

Examples:
`cuSPARSELt Example 1 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul>`_,
`cuSPARSELt Example 2 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul_advanced>`_

Blog post:

- `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_
- `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines <https://developer.nvidia.com/blog/structured-sparsity-in-the-nvidia-ampere-architecture-and-applications-in-search-engines/>`__
- `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture <https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31552/>`__

Key features:

- NVIDIA Sparse MMA tensor core support
- Mixed-precision computation support:

  +--------------+----------------+-----------------+-------------+
  | Input A/B    | Input C        | Output D        | Compute     |
  +==============+================+=================+=============+
  | FP32         | FP32           | FP32            | FP32        |
  +--------------+----------------+-----------------+-------------+
  | FP16         | FP16           | FP16            | FP32        |
  +              +                +                 +-------------+
  |              |                |                 | FP16        |
  +--------------+----------------+-----------------+-------------+
  | BF16         | BF16           | BF16            | FP32        |
  +--------------+----------------+-----------------+-------------+
  | INT8         | INT8           | INT8            | INT32       |
  +              +----------------+-----------------+             +
  |              | INT32          | INT32           |             |
  +              +----------------+-----------------+             +
  |              | FP16           | FP16            |             |
  +              +----------------+-----------------+             +
  |              | BF16           | BF16            |             |
  +--------------+----------------+-----------------+-------------+
  | E4M3         | FP16           | E4M3            | FP32        |
  +              +----------------+-----------------+             +
  |              | BF16           | E4M3            |             |
  +              +----------------+-----------------+             +
  |              | FP16           | FP16            |             |
  +              +----------------+-----------------+             +
  |              | BF16           | BF16            |             |
  +              +----------------+-----------------+             +
  |              | FP32           | FP32            |             |
  +--------------+----------------+-----------------+-------------+
  | E5M2         | FP16           | E5M2            | FP32        |
  +              +----------------+-----------------+             +
  |              | BF16           | E5M2            |             |
  +              +----------------+-----------------+             +
  |              | FP16           | FP16            |             |
  +              +----------------+-----------------+             +
  |              | BF16           | BF16            |             |
  +              +----------------+-----------------+             +
  |              | FP32           | FP32            |             |
  +--------------+----------------+-----------------+-------------+

- Matrix pruning and compression functionalities
- Activation functions, bias vector, and output scaling
- Batched computation (multiple matrices in a single run)
- GEMM Split-K mode
- Auto-tuning functionality (see ``cusparseLtMatmulSearch()``)
- NVTX ranging and Logging functionalities
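
The pruning and compression features in the list above can be illustrated with a small CPU-side sketch. It assumes the 2:4 structured sparsity pattern used by NVIDIA Ampere sparse tensor cores (at most two non-zero values in every group of four consecutive elements) and a magnitude-based pruning rule. The helper names ``prune_2_4`` and ``compress_2_4`` are hypothetical, and the values-plus-indices layout shown here is only a teaching format; it does not reproduce the library's own pruning and compression routines or its compressed storage format.

```python
from typing import List, Tuple


def prune_2_4(row: List[float]) -> List[float]:
    """Zero out the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest magnitudes (ties keep the lower index).
        keep = sorted(range(4), key=lambda i: -abs(group[i]))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def compress_2_4(pruned_row: List[float]) -> Tuple[List[float], List[int]]:
    """Store two values per group of four plus their positions in the group."""
    if len(pruned_row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values: List[float] = []
    indices: List[int] = []
    for start in range(0, len(pruned_row), 4):
        group = pruned_row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0.0]
        if len(kept) > 2:
            raise ValueError("group is not 2:4 sparse")
        # Pad with zero-valued positions so every group stores exactly two entries.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        values.extend(group[i] for i in kept)
        indices.extend(kept)
    return values, indices


if __name__ == "__main__":
    row = [0.1, -2.0, 0.3, 4.0, 1.0, 0.0, 0.0, -0.5]
    pruned = prune_2_4(row)
    print(pruned)
    print(compress_2_4(pruned))
```

The compressed form holds half as many values as the dense row, which is the property the sparse tensor cores rely on; the per-group indices record where each kept value came from.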

Supported SM architectures: ``SM 8.0``, ``SM 8.6``, ``SM 8.9``, ``SM 9.0``, ``SM 10.0``, ``SM 12.0``

Supported CPU architectures and operating systems:

+------------+--------------------+
| OS         | CPU archs          |
+============+====================+
| Windows    | x86_64             |
+------------+--------------------+
| Linux      | x86_64, Arm64      |
+------------+--------------------+

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

The cuSPARSELt wheel can be installed as follows:

.. code-block:: bash

   pip install nvidia-cusparselt-cuXX

where ``XX`` is the CUDA major version (currently only CUDA 12 is supported).
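
After installation, one way to confirm that the wheel is visible to the current Python environment is to query its distribution metadata. This is a minimal sketch using only the standard library; the helper name ``installed_version`` is hypothetical, and the default distribution name assumes the CUDA 12 wheel.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "nvidia-cusparselt-cu12") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("nvidia-cusparselt-cu12 is not installed in this environment")
    else:
        print(f"nvidia-cusparselt-cu12 version: {version}")
```

This only checks that the package metadata is present; it does not verify that a compatible GPU or CUDA driver is available.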