nvidia-cusparselt-cu12

NVIDIA cuSPARSELt

  • 0.7.0
  • PyPI

###################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
###################################################################################

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

    D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \cdot scale

where :math:`op(A)/op(B)` refers to in-place operations such as transpose/non-transpose, and :math:`\alpha, \beta, scale` are scalars.
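
As a rough CPU-side illustration of the formula above (not the library's implementation), the following pure-Python sketch evaluates it for small dense matrices stored as lists of rows. The helper names ``transpose``, ``relu``, and ``matmul_reference`` are hypothetical; the choice of ReLU as the activation and of one bias value per output row are assumptions made for this example, and ``c`` is passed already in its ``op(C)`` form.

.. code-block:: python

    def transpose(m):
        """Return the transpose of a row-major list-of-lists matrix."""
        return [list(col) for col in zip(*m)]


    def relu(x):
        """Example activation (an assumption; other activations work the same way)."""
        return x if x > 0.0 else 0.0


    def matmul_reference(a, b, c, alpha=1.0, beta=0.0, bias=None, scale=1.0,
                         trans_a=False, trans_b=False, activation=relu):
        """Evaluate D = activation(alpha * op(A) @ op(B) + beta * C + bias) * scale."""
        op_a = transpose(a) if trans_a else a
        op_b = transpose(b) if trans_b else b
        rows, inner, cols = len(op_a), len(op_b), len(op_b[0])
        d = []
        for i in range(rows):
            row = []
            for j in range(cols):
                acc = sum(op_a[i][k] * op_b[k][j] for k in range(inner))
                value = alpha * acc + beta * c[i][j]
                if bias is not None:
                    value += bias[i]  # assumption: one bias value per output row
                row.append(activation(value) * scale)
            d.append(row)
        return d


    a = [[1.0, 0.0, 2.0, 0.0],
         [0.0, 3.0, 0.0, 4.0]]                              # 2x4 sparse operand
    b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]    # 4x2 dense operand
    c = [[1.0, 1.0], [1.0, 1.0]]
    d = matmul_reference(a, b, c, alpha=1.0, beta=-20.0, bias=[0.5, 0.5], scale=2.0)
    print(d)

The first output row becomes zero because the activation clips the negative pre-activation values before the final scaling is applied.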

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

Provide Feedback: `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com?subject=cuSPARSELt-Feedback>`_

Examples: `cuSPARSELt Example 1 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul>`_, `cuSPARSELt Example 2 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul_advanced>`_

Blog posts:

  • `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_
  • `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines <https://developer.nvidia.com/blog/structured-sparsity-in-the-nvidia-ampere-architecture-and-applications-in-search-engines/>`__
  • `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture <https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31552/>`__

================================================================================
Key Features
================================================================================

  • NVIDIA Sparse MMA tensor core support

  • Mixed-precision computation support:

    +--------------+----------------+-----------------+-------------+
    | Input A/B    | Input C        | Output D        | Compute     |
    +==============+================+=================+=============+
    | FP32         | FP32           | FP32            | FP32        |
    +--------------+----------------+-----------------+-------------+
    | FP16         | FP16           | FP16            | FP32        |
    +              +                +                 +-------------+
    |              |                |                 | FP16        |
    +--------------+----------------+-----------------+-------------+
    | BF16         | BF16           | BF16            | FP32        |
    +--------------+----------------+-----------------+-------------+
    | INT8         | INT8           | INT8            | INT32       |
    +              +----------------+-----------------+             +
    |              | INT32          | INT32           |             |
    +              +----------------+-----------------+             +
    |              | FP16           | FP16            |             |
    +              +----------------+-----------------+             +
    |              | BF16           | BF16            |             |
    +--------------+----------------+-----------------+-------------+
    | E4M3         | FP16           | E4M3            | FP32        |
    +              +----------------+-----------------+             +
    |              | BF16           | E4M3            |             |
    +              +----------------+-----------------+             +
    |              | FP16           | FP16            |             |
    +              +----------------+-----------------+             +
    |              | BF16           | BF16            |             |
    +              +----------------+-----------------+             +
    |              | FP32           | FP32            |             |
    +--------------+----------------+-----------------+-------------+
    | E5M2         | FP16           | E5M2            | FP32        |
    +              +----------------+-----------------+             +
    |              | BF16           | E5M2            |             |
    +              +----------------+-----------------+             +
    |              | FP16           | FP16            |             |
    +              +----------------+-----------------+             +
    |              | BF16           | BF16            |             |
    +              +----------------+-----------------+             +
    |              | FP32           | FP32            |             |
    +--------------+----------------+-----------------+-------------+

  • Matrix pruning and compression functionalities

  • Activation functions, bias vector, and output scaling

  • Batched computation (multiple matrices in a single run)

  • GEMM Split-K mode

  • Auto-tuning functionality (see ``cusparseLtMatmulSearch()``)

  • NVTX ranging and Logging functionalities
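
The "matrix pruning and compression" feature can be pictured with a small CPU-side sketch. It assumes the 2:4 structured-sparsity pattern described in the NVIDIA Ampere blog posts listed earlier (at most two non-zeros in every group of four consecutive values). The function names ``prune_2_4`` and ``compress_2_4`` and the ``(values, indices)`` layout are hypothetical and do not mirror the library's internal compressed format.

.. code-block:: python

    def prune_2_4(row):
        """Zero the two smallest-magnitude entries in every group of four."""
        if len(row) % 4 != 0:
            raise ValueError("row length must be a multiple of 4")
        pruned = []
        for start in range(0, len(row), 4):
            group = row[start:start + 4]
            # Positions ordered by decreasing magnitude; ties keep the earlier index.
            by_magnitude = sorted(range(4), key=lambda i: -abs(group[i]))
            keep = set(by_magnitude[:2])
            pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
        return pruned


    def compress_2_4(pruned_row):
        """Split a 2:4-pruned row into kept values and their in-group positions."""
        if len(pruned_row) % 4 != 0:
            raise ValueError("row length must be a multiple of 4")
        values, indices = [], []
        for start in range(0, len(pruned_row), 4):
            group = pruned_row[start:start + 4]
            kept = [i for i, v in enumerate(group) if v != 0.0]
            if len(kept) > 2:
                raise ValueError("group is not 2:4 sparse")
            # Pad with unused positions so every group stores exactly two slots.
            spare = [i for i in range(4) if i not in kept]
            kept = sorted(kept + spare[:2 - len(kept)])
            values.extend(group[i] for i in kept)
            indices.extend(kept)
        return values, indices


    row = [0.1, -4.0, 3.0, 0.2, 5.0, 0.0, 0.0, -1.0]
    pruned = prune_2_4(row)
    values, indices = compress_2_4(pruned)
    print(pruned)
    print(values, indices)

The compressed form stores half as many values as the dense row, plus a small position index per kept value, which is the general idea behind feeding a compressed operand to sparse tensor cores.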

================================================================================
Support
================================================================================

  • Supported SM Architectures: SM 8.0, SM 8.6, SM 8.9, SM 9.0, SM 10.0, SM 12.0
  • Supported CPU architectures and operating systems:

+------------+--------------------+
| OS         | CPU archs          |
+============+====================+
| Windows    | x86_64             |
+------------+--------------------+
| Linux      | x86_64, Arm64      |
+------------+--------------------+
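
For scripted environment checks, the support information in this section can be encoded as a small lookup. This is only a sketch that mirrors the list and table here: the function name ``is_supported`` and the alias mapping for ``platform.machine()`` strings are assumptions, and the code does not query the GPU, so the SM version has to be supplied by the caller.

.. code-block:: python

    import platform

    # Values copied from the support list and table in this section.
    SUPPORTED_SM = {(8, 0), (8, 6), (8, 9), (9, 0), (10, 0), (12, 0)}
    SUPPORTED_HOSTS = {
        "Windows": {"x86_64"},
        "Linux": {"x86_64", "arm64"},
    }
    # Assumed aliases reported by platform.machine() on common systems.
    MACHINE_ALIASES = {"amd64": "x86_64", "aarch64": "arm64"}


    def is_supported(sm, system=None, machine=None):
        """Check a (major, minor) SM version and a host against the tables."""
        system = system or platform.system()
        machine = (machine or platform.machine()).lower()
        machine = MACHINE_ALIASES.get(machine, machine)
        host_ok = machine in SUPPORTED_HOSTS.get(system, set())
        return tuple(sm) in SUPPORTED_SM and host_ok


    print(is_supported((9, 0), system="Linux", machine="aarch64"))
    print(is_supported((7, 5), system="Linux", machine="x86_64"))

Omitting ``system`` and ``machine`` makes the function fall back to the values reported for the current host.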

================================================================================
Documentation
================================================================================

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

================================================================================
Installation
================================================================================

The cuSPARSELt wheel can be installed as follows:

.. code-block:: bash

    pip install nvidia-cusparselt-cuXX

where ``XX`` is the CUDA major version (currently, only CUDA 12 is supported).
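
After installation, one way to confirm that the wheel is present is to query the package metadata from Python. The sketch below uses only the standard-library ``importlib.metadata`` module; the helper name ``installed_cusparselt`` is hypothetical, and shared libraries are picked out by file extension alone because no file name inside the wheel is assumed here.

.. code-block:: python

    from importlib import metadata


    def installed_cusparselt(cuda_major=12):
        """Return (version, shared_library_paths), or None if the wheel is absent."""
        dist_name = f"nvidia-cusparselt-cu{cuda_major}"
        try:
            dist = metadata.distribution(dist_name)
        except metadata.PackageNotFoundError:
            return None
        files = dist.files or []
        # Shared libraries are selected by extension only; no file name is assumed.
        libs = [str(dist.locate_file(f)) for f in files
                if ".so" in f.name or f.name.endswith(".dll")]
        return dist.version, libs


    info = installed_cusparselt()
    print("not installed" if info is None else info)

The returned paths, if any, could then be passed to a loader of your choice; how the library is actually loaded depends on the consuming framework.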
