MooreThreads/mutlass

 
 

Chinese version

MUTLASS 0.2.0

MUTLASS 0.2.0 - February 2025

MUTLASS (MUSA Templates for Linear Algebra Subroutines) is a header-only library for implementing high-performance matrix-matrix multiplication (GEMM) within MUSA (Meta-computing Unified System Architecture). It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement muDNN.
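To make "hierarchical decomposition" concrete, here is a minimal host-side sketch in plain C++. It does not use any MUTLASS, CuTe, or MUSA API; the function name `tiled_gemm` and the tile size `kTile` are hypothetical and chosen only for illustration. It shows the general idea of splitting the output into tiles and walking the K dimension tile by tile, which GPU GEMM libraries typically map onto thread blocks, warps, and hardware MMA instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile edge, chosen only for illustration.
constexpr std::size_t kTile = 4;

// Computes C += A * B for row-major A (M x K), B (K x N), C (M x N).
void tiled_gemm(std::size_t M, std::size_t N, std::size_t K,
                const std::vector<float>& A,
                const std::vector<float>& B,
                std::vector<float>& C) {
  assert(A.size() == M * K);
  assert(B.size() == K * N);
  assert(C.size() == M * N);

  // Level 1: split C into output tiles of at most kTile x kTile elements.
  for (std::size_t m0 = 0; m0 < M; m0 += kTile) {
    const std::size_t m1 = std::min(m0 + kTile, M);
    for (std::size_t n0 = 0; n0 < N; n0 += kTile) {
      const std::size_t n1 = std::min(n0 + kTile, N);
      // Level 2: walk the K dimension one tile at a time (the "mainloop").
      for (std::size_t k0 = 0; k0 < K; k0 += kTile) {
        const std::size_t k1 = std::min(k0 + kTile, K);
        // Level 3: multiply-accumulate inside the current tiles.
        for (std::size_t m = m0; m < m1; ++m) {
          for (std::size_t k = k0; k < k1; ++k) {
            const float a = A[m * K + k];
            for (std::size_t n = n0; n < n1; ++n) {
              C[m * N + n] += a * B[k * N + n];
            }
          }
        }
      }
    }
  }
}
```

The three loop levels only illustrate the decomposition; a real kernel would additionally stage tiles through faster memory and overlap data movement with computation.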

See the Quick Start Guide to get started quickly.

Note: MUTLASS uses the CuTe library, introduced in CUTLASS 3.x, as the backend, and thus is incompatible with most implementations of CUTLASS 2.x.

What's New in MUTLASS 0.2.0

MUTLASS 0.2.0 is an update to MUTLASS adding:

  • MP31 Features:
    • Squad-level MMA (SQMMA) and Warp-level MMA primitives with rich data types (TF32/FP16/BF16/FP8/S8, etc.).
    • Tensor Memory Engine (TME) and RobustBufferAccess primitives.
  • New GEMM mainloop and epilogue targeting the MP31 architecture that achieve high performance with TME and SQMMA.
  • New tile scheduler to support CTA swizzle for MP31 kernels.
  • New experimental directory housing implementations that are not yet stable and may change significantly in the future.
  • New FP8 GEMM with groupwise scaling.
  • Backend upgrade from CUTLASS/CuTe 3.5.0 to CUTLASS/CuTe 3.6.0.
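As a rough illustration of what "groupwise scaling" can mean for a low-precision GEMM, the sketch below uses plain C++ and is not the MUTLASS implementation. It makes several assumptions: `std::int8_t` stands in for the FP8 payload (standard C++ has no FP8 type), the K dimension is split into equal groups, and the scale-factor layout (one factor per row of A and K-group, one per K-group and column of B) is hypothetical. The function name `gemm_groupwise_scaled` is also made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: int8_t stands in for a quantized FP8 value.
// A is M x K and B is K x N, both row-major; K is split into groups of `group` elements.
// scale_a has shape M x G and scale_b has shape G x N, where G = K / group (hypothetical layout).
std::vector<float> gemm_groupwise_scaled(
    std::size_t M, std::size_t N, std::size_t K, std::size_t group,
    const std::vector<std::int8_t>& A, const std::vector<float>& scale_a,
    const std::vector<std::int8_t>& B, const std::vector<float>& scale_b) {
  assert(group > 0 && K % group == 0);
  const std::size_t G = K / group;
  assert(A.size() == M * K && B.size() == K * N);
  assert(scale_a.size() == M * G && scale_b.size() == G * N);

  std::vector<float> C(M * N, 0.0f);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (std::size_t g = 0; g < G; ++g) {
        // Accumulate the unscaled products of one K-group.
        float partial = 0.0f;
        for (std::size_t k = g * group; k < (g + 1) * group; ++k) {
          partial += static_cast<float>(A[m * K + k]) *
                     static_cast<float>(B[k * N + n]);
        }
        // Apply the two scale factors of this group once per partial sum.
        acc += scale_a[m * G + g] * scale_b[g * N + n] * partial;
      }
      C[m * N + n] = acc;
    }
  }
  return C;
}
```

Applying the scales once per group, rather than once per element, keeps the inner loop in the low-precision domain; the actual granularity and layout used by MUTLASS may differ.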

Minimum requirements:

  • Architecture: Quyuan

  • Compiler: MCC 4.0.0

  • MUSA Toolkit version: 4.0.0

See the CHANGELOG for a detailed listing of releases and updates.

Performance

The above figure shows the relative performance of the tensorop GEMM compared with muDNN. The performance of the TF32 data type will be further optimized in the next release.

Documentation

Building MUTLASS

MUTLASS is a header-only template library and does not need to be built to be used by other projects. Client applications should target MUTLASS's include/ directory in their include paths.

MUTLASS unit tests, examples, and utilities can be built with CMake. The minimum version of CMake is given in the Quick Start Guide.

Create a build directory within the MUTLASS project, then run CMake. By default, MUTLASS builds kernels for MUSA architecture versions 2.2 and 3.1.
