MooreThreads MUTLASS Changelog

0.2.0 (2025-02-26)

  • MP31 Features:
    • Squad-level MMA (SQMMA) and Warp-level MMA primitives with rich data types (TF32/FP16/BF16/FP8/S8, etc.).
    • Tensor Memory Engine (TME) and RobustBufferAccess primitives.
  • New GEMM mainloop and epilogue targeting the MP31 architecture that achieve high performance with TME and SQMMA.
  • New tile scheduler to support CTA swizzle for MP31 kernels.
  • New experimental directory housing implementations that are not yet stable and may change significantly in the future.
  • New FP8 GEMM with groupwise scaling.
  • Upgrade the backend from CUTLASS/CuTe 3.5.0 to CUTLASS/CuTe 3.6.0.

0.1.1 (2024-09-30)

  • MuTe, a core library and backend adapted from CUTLASS CuTe
  • Quyuan Features:
    • MMA primitives: TensorFloat32, BFloat16, Float16, INT8
  • FMA/MMA GEMM Kernels targeting the Quyuan architecture
    • Note: this is a beta release. Further updates to MUTLASS will include performance improvements, feature enablement, and possible breaking changes to the API.
  • MUTLASS Profiler, Library, and Utilities
  • Two examples that demonstrate the usage of the low-level API and the collective builders to build GEMM kernels