[ROCm EP] Fix transpose helper for gfx gridsize constraints (#23527)

TedThemistokleous · Ted Themistokleous · ashrit-ms · commit eb75d7a73f6a · 2025-02-11T09:06:20.000-08:00
Remove the inline default transpose helper check and ensure the proper check is used via CanUse_hipblasTransposeHelper_MLFloat16.

Related to change in ROCm Onnxruntime repo: ROCm#82

### Description

Required to correctly limit the grid size of the transpose helper kernel.

### Motivation and Context

The build was defaulting to the inline definition (removed here) instead of using the overloaded case with proper checks. The inline default always returned "true", which is incorrect for newer AMD cards/targets.

Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
diff --git a/onnxruntime/core/providers/rocm/shared_inc/fpgeneric.h b/onnxruntime/core/providers/rocm/shared_inc/fpgeneric.h
@@ -501,7 +501,6 @@ inline hipblasStatus_t hipblasTransposeHelper(hipStream_t /*stream*/, hipblasHan
   return hipblasDgeam(handle, transa, transb, m, n, alpha, A, lda, beta, B, ldb, C, ldc);
 }
 
-inline bool CanUse_hipblasTransposeHelper_MLFloat16(int /*m*/, int /*n*/) { return true; }  // CUDA has a limited grid size of 65536, ROCm has higher limits.
 hipblasStatus_t hipblasTransposeHelper(hipStream_t stream, hipblasHandle_t, hipblasOperation_t, hipblasOperation_t, int m, int n, const half*, const half* A, int, const half*, const half*, int, half* C, int);
 
 // copy
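
For context, the removed inline default returned `true` for any `m` and `n`, so no shape was ever rejected. The following is a minimal sketch of what a shape-dependent check could look like, not the implementation the build now resolves to. The names `kTileDim`, `GridLimits`, `CeilDiv`, and `CanUseTransposeHelperSketch` are hypothetical, the tile size of 32 is an assumption, and the limits are passed in as plain values rather than queried from the device.

```cpp
#include <cstdint>

// Hypothetical tile edge for a tiled transpose kernel (assumption, not from the source).
constexpr int kTileDim = 32;

// Per-dimension grid limits. Real code would query these from the device
// (for example through hipGetDeviceProperties); here they are plain inputs.
struct GridLimits {
  std::int64_t max_x;
  std::int64_t max_y;
};

// Number of tiles of size `tile` needed to cover `extent` elements.
constexpr std::int64_t CeilDiv(std::int64_t extent, std::int64_t tile) {
  return (extent + tile - 1) / tile;
}

// Returns true when an m x n transpose can be covered by a single launch
// whose grid stays within `limits` in both dimensions.
constexpr bool CanUseTransposeHelperSketch(int m, int n, GridLimits limits) {
  if (m <= 0 || n <= 0) return false;
  return CeilDiv(n, kTileDim) <= limits.max_x &&
         CeilDiv(m, kTileDim) <= limits.max_y;
}
```

With limits of 65535 per dimension, a 1024 x 1024 input passes this sketch, while an input with `m = 32 * 65536` rows is rejected. A caller would then need a fallback path for the rejected shape.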
