Commit eb75d7a

Ted Themistokleous authored and committed
[ROCm EP] Fix transpose helper for gfx gridsize constraints (#23527)
Remove the inline default transpose helper check and ensure the proper check via CanUse_hipblasTransposeHelper_MLFloat16 is used.

Related to change in ROCm Onnxruntime repo: ROCm#82

### Description

Required to correctly limit the grid size of the transpose helper kernel.

### Motivation and Context

The build was defaulting to the inline definition (removed in this commit) instead of using the overloaded version with proper checks. The inline default that always returned "true" was removed because it is incorrect for newer AMD cards/targets.

Co-authored-by: Ted Themistokleous <[email protected]>
1 parent 6e1520f commit eb75d7a

1 file changed, +0 -1 lines changed

onnxruntime/core/providers/rocm/shared_inc/fpgeneric.h

@@ -501,7 +501,6 @@ inline hipblasStatus_t hipblasTransposeHelper(hipStream_t /*stream*/, hipblasHan
   return hipblasDgeam(handle, transa, transb, m, n, alpha, A, lda, beta, B, ldb, C, ldc);
 }
 
-inline bool CanUse_hipblasTransposeHelper_MLFloat16(int /*m*/, int /*n*/) { return true; }  // CUDA has a limited grid size of 65536, ROCm has higher limits.
 hipblasStatus_t hipblasTransposeHelper(hipStream_t stream, hipblasHandle_t, hipblasOperation_t, hipblasOperation_t, int m, int n, const half*, const half* A, int, const half*, const half*, int, half* C, int);
 
 // copy
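
The removed stub returned true for every `m` and `n`, so it never compared the launch grid against a device limit. As a rough illustration of the kind of check the commit message describes, here is a minimal host-side sketch. It is not the ONNX Runtime implementation: the tile width `kTileDim`, the `GridLimits` struct, and the function `CanUseTiledTranspose` are all hypothetical names, and a real check would presumably take its limits from the device properties rather than from a hand-filled struct.

```cpp
#include <cstdint>

// Hypothetical tile width; the real kernel's tile size is not shown in this diff.
constexpr int kTileDim = 32;

// Hypothetical stand-in for per-device grid limits. A real check would
// likely read these from the device properties at runtime.
struct GridLimits {
  std::int64_t max_grid_x;
  std::int64_t max_grid_y;
};

// Number of tiles needed to cover `extent` elements (ceiling division).
constexpr std::int64_t TilesFor(int extent) {
  return (static_cast<std::int64_t>(extent) + kTileDim - 1) / kTileDim;
}

// Returns true only when a tiled m x n transpose fits within the grid limits.
// Assumes one block per tile, with n mapped to grid x and m mapped to grid y.
inline bool CanUseTiledTranspose(int m, int n, const GridLimits& limits) {
  if (m <= 0 || n <= 0) return false;
  return TilesFor(n) <= limits.max_grid_x && TilesFor(m) <= limits.max_grid_y;
}
```

With this shape, a caller can fall back to another transpose path whenever the function returns false, instead of launching a kernel with an oversized grid.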
