https://github.com/pytorch/pytorch/blob/b1dbe33056f006ea4d985ad4aa8da128e02569c1/aten/src/ATen/native/Convolution.cpp#L95-L101

- Profile convolution performance with and without dilation.
- Check whether cuDNN is being used.

See pytorch/pytorch#31690.
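
A minimal sketch of how the comparison above could be run from Python. The tensor shapes, iteration counts, and the helper names `time_conv2d` and `run_comparison` are assumptions made for illustration; they do not come from the linked code or from the referenced issue. The script times `F.conv2d` with `dilation=1` and `dilation=2` and reports whether cuDNN is available and enabled.

```python
import time

import torch
import torch.nn.functional as F


def time_conv2d(dilation, device, iters=20, warmup=5):
    """Return mean seconds per forward conv2d call for the given dilation."""
    # Hypothetical problem size; substitute shapes from the real workload.
    x = torch.randn(8, 16, 64, 64, device=device)
    w = torch.randn(32, 16, 3, 3, device=device)
    # For a 3x3 kernel, padding == dilation keeps the output size unchanged,
    # so both runs produce the same number of output elements.
    for _ in range(warmup):
        F.conv2d(x, w, padding=dilation, dilation=dilation)
    if device.type == "cuda":
        torch.cuda.synchronize()  # CUDA kernels launch asynchronously
    start = time.perf_counter()
    for _ in range(iters):
        F.conv2d(x, w, padding=dilation, dilation=dilation)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


def run_comparison():
    """Time conv2d with and without dilation and report cuDNN status."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return {
        "device": device.type,
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_enabled": torch.backends.cudnn.enabled,
        "no_dilation_s": time_conv2d(1, device),
        "dilation_2_s": time_conv2d(2, device),
    }


if __name__ == "__main__":
    for key, value in run_comparison().items():
        print(f"{key}: {value}")
```

Note that `cudnn_available` and `cudnn_enabled` only show that cuDNN *can* be selected, not that it was selected for a particular call. One indirect check, offered here as a suggestion, is to repeat the timings inside `with torch.backends.cudnn.flags(enabled=False):` and compare; inspecting kernel names with `torch.profiler` would be a more direct check.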