
Commit fe4f19e

xwang233 authored and facebook-github-bot committed on Jul 30, 2020
[CUDA] max_pool2d NCHW performance improvement (pytorch#42182)
Summary:

Fix the regression introduced in pytorch#38953. See https://github.com/xwang233/code-snippet/blob/master/max-pool2d-nchw-perf/max-pool2d.ipynb for detailed before-and-after performance comparisons.

Performance improvement for backward max_pool2d before and after this PR (a negative value means a speed-up):

[image: backward max_pool2d timing comparison] (https://user-images.githubusercontent.com/24860335/88712204-363c8e00-d0ce-11ea-8586-057e09b16103.png)

The forward kernel does not seem to benefit much from a similar change (pytorch@1718f0c), so the forward path is left unchanged.

Pull Request resolved: pytorch#42182
Reviewed By: albanD
Differential Revision: D22829498
Pulled By: ngimel
fbshipit-source-id: 4c81968fe072f4e264e70c70ade4c32d760a3af4
1 parent c18223f commit fe4f19e

File tree

1 file changed: +1 −1 lines changed

aten/src/ATen/native/cuda/DilatedMaxPool2d.cu (+1 −1)
@@ -175,7 +175,7 @@ __global__ void max_pool_backward_nchw(const int nthreads, const scalar_t* top_d
                                        scalar_t* bottom_diff) {
   CUDA_KERNEL_LOOP(index, height*width) {
     int h = index / width;
-    int w = index % width;
+    int w = index - h * width;
     int phstart = p_start(h, pad_h, kernel_h, dilation_h, stride_h);
     int phend = p_end(h, pad_h, pooled_height, stride_h);
     int pwstart = p_start(w, pad_w, kernel_w, dilation_w, stride_w);
