Migrate `nonzero` from the TH to Aten (CPU) #24745

VitalyFedyunin · 2019-08-16T19:22:38Z

Porting TH operators is essential for code simplicity and performance reasons.

Porting guides and Q&A are available in umbrella issue: #24507

Feel free to add @VitalyFedyunin as a reviewer to get a prioritized review.

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser

anjali411 · 2021-01-11T16:21:05Z

Marking this as high priority because it's useful for complex tensors.

Summary: Resubmit of #58811, Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue, but it breaks down, for example, with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.

This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts, which is necessary to write the outputs in the right location.

| Shape      | Before  | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us | 2150 us          | 551 us            |
| 128,128,32 | 1250 us | 1020 us          | 197 us            |
| 64,128,32  | 581 us  | 495 us           | 99 us             |
| 32,128,32  | 292 us  | 255 us           | 83 us             |
| 16,128,32  | 147 us  | 126 us           | 75 us             |
| 8,128,32   | 75 us   | 65 us            | 65 us             |
| 4,128,32   | 39 us   | 33 us            | 33 us             |
| 2,128,32   | 20 us   | 18 us            | 18 us             |
| 1,128,32   | 11 us   | 9 us             | 9 us              |

Pull Request resolved: #59149

Reviewed By: mruberry

Differential Revision: D28817466

Pulled By: ngimel

fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
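To illustrate why iteration order matters for `nonzero`, here is a minimal pure-Python model. It is not `TensorIterator` or `TH_TENSOR_APPLY`; the function names `nonzero_logical_order` and `nonzero_memory_order` are hypothetical, and the "reordering" shown (visiting the smallest-stride dimension fastest) is only an assumption about the kind of dimension reordering the commit message refers to. The point is that both traversals find the same elements, but only the logical-order traversal emits indices in the row-major order that `nonzero` callers expect.

```python
from itertools import product


def _offset(idx, strides):
    # Flat storage offset of a logical index under the given strides.
    return sum(i * st for i, st in zip(idx, strides))


def nonzero_logical_order(data, shape, strides):
    """Visit logical indices in row-major order (last dimension fastest)."""
    return [
        idx
        for idx in product(*(range(s) for s in shape))
        if data[_offset(idx, strides)] != 0
    ]


def nonzero_memory_order(data, shape, strides):
    """Visit elements with the smallest-stride dimension fastest.

    This models an iterator that reorders dimensions for memory locality.
    """
    order = sorted(range(len(shape)), key=lambda d: strides[d], reverse=True)
    result = []
    for permuted in product(*(range(shape[d]) for d in order)):
        idx = [0] * len(shape)
        for d, i in zip(order, permuted):
            idx[d] = i
        if data[_offset(idx, strides)] != 0:
            result.append(tuple(idx))
    return result


# A logical 2x3 tensor stored column-major: strides (1, 2), not contiguous.
data = [1, 1, 0, 1, 1, 0]
shape, strides = (2, 3), (1, 2)

logical = nonzero_logical_order(data, shape, strides)
memory = nonzero_memory_order(data, shape, strides)
```

For a contiguous layout the two traversals coincide, which matches the observation that contiguous test cases did not expose the problem.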

Summary: Closes pytorchgh-24745

The existing PR (pytorchgh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue, but it breaks down, for example, with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.

This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts, which is necessary to write the outputs in the right location.

| Shape      | Before  | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us | 2220 us          | 496 us            |
| 128,128,32 | 1250 us | 976 us           | 175 us            |
| 64,128,32  | 581 us  | 486 us           | 88 us             |
| 32,128,32  | 292 us  | 245 us           | 80 us             |
| 16,128,32  | 147 us  | 120 us           | 71 us             |
| 8,128,32   | 75 us   | 61 us            | 61 us             |
| 4,128,32   | 39 us   | 32 us            | 32 us             |
| 2,128,32   | 20 us   | 17 us            | 17 us             |
| 1,128,32   | 11 us   | 9 us             | 9 us              |

Pull Request resolved: pytorch#58811

Reviewed By: anjali411

Differential Revision: D28700259

Pulled By: ngimel

fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159
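The role of per-thread counts can be sketched with a two-pass scheme on a flat list. This is an assumption about the general shape of the approach described in the commit message, not the ATen implementation: the function name `parallel_nonzero_1d` and the chunking strategy are hypothetical, and Python threads are used only to show the structure, not to reproduce the timings in the table. Pass one counts nonzeros per chunk, an exclusive prefix sum turns the counts into write offsets, and pass two lets each worker write its indices into a disjoint slice of the output.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def parallel_nonzero_1d(values, num_chunks=4):
    """Return the indices of nonzero entries of `values`, in increasing order.

    `num_chunks` must be at least 1; each chunk models one worker thread.
    """
    n = len(values)
    # Split [0, n) into contiguous chunks, one per worker.
    bounds = [
        (i * n // num_chunks, (i + 1) * n // num_chunks)
        for i in range(num_chunks)
    ]

    def count_chunk(bound):
        lo, hi = bound
        return sum(1 for i in range(lo, hi) if values[i] != 0)

    def write_chunk(args):
        (lo, hi), offset = args
        # Each worker writes only to out[offset : offset + its own count].
        for i in range(lo, hi):
            if values[i] != 0:
                out[offset] = i
                offset += 1

    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        # Pass 1: per-chunk counts.
        counts = list(pool.map(count_chunk, bounds))
        # Exclusive prefix sum gives each chunk's starting output position.
        offsets = [0] + list(accumulate(counts))[:-1]
        out = [None] * sum(counts)
        # Pass 2: write indices at the precomputed offsets.
        list(pool.map(write_chunk, zip(bounds, offsets)))
    return out
```

Because the offsets are fixed before any worker writes, the output order does not depend on which worker finishes first, and no locking is needed on the output buffer.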

VitalyFedyunin added the better-engineering, module: operators, module: porting, and triaged labels Aug 16, 2019

VitalyFedyunin mentioned this issue Aug 16, 2019

Port TH operators to Aten (umbrella issue) #24507

Closed

ifedan mentioned this issue Oct 2, 2019

Migrate nonzero from the TH to Aten(CPU) #27217

Closed

ifedan self-assigned this Oct 11, 2019

ifedan removed their assignment Nov 25, 2019

heitorschueroff self-assigned this Jul 8, 2020

heitorschueroff removed their assignment Oct 5, 2020

mruberry removed the module: operators (deprecated) label Oct 10, 2020

anjali411 added the high priority label Jan 11, 2021

pytorch-probot bot added the triage review label Jan 11, 2021

anjali411 removed the triage review label Jan 11, 2021

kshitij12345 self-assigned this Jan 16, 2021

kshitij12345 mentioned this issue Jan 16, 2021

Migrate nonzero (CPU) to ATen #50655

Closed

anjali411 mentioned this issue Feb 8, 2021

Complex Numbers Support #33152

Closed

peterbell10 self-assigned this May 22, 2021

peterbell10 mentioned this issue May 23, 2021

Migrate nonzero from TH to ATen (CPU) #58811

Closed

facebook-github-bot closed this as completed in 95b1bc1 May 27, 2021

peterbell10 mentioned this issue May 28, 2021

Migrate nonzero from TH to ATen (CPU) #59149

Closed
