-
Notifications
You must be signed in to change notification settings - Fork 23.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate nonzero
from the TH to Aten (CPU)
#24745
Labels
better-engineering
Relatively self-contained tasks for better engineering contributors
high priority
module: porting
Issues related to porting TH/THNN legacy to ATen native
triaged
This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Comments
making it high priority because it's useful for complex tensors |
facebook-github-bot
pushed a commit
that referenced
this issue
Jun 2, 2021
Summary: Resubmit of #58811, Closes gh-24745 The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works. This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location. | Shape | Before | After (1 thread) | After (8 threads) | |:----------:|--------:|-----------------:|------------------:| | 256,128,32 | 2610 us | 2150 us | 551 us | | 128,128,32 | 1250 us | 1020 us | 197 us | | 64,128,32 | 581 us | 495 us | 99 us | | 32,128,32 | 292 us | 255 us | 83 us | | 16,128,32 | 147 us | 126 us | 75 us | | 8,128,32 | 75 us | 65 us | 65 us | | 4,128,32 | 39 us | 33 us | 33 us | | 2,128,32 | 20 us | 18 us | 18 us | | 1,128,32 | 11 us | 9 us | 9 us | Pull Request resolved: #59149 Reviewed By: mruberry Differential Revision: D28817466 Pulled By: ngimel fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
deniskokarev
pushed a commit
to deniskokarev/pytorch
that referenced
this issue
Jun 9, 2021
Summary: Closes pytorchgh-24745 The existing PR (pytorchgh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works. This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location. | Shape | Before | After (1 thread) | After (8 threads) | |:----------:|--------:|-----------------:|------------------:| | 256,128,32 | 2610 us | 2220 us | 496 us | | 128,128,32 | 1250 us | 976 us | 175 us | | 64,128,32 | 581 us | 486 us | 88 us | | 32,128,32 | 292 us | 245 us | 80 us | | 16,128,32 | 147 us | 120 us | 71 us | | 8,128,32 | 75 us | 61 us | 61 us | | 4,128,32 | 39 us | 32 us | 32 us | | 2,128,32 | 20 us | 17 us | 17 us | | 1,128,32 | 11 us | 9 us | 9 us | Pull Request resolved: pytorch#58811 Reviewed By: anjali411 Differential Revision: D28700259 Pulled By: ngimel fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159
deniskokarev
pushed a commit
to deniskokarev/pytorch
that referenced
this issue
Jun 9, 2021
Summary: Resubmit of pytorch#58811, Closes pytorchgh-24745 The existing PR (pytorchgh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works. This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location. | Shape | Before | After (1 thread) | After (8 threads) | |:----------:|--------:|-----------------:|------------------:| | 256,128,32 | 2610 us | 2150 us | 551 us | | 128,128,32 | 1250 us | 1020 us | 197 us | | 64,128,32 | 581 us | 495 us | 99 us | | 32,128,32 | 292 us | 255 us | 83 us | | 16,128,32 | 147 us | 126 us | 75 us | | 8,128,32 | 75 us | 65 us | 65 us | | 4,128,32 | 39 us | 33 us | 33 us | | 2,128,32 | 20 us | 18 us | 18 us | | 1,128,32 | 11 us | 9 us | 9 us | Pull Request resolved: pytorch#59149 Reviewed By: mruberry Differential Revision: D28817466 Pulled By: ngimel fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
better-engineering
Relatively self-contained tasks for better engineering contributors
high priority
module: porting
Issues related to porting TH/THNN legacy to ATen native
triaged
This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Porting TH operators is essential for code simplicity and performance reasons.
Porting guides and Q&A are available in umbrella issue: #24507
Feel free to add @VitalyFedyunin as a reviewer to get a prioritized review.
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser
The text was updated successfully, but these errors were encountered: