
random_() is not supported for bfloat16 CUDA tensors on Windows #33793

Closed
pbelevich opened this issue Feb 26, 2020 · 1 comment
Labels
module: bfloat16 · module: cuda (Related to torch.cuda, and CUDA support in general) · module: random (Related to random number generation in PyTorch (rng generator)) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@pbelevich
Contributor

pbelevich commented Feb 26, 2020

This is a known issue that requires further investigation. Currently, calling any `random_()` method on a bfloat16 CUDA tensor on Windows invalidates the CUDA context, and all subsequent CUDA calls fail with 'CUDA error: unspecified launch failure'.

Assignee, please look for "TODO: https://github.com/pytorch/pytorch/issues/33793" in the source code
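A minimal reproduction sketch of the report above, assuming a CUDA-enabled PyTorch build (the failure itself was only observed on Windows; the snippet is a hedged illustration, not a confirmed test case):

```python
import torch

# Sketch of the reported failure; the crash only manifests on a
# Windows CUDA build, so the CUDA part is guarded.
if torch.cuda.is_available():
    t = torch.empty(16, dtype=torch.bfloat16, device="cuda")
    t.random_()                # reported to invalidate the CUDA context on Windows
    torch.cuda.synchronize()   # subsequent CUDA calls then fail with
                               # "CUDA error: unspecified launch failure"
```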

cc @ngimel

@pbelevich pbelevich added the module: cuda (Related to torch.cuda, and CUDA support in general), module: random (Related to random number generation in PyTorch (rng generator)), and module: bfloat16 labels Feb 26, 2020
@ngimel ngimel added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Feb 26, 2020
pbelevich added a commit that referenced this issue Feb 26, 2020
This pull request solves 4 problems:

1. Migrates `Tensor.random_()` from TH to ATen, both CPU and CUDA versions.
2. Allows `random_()` to generate the full range of 64-bit numbers (including the unsigned 64-bit max value).
3. Implements `random_()` for boolean tensors on CUDA.
4. Makes `random_()` a template method to allow its use with custom RNGs.

This is done by the following changes:

1. Drop the TH CPU implementations of `random_()`.
2. Change the API of `random_()` to make the `to` argument optional, so that calling `random_(from=min_value, to=None)` generates numbers over the full 64-bit range.
3. Make three native functions, `random_()`, `random_(to)`, and `random_(from, to)`, which call three different kernels (both CPU and CUDA).
4. Create three `random_` kernels (both CPU and CUDA) to handle the `random_()` (no arguments), `random_(from, to)`, and `random_(from=min_value, to=None)` cases.
5. Templatize all `random_` implementations and kernels so they can be used with custom RNGs.
6. Create C++ tests that use a custom RNG and the `random_()` templates to check correctness.
7. Create Python tests to cover all `random_()` scenarios with all possible dtypes and devices.
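The intended sampling ranges behind the three overloads can be sketched in plain Python. This is an illustrative helper, not the PyTorch implementation; `random_range` and the constant names are hypothetical, and an int64 dtype is assumed for the bounds:

```python
import random

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

def random_range(from_=None, to=None, dtype_max=INT64_MAX):
    """Return the inclusive (low, high) range each overload samples from."""
    if from_ is None and to is None:
        return (0, dtype_max)       # random_(): bounded only by the dtype
    if from_ is None:
        return (0, to - 1)          # random_(to): [0, to - 1]
    if to is None:
        return (from_, dtype_max)   # random_(from=min_value, to=None):
                                    # the full 64-bit range
    return (from_, to - 1)          # random_(from, to): [from, to - 1]

def sample(from_=None, to=None):
    lo, hi = random_range(from_, to)
    return random.randint(lo, hi)
```

Note that `random_range(INT64_MIN)` spans all 2**64 representable values, which is the "full 64-bit range" case the PR adds.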

Fixes #24752
Fixes #32510
Fixes #33299
Fixes #33725

Known issues:
#33793 random_() is not supported for bfloat16 CUDA tensors on Windows


Differential Revision: [D20056350](https://our.internmc.facebook.com/intern/diff/D20056350)

[ghstack-poisoned]
@peterjc123
Collaborator

@pbelevich I solved an issue that may be related to this in #37302. Would you please try again based on that PR?

3 participants