Description
There were some interesting discussions about the behavior of the SYCL device selector on the now-closed PR #1543.
I think it is better if we discuss this particular topic separately and decide what is best for the community.
The current default device selection works like this:
- [A] If there is no CUDA device on the system:
  - Give scores to devices (I think: GPU > CPU > FPGA > host)
  - Return the device with the highest score
- [B] If there is a CUDA platform on the system, and either no SYCL_BE is passed or SYCL_BE=PI_OPENCL is set:
  - The OpenCL backend is preferred, so the NVIDIA OpenCL platform GPU is returned first
  - The user gets an invalid triple error, since the default compilation doesn't generate the required binary format. Using the CUDA triple still fails, with different errors we haven't investigated.
- [C] If there is a CUDA platform on the system, and SYCL_BE=PI_CUDA is passed:
  - The CUDA backend is preferred, so the PI CUDA backend is returned.
  - The default compilation will fail, since the user hasn't passed the right triple. If the user has passed the right triple for the CUDA backend, the program runs.
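To make the three cases easier to compare, here is a minimal sketch of the behavior described above in plain C++. It is a model, not the DPC++ implementation: `Device`, `Backend`, `typeScore`, `selectDefault`, and the numeric scores are all hypothetical, and the type ordering follows the "I think" guess from case [A].

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for SYCL concepts; these are not DPC++ types.
enum class DeviceType { Host, FPGA, CPU, GPU };
enum class Backend { Host, OpenCL, CUDA };

struct Device {
  std::string name;
  DeviceType type;
  Backend backend;
};

// Assumed ordering from case [A]: GPU > CPU > FPGA > host.
// The actual values are arbitrary; only their order matters.
int typeScore(DeviceType type) {
  switch (type) {
    case DeviceType::GPU:  return 400;
    case DeviceType::CPU:  return 300;
    case DeviceType::FPGA: return 200;
    case DeviceType::Host: return 100;
  }
  return 0;
}

// `preferred` models the effect of SYCL_BE: OpenCL when it is unset or
// PI_OPENCL (case [B]), CUDA when it is PI_CUDA (case [C]).
// Returns nullptr when the device list is empty.
const Device* selectDefault(const std::vector<Device>& devices,
                            Backend preferred) {
  const Device* best = nullptr;
  int bestScore = -1;
  for (const Device& d : devices) {
    // The preferred backend only breaks ties between devices of equal type.
    int s = typeScore(d.type) + (d.backend == preferred ? 50 : 0);
    if (s > bestScore) {
      bestScore = s;
      best = &d;
    }
  }
  return best;
}
```

In this model, a system that exposes the same NVIDIA GPU through both OpenCL and CUDA always returns one of the two GPU entries, and neither works with a binary built using the default triple. That is the failure mode in cases [B] and [C].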
On the Codeplay side, we get a lot of feedback and questions from users who accidentally run their SYCL application on an NVIDIA OpenCL platform and get strange failures.
We recommend that users remove the NVIDIA OpenCL ICD from the system so it's not used in device selection, leaving only the PI CUDA backend available for DPC++ applications.
This is not always possible, since not everyone has permission to edit the file. With the Khronos ICD loader, environment variables such as OPENCL_ICD_VENDORS can change the path from which the ICD files are loaded, but that is not a practical solution.
For that reason, we propose to remove the NVIDIA OpenCL platform from device selection altogether, so users don't accidentally use a platform that won't work with the default configuration.
We do not want to remove any other OpenCL 1.2 platform since others may work by default (e.g., POCL or ComputeAorta).
Once users have dealt with their NVIDIA OpenCL problems, the issue of selecting the CUDA backend still remains.
When the NVIDIA OpenCL platform is removed from the system and there is no SYCL_BE preference, the CUDA backend is selected first by default_selector (and by gpu_selector), since it exposes the GPU. This causes problems for users who deploy a SYCL application with a default selector on a system that has an NVIDIA GPU, because that GPU will be selected first. This fails if the application has not been built with the right triple for the CUDA backend.
When SYCL_BE=PI_OPENCL is set, this problem goes away, as the CUDA backend is not selected.
To prevent users from accidentally triggering the selection of the NVIDIA CUDA backend, we suggested on the ill-fated PR to make the selection of the CUDA backend explicit. The CUDA backend will not be used in default (or GPU) selection unless SYCL_BE=PI_CUDA is exported.
If it is, a CUDA device will likely be selected first in default or GPU selection, but the user has opted in to this, so it's not accidental.
This shouldn't prevent the selection of other devices (e.g., when using the cpu_selector or accelerator_selector), so using multiple devices in the same SYCL application is still possible.
Note that all of this is for the default device selection, and users can still write their own custom device selectors to bypass this. If a user wants to use a CUDA device, she can write a CUDA selector that will force usage of a CUDA device or bail out.
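As a rough illustration of the proposal, the sketch below models it in plain C++ rather than against the real SYCL API. Every name here (`Device`, `visibleDevices`, `defaultScore`, `cudaOnlyScore`, `select`) is hypothetical, matching the NVIDIA OpenCL platform by a name substring is an assumption, and the scores are arbitrary. The one borrowed convention is that a negative score means "never select this device", as in SYCL device selectors.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for SYCL concepts; these are not DPC++ types.
enum class DeviceType { Host, FPGA, CPU, GPU };
enum class Backend { Host, OpenCL, CUDA };

struct Device {
  std::string platform;
  DeviceType type;
  Backend backend;
};

// Step 1: drop the NVIDIA OpenCL platform before any selector sees it,
// so custom selectors cannot pick it either.
// Assumption: the platform is recognised by "NVIDIA" in its name.
std::vector<Device> visibleDevices(const std::vector<Device>& all) {
  std::vector<Device> out;
  for (const Device& d : all) {
    bool nvidiaOpenCL = d.backend == Backend::OpenCL &&
                        d.platform.find("NVIDIA") != std::string::npos;
    if (!nvidiaOpenCL) out.push_back(d);
  }
  return out;
}

// Step 2: default scoring rejects CUDA devices unless the user opted in
// (modelling SYCL_BE=PI_CUDA being exported).
int defaultScore(const Device& d, bool cudaOptIn) {
  if (d.backend == Backend::CUDA && !cudaOptIn) return -1;
  int s = 0;
  switch (d.type) {
    case DeviceType::GPU:  s = 400; break;
    case DeviceType::CPU:  s = 300; break;
    case DeviceType::FPGA: s = 200; break;
    case DeviceType::Host: s = 100; break;
  }
  if (d.backend == Backend::CUDA) s += 50;  // opted in, so prefer it
  return s;
}

// A custom "CUDA or bail out" selector is unaffected by the opt-in flag.
int cudaOnlyScore(const Device& d) {
  return d.backend == Backend::CUDA ? 1 : -1;
}

// Picks the highest non-negative score; throws if every device is rejected.
template <typename ScoreFn>
const Device& select(const std::vector<Device>& devices, ScoreFn scoreOf) {
  const Device* best = nullptr;
  int bestScore = -1;
  for (const Device& d : devices) {
    int s = scoreOf(d);
    if (s > bestScore) {
      bestScore = s;
      best = &d;
    }
  }
  if (best == nullptr) throw std::runtime_error("no suitable device found");
  return *best;
}
```

With a device list containing an NVIDIA OpenCL GPU, a CUDA GPU, and an OpenCL CPU, `select(visibleDevices(all), [](const Device& d) { return defaultScore(d, false); })` returns the CPU in this model, while passing `true` or using `cudaOnlyScore` returns the CUDA GPU.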
To summarize (and conclude):
- I'll make a PR to expunge the NVIDIA OpenCL platform from device selection altogether (it won't be possible to use it even with a custom device selector).
- I'll make a PR to make use of the CUDA backend in default selectors opt-in. This won't affect custom selectors.
Does this seem like a good approach? Are there any alternative proposals?