OpenCL and Level Zero APIs allow to allocate memory with size restrictions. Maximum allocation size for those APIs can be queried by
clGetDeviceInfo
with param nameCL_DEVICE_MAX_MEM_ALLOC_SIZE
in OpenCLzeDeviceGetProperties
in Level Zero
According to HW architecture, "stateful addressing model" limits maximum allocation size to 4GB. Because of this limitation, default maximum size supported by NEO is 4GB.
It's possible to relax this limitation for both APIs under certain conditions:
- kernel must be compiled in stateless mode Intel Graphics Compiler Build Flags
- memory must be allocated with flag allowing bigger allocation size Creating Allocations Greater Than 4GB
To allocate memory greater than 4GB in Level Zero, it is necessary to pass ze_relaxed_allocation_limits_exp_desc_t
struct to API call that allocates memory.
This structure must be passed by pNext
member of:
-
ze_device_mem_alloc_desc_t
when allocating withzeMemAllocShared
andzeMemAllocDevice
ze_relaxed_allocation_limits_exp_desc_t relaxedAllocationLimitsExpDesc = {ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC}; relaxedAllocationLimitsExpDesc.flags |= ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE; ze_device_mem_alloc_desc_t deviceDesc = {ZE_STRUCTURE_TYPE_DEVICE_MEM_ALLOC_DESC}; deviceDesc.pNext = &relaxedAllocationLimitsExpDesc; zeMemAllocDevice(hContext, &deviceDesc, size, alignment, hDevice, pptr);
-
ze_host_mem_alloc_desc_t
when allocating withzeMemAllocHost
ze_relaxed_allocation_limits_exp_desc_t relaxedAllocationLimitsExpDesc = {ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC}; relaxedAllocationLimitsExpDesc.flags |= ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE; ze_host_mem_alloc_desc_t hostDesc = {ZE_STRUCTURE_TYPE_HOST_MEM_ALLOC_DESC}; hostDesc.pNext = &relaxedAllocationLimitsExpDesc; zeMemAllocHost(hContext, &hostDesc, size, alignment, pptr);
Structure ze_relaxed_allocation_limits_exp_desc_t
must have ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE
flag set.
To allocate memory greater than 4GB in OpenCL you need to use CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL
flag.
-
For api calls:
clCreateBuffer
clCreateBufferWithProperties
clCreateBufferWithPropertiesINTEL
CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL
flag must be set in passedcl_mem_flags flags
param.cl_mem_flags flags = 0; flags |= CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL; cl_mem buffer = clCreateBuffer(context, flags, size, host_ptr, errcode_ret);
-
For api call:
clSVMAlloc
CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL
flag must be set in passedcl_svm_mem_flags flags
param.cl_svm_mem_flags flags = 0; flags |= CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL; void *alloc = clSVMAlloc(context, flags, size, alignment);
-
For api calls:
clSharedMemAllocINTEL
clDeviceMemAllocINTEL
clHostMemAllocINTEL
CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL
flag must be set incl_mem_flags
orcl_mem_flags_intel
property, incl_mem_properties_intel *properties
param.cl_mem_flags_intel flags = 0; flags |= CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL; cl_mem_properties_intel properties[] = {CL_MEM_FLAGS_INTEL, flags, 0}; void *alloc = clSharedMemAllocINTEL(context, device, properties, size, alignment, errcode_ret);
NEO allows to relax buffer size limitation with Debug Key named AllowUnrestrictedSize
(Works with both APIs)
When set to 1 - maximum allocation size is ignored during buffer creation, despite ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE
/CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL
is not passed.
When set to 0 - size restrictions are enforced.
You need to keep in mind that it's only a debug key which is used for driver development and debug process. It's not a part of specification so there is no guarantee that it will work correctly in every case and can be deprecated in any time.
To compile a kernel in stateless addressing model required to allow use of buffers that are bigger than 4GB, following compilation flag must be used:
-ze-opt-greater-than-4GB-buffer-required
This flag must be set in pBuildFlags
member of ze_module_desc_t
that is passed to zeModuleCreate
ze_module_desc_t moduleDesc = {ZE_STRUCTURE_TYPE_MODULE_DESC};
moduleDesc.format = ZE_MODULE_FORMAT_IL_SPIRV;
moduleDesc.pInputModule = moduleData;
moduleDesc.inputSize = moduleSize;
moduleDesc.pBuildFlags = "-ze-opt-greater-than-4GB-buffer-required";
zeModuleCreate(hContext, hDevice, &moduleDesc, phModule, phBuildLog);
-cl-intel-greater-than-4GB-buffer-required
This flag must be set in options
param that is passed to clBuildProgram
const char options[] = "-cl-intel-greater-than-4GB-buffer-required";
clBuildProgram(program, num_devices, device_list, options, callback, user_data);
When above flags are passed, compiler compiles kernels in a stateless addressing model allowing usage of allocations of any size.
https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#relaxedalloclimits
https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#ze-module-desc-t