OpenMPSupport.rst

Latest commit 612f8ec · Nov 25, 2024

Clang fully supports OpenMP 4.5, almost all of 5.0, and most of 5.1 and 5.2. Clang supports offloading to X86_64, AArch64, PPC64[LE], NVIDIA GPUs (all models), and AMD GPUs (all models).

In addition, the LLVM OpenMP runtime libomp supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS. OMPT is also supported for NVIDIA and AMD GPUs.
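The sketch below shows the kind of code the offloading support targets. It is a minimal vector addition inside a combined target construct, and the function name `vector_add` is hypothetical. When the program is compiled with offloading enabled and a device is available, the loop runs on the device. Otherwise OpenMP falls back to running the region on the host.

```c
/* Hypothetical helper: element-wise c = a + b inside a target region.
 * With offloading enabled and a device available, the loop runs on the
 * device; otherwise the region executes on the host. */
void vector_add(const float *a, const float *b, float *c, int n) {
  /* Copy the inputs to the device and copy the result back to the host. */
#pragma omp target teams distribute parallel for \
    map(to: a[0:n], b[0:n]) map(from: c[0:n])
  for (int i = 0; i < n; ++i)
    c[i] = a[i] + b[i];
}
```

If you build with plain `-fopenmp` and no offload target, the region runs on the host. To compile it for a GPU as well, pass an offload architecture, for example `-fopenmp --offload-arch=sm_80` for an NVIDIA GPU or `--offload-arch=gfx90a` for an AMD GPU. Setting `OMP_TARGET_OFFLOAD=MANDATORY` makes the program report an error instead of silently falling back to the host.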

For the lists of supported OpenMP 5.0 and 5.1 features, see the OpenMP 5.0 and OpenMP 5.1 implementation details tables below.

  • A new scheme for the collapse clause avoids expensive remainder operations. After a loop nest is collapsed with the collapse clause, the loop index variables are computed with multiplications and additions instead of remainder operations.
  • When the collapse clause is used on a loop nest and the sizes of the collapsed loops are not known at compile time, the loop counter is extended to 64 bits by default. To avoid this conservative choice and use at most 32 bits, compile your program with the -fopenmp-optimistic-collapse flag.
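The remainder-free index computation can be illustrated by hand. This sketch is not Clang's generated code, and the helper names (`delinearize_rem`, `delinearize_mul`, `collapsed_trip_count`, `sum_collapsed`) are hypothetical. It recovers the (i, j) indices of a two-level nest from the single collapsed counter `k` in two ways: first with the remainder operator, then with a multiply and a subtract. It also computes the collapsed trip count in 64 bits, because the product of two 32-bit loop sizes can overflow a 32-bit counter.

```c
#include <stdint.h>

/* Recover (i, j) for the nest `for i < n` / `for j < m` from the collapsed
 * iteration number k, using a division and a remainder. */
void delinearize_rem(uint64_t k, uint64_t m, uint64_t *i, uint64_t *j) {
  *i = k / m;
  *j = k % m;
}

/* Same result without the remainder: j is derived from i with a
 * multiplication and a subtraction, so only one division is needed. */
void delinearize_mul(uint64_t k, uint64_t m, uint64_t *i, uint64_t *j) {
  *i = k / m;
  *j = k - *i * m;
}

/* Collapsed trip count, computed in 64 bits: n * m can exceed the range of
 * a 32-bit counter even when n and m each fit in 32 bits. */
uint64_t collapsed_trip_count(uint32_t n, uint32_t m) {
  return (uint64_t)n * m;
}

/* Walks an n x m nest through one collapsed counter, as collapse(2) does,
 * and sums the reconstructed linear indices (i * m + j == k). */
uint64_t sum_collapsed(uint32_t n, uint32_t m) {
  uint64_t total = 0;
  uint64_t count = collapsed_trip_count(n, m);
  for (uint64_t k = 0; k < count; ++k) {
    uint64_t i, j;
    delinearize_mul(k, m, &i, &j);
    total += i * m + j;
  }
  return total;
}
```

For example, with `m = 5` both helpers map `k = 17` to `(i, j) = (3, 2)`.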

Clang supports two data-sharing models for Cuda devices: Generic mode and Cuda mode. The default is Generic mode. Cuda mode can improve performance and is enabled with the -fopenmp-cuda-mode flag. In Generic mode, all local variables that can be shared in parallel regions are stored in global memory. In Cuda mode, local variables are not shared between threads, and it is the user's responsibility to share the required data between threads in parallel regions. The optimizer can often reduce the cost of Generic mode to the level of Cuda mode, but this flag, like the other assumption flags, can be used for tuning.
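The sketch below shows a pattern that avoids relying on implicitly shared locals. A value computed inside the target region, before the parallel region, is handed to every thread as `firstprivate`, so each thread works on its own copy instead of reading a shared local. Our understanding is that this keeps the code correct under Cuda mode, where local variables are not shared between threads. It may also spare Generic mode from placing the variable in global memory. Both points are assumptions rather than documented guarantees. The function name `scale_on_device` is hypothetical.

```c
/* Hypothetical example: out[i] = in[i] * (2 * base), computed in a target
 * region. `factor` is local to the target region; it is passed to each
 * thread of the parallel region as firstprivate instead of being shared. */
void scale_on_device(const int *in, int *out, int n, int base) {
#pragma omp target teams num_teams(1) map(to: in[0:n]) map(from: out[0:n])
  {
    int factor = base * 2; /* computed once by the team's initial thread */
#pragma omp parallel for firstprivate(factor)
    for (int i = 0; i < n; ++i)
      out[i] = in[i] * factor; /* each thread reads its private copy */
  }
}
```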

The following features are not supported, or have only limited support, on Cuda devices:

  • Cancellation constructs are not supported.
  • Doacross loop nests are not supported.
  • User-defined reductions are supported only for trivial types.
  • Nested parallelism: inner parallel regions are executed sequentially.
  • Debug information for OpenMP target regions is supported, but it may sometimes be necessary to manually specify the address class of the inspected variables. In some cases local variables are actually allocated in global memory, but the debug information may not reflect this.
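Because user-defined reductions on Cuda devices are limited to trivial types, the sketch below declares one over a plain C struct with no constructors, destructors, or pointers. It computes the minimum and maximum of an array in a single reduction. The names `range`, `minmax`, `range_combine`, `range_init`, and `find_range` are hypothetical. The reduction variable `r` is not listed in a map clause because, on a combined target construct, it is implicitly mapped `tofrom`.

```c
#include <limits.h>

/* A trivial type: plain data only. */
struct range { int lo, hi; };

#pragma omp declare target
/* Combiner: merge a partial result `in` into `out`. */
static void range_combine(struct range *out, const struct range *in) {
  if (in->lo < out->lo) out->lo = in->lo;
  if (in->hi > out->hi) out->hi = in->hi;
}
/* Initializer: the identity element for min/max. */
static void range_init(struct range *r) {
  r->lo = INT_MAX;
  r->hi = INT_MIN;
}
#pragma omp end declare target

#pragma omp declare reduction(minmax : struct range : \
    range_combine(&omp_out, &omp_in)) initializer(range_init(&omp_priv))

/* Returns the smallest and largest element of data[0..n). */
struct range find_range(const int *data, int n) {
  struct range r;
  range_init(&r);
#pragma omp target teams distribute parallel for reduction(minmax : r) \
    map(to: data[0:n])
  for (int i = 0; i < n; ++i) {
    if (data[i] < r.lo) r.lo = data[i];
    if (data[i] > r.hi) r.hi = data[i];
  }
  return r;
}
```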

The following table provides a quick overview of various OpenMP 5.0 features and their implementation status. Please post on the Discourse forums (Runtimes - OpenMP category) for more information or if you want to help with the implementation.

| Category | Feature | Status | Reviews |
|---|---|---|---|
| loop | support != in the canonical loop form | done | D54441 |
| loop | #pragma omp loop (directive) | partial | D145823 (combined forms) |
| loop | #pragma omp loop bind | worked on | D144634 (needs review) |
| loop | collapse imperfectly nested loop | done | |
| loop | collapse non-rectangular nested loop | done | |
| loop | C++ range-based for loop | done | |
| loop | clause: if for SIMD directives | done | |
| loop | inclusive scan (matching C++17 PSTL) | done | |
| memory management | memory allocators | done | r341687,r357929 |
| memory management | allocate directive and allocate clause | done | r355614,r335952 |
| OMPD | OMPD interfaces | done | https://reviews.llvm.org/D99914 (supports only host (CPU) and Linux) |
| OMPT | OMPT interfaces (callback support) | done | |
| thread affinity | thread affinity | done | |
| task | taskloop reduction | done | |
| task | task affinity | not upstream | https://github.com/jklinkenberg/openmp/tree/task-affinity |
| task | clause: depend on the taskwait construct | done | D113540 (regular codegen only) |
| task | depend objects and detachable tasks | done | |
| task | mutexinoutset dependence-type for tasks | done | D53380,D57576 |
| task | combined taskloop constructs | done | |
| task | master taskloop | done | |
| task | parallel master taskloop | done | |
| task | master taskloop simd | done | |
| task | parallel master taskloop simd | done | |
| SIMD | atomic and simd constructs inside SIMD code | done | |
| SIMD | SIMD nontemporal | done | |
| device | infer target functions from initializers | worked on | |
| device | infer target variables from initializers | done | D146418 |
| device | OMP_TARGET_OFFLOAD environment variable | done | D50522 |
| device | support full 'defaultmap' functionality | done | D69204 |
| device | device specific functions | done | |
| device | clause: device_type | done | |
| device | clause: extended device | done | |
| device | clause: uses_allocators clause | done | |
| device | clause: in_reduction | worked on | r308768 |
| device | omp_get_device_num() | done | D54342,D128347 |
| device | structure mapping of references | unclaimed | |
| device | nested target declare | done | D51378 |
| device | implicitly map 'this' (this[:1]) | done | D55982 |
| device | allow access to the reference count (omp_target_is_present) | done | |
| device | requires directive | done | |
| device | clause: unified_shared_memory | done | D52625,D52359 |
| device | clause: unified_address | partial | |
| device | clause: reverse_offload | partial | D52780,D155003 |
| device | clause: atomic_default_mem_order | done | D53513 |
| device | clause: dynamic_allocators | unclaimed parts | D53079 |
| device | user-defined mappers | done | D56326,D58638,D58523,D58074,D60972,D59474 |
| device | map array-section with implicit mapper | done | #101101 |
| device | mapping lambda expression | done | D51107 |
| device | clause: use_device_addr for target data | done | |
| device | support close modifier on map clause | done | D55719,D55892 |
| device | teams construct on the host device | done | r371553 |
| device | support non-contiguous array sections for target update | done | |
| device | pointer attachment | done | |
| atomic | hints for the atomic construct | done | D51233 |
| base language | C11 support | done | |
| base language | C++11/14/17 support | done | |
| base language | lambda support | done | |
| misc | array shaping | done | D74144 |
| misc | library shutdown (omp_pause_resource[_all]) | done | D55078 |
| misc | metadirectives | mostly done | D91944 |
| misc | conditional modifier for lastprivate clause | done | |
| misc | iterator and multidependences | done | |
| misc | depobj directive and depobj dependency kind | done | |
| misc | user-defined function variants | done | D67294, D64095, D71847, D71830, D109635 |
| misc | pointer/reference to pointer based array reductions | done | |
| misc | prevent new type definitions in clauses | done | |
| memory model | memory model update (seq_cst, acq_rel, release, acquire,...) | done | |
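Two of the loop features marked done in the table above are illustrated in the sketch below: the inclusive scan (the `inscan` reduction modifier with the `scan` directive) and collapsing a non-rectangular nest whose inner bound depends on the outer index. The function names `prefix_sum` and `count_lower_triangle` are hypothetical.

```c
/* Inclusive prefix sum with the OpenMP 5.0 scan directive:
 * out[i] = in[0] + in[1] + ... + in[i]. */
void prefix_sum(const int *in, int *out, int n) {
  int sum = 0;
#pragma omp parallel for reduction(inscan, + : sum)
  for (int i = 0; i < n; ++i) {
    sum += in[i];                 /* input phase */
#pragma omp scan inclusive(sum)
    out[i] = sum;                 /* scan phase: sum includes in[i] */
  }
}

/* Collapses a non-rectangular (triangular) nest: the inner bound j <= i
 * depends on the outer index. Counts the pairs with 0 <= j <= i < n. */
long count_lower_triangle(int n) {
  long count = 0;
#pragma omp parallel for collapse(2) reduction(+ : count)
  for (int i = 0; i < n; ++i)
    for (int j = 0; j <= i; ++j)
      count += 1;
  return count;
}
```

For example, `prefix_sum` turns `{1, 2, 3, 4}` into `{1, 3, 6, 10}`, and `count_lower_triangle(4)` returns 10.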

The following table provides a quick overview of various OpenMP 5.1 features and their implementation status. Please post on the Discourse forums (Runtimes - OpenMP category) for more information or if you want to help with the implementation.

| Category | Feature | Status | Reviews |
|---|---|---|---|
| atomic | 'compare' clause on atomic construct | done | D120290, D120007, D118632, D120200, D116261, D118547, D116637 |
| atomic | 'fail' clause on atomic construct | worked on | D123235 (in progress) |
| base language | C++ attribute specifier syntax | done | D105648 |
| device | 'present' map type modifier | done | D83061, D83062, D84422 |
| device | 'present' motion modifier | done | D84711, D84712 |
| device | 'present' in defaultmap clause | done | D92427 |
| device | map clause reordering based on 'present' modifier | unclaimed | |
| device | device-specific environment variables | unclaimed | |
| device | omp_target_is_accessible routine | unclaimed | |
| device | omp_get_mapped_ptr routine | done | D141545 |
| device | new async target memory copy routines | done | D136103 |
| device | thread_limit clause on target construct | partial | D141540 (offload), D152054 (host, in progress) |
| device | has_device_addr clause on target construct | unclaimed | |
| device | iterators in map clause or motion clauses | unclaimed | |
| device | indirect clause on declare target directive | unclaimed | |
| device | allow virtual function calls for mapped object on device | partial | |
| device | interop construct | partial | parsing/sema done: D98558, D98834, D98815 |
| device | assorted routines for querying interoperable properties | partial | D106674 |
| loop | Loop tiling transformation | done | D76342 |
| loop | Loop unrolling transformation | done | D99459 |
| loop | 'reproducible'/'unconstrained' modifiers in 'order' clause | partial | D127855 |
| memory management | alignment for allocate directive and clause | done | D115683 |
| memory management | 'allocator' modifier for allocate clause | done | #114883 |
| memory management | new memory management routines | unclaimed | |
| memory management | changes to omp_alloctrait_key enum | unclaimed | |
| memory model | seq_cst clause on flush construct | done | #114072 |
| misc | 'omp_all_memory' keyword and use in 'depend' clause | done | D125828, D126321 |
| misc | error directive | done | D139166 |
| misc | scope construct | done | D157933, #109197 |
| misc | routines for controlling and querying team regions | partial | D95003 (libomp only) |
| misc | changes to ompt_scope_endpoint_t enum | unclaimed | |
| misc | omp_display_env routine | done | D74956 |
| misc | extended OMP_PLACES syntax | unclaimed | |
| misc | OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars | done | D138769 |
| misc | 'target_device' selector in context specifier | worked on | |
| misc | begin/end declare variant | done | D71179 |
| misc | dispatch construct and function variant argument adjustment | worked on | D99537, D99679 |
| misc | assumes directives | worked on | |
| misc | assume directive | done | |
| misc | nothing directive | done | D123286 |
| misc | masked construct and related combined constructs | worked on | D99995, D100514 |
| misc | default(firstprivate) & default(private) | done | D75591 (firstprivate), D125912 (private) |
| other | deprecating master construct | unclaimed | |
| OMPT | new barrier types added to ompt_sync_region_t enum | unclaimed | |
| OMPT | async data transfers added to ompt_target_data_op_t enum | unclaimed | |
| OMPT | new barrier state values added to ompt_state_t enum | unclaimed | |
| OMPT | new 'emi' callbacks for external monitoring interfaces | done | |
| OMPT | device tracing interface | unclaimed | |
| task | 'strict' modifier for taskloop construct | unclaimed | |
| task | inoutset in depend clause | done | D97085, D118383 |
| task | nowait clause on taskwait | partial | parsing/sema done: D131830, D141531 |
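Two of the OpenMP 5.1 features marked done in the table above are shown in the sketch below. The first is the 'compare' clause on the atomic construct, used here for an atomic maximum update. The second is the loop tiling transformation, applied to a matrix transpose with 8 x 8 tiles. The function names are hypothetical, and the tile size is an arbitrary choice for illustration rather than a tuned value.

```c
/* Atomically raises *max_so_far to v if v is larger, using the OpenMP 5.1
 * 'compare' clause (conditional-update form: if (x < e) { x = e; }). */
void atomic_max(int *max_so_far, int v) {
#pragma omp atomic compare
  if (*max_so_far < v) { *max_so_far = v; }
}

/* Maximum of data[0..n), n >= 1: threads update a shared result atomically. */
int parallel_max(const int *data, int n) {
  int m = data[0];
#pragma omp parallel for
  for (int i = 1; i < n; ++i)
    atomic_max(&m, data[i]);
  return m;
}

/* Transposes an n x n row-major matrix; the tile construct splits the
 * i/j nest into 8 x 8 blocks to improve locality. */
void transpose(const double *a, double *t, int n) {
#pragma omp tile sizes(8, 8)
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      t[j * n + i] = a[i * n + j];
}
```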

The following table provides a quick overview of various OpenMP extensions and their implementation status. These extensions are not currently defined by any standard, so links to associated LLVM documentation are provided. As these extensions mature, they will be considered for standardization. Please post on the Discourse forums (Runtimes - OpenMP category) to provide feedback.

| Category | Feature | Status | Reviews |
|---|---|---|---|
| atomic extension | 'atomic' strictly nested within 'teams' | prototyped | D126323 |
| device extension | 'ompx_hold' map type modifier | prototyped | D106509, D106510 |
| device extension | 'ompx_bare' clause on 'target teams' construct | prototyped | #66844, #70612 |
| device extension | Multi-dim 'num_teams' and 'thread_limit' clause on 'target teams ompx_bare' construct | partial | #99732, #101407, #102715 |