OpenMP Support

Clang fully supports OpenMP 4.5. It supports offloading to X86_64, AArch64, and PPC64[LE], and has basic support for Cuda devices.

In addition, the LLVM OpenMP runtime libomp supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.

For the list of supported OpenMP 5.0 features, see OpenMP 5.0 Implementation Details.

General improvements

Cuda devices support

Directives execution modes

Clang code generation for target regions supports two modes: SPMD and non-SPMD. Clang chooses one of these modes automatically, based on how directives and their clauses are used. The SPMD mode uses a simplified set of runtime functions, which increases performance at the cost of not supporting some OpenMP features. The non-SPMD mode is the most generic mode and supports all currently available OpenMP features. The compiler always attempts to use the SPMD mode wherever possible. SPMD mode will not be used if:

  • The target region contains user code (other than OpenMP-specific directives) in between the target and the parallel directives.
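The rule above can be illustrated with two versions of the same loop. This is a minimal sketch, not a statement of what a particular Clang release will emit: the function names `scale_spmd` and `scale_generic` are hypothetical, and the actual mode is always chosen by the compiler. The first function uses a combined construct, so no user code sits between the target and the parallel directives. The second runs a statement inside the target region before the parallel directive, which is the situation the bullet describes.

```cpp
#include <vector>

// Candidate for SPMD mode: target and parallel belong to one combined
// construct, so there is no user code between them.
void scale_spmd(std::vector<double>& v, double factor) {
  double* data = v.data();
  const int n = static_cast<int>(v.size());
#pragma omp target teams distribute parallel for map(tofrom: data[0:n])
  for (int i = 0; i < n; ++i) {
    data[i] *= factor;
  }
}

// Expected to fall back to non-SPMD mode: the clamp below is user code
// that executes between the target and the parallel directives.
void scale_generic(std::vector<double>& v, double factor) {
  double* data = v.data();
  const int n = static_cast<int>(v.size());
#pragma omp target map(tofrom: data[0:n])
  {
    // User code in the target region, outside any parallel region.
    double adjusted = factor < 0.0 ? 0.0 : factor;
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
      data[i] *= adjusted;
    }
  }
}
```

Both functions compute the same result for non-negative factors. When the code is built without OpenMP enabled, the pragmas are ignored and the loops run sequentially on the host.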

Data-sharing modes

Clang supports two data-sharing models for Cuda devices: Generic and Cuda modes. The default mode is Generic. Cuda mode can give additional performance and can be activated with the -fopenmp-cuda-mode flag. In Generic mode, all local variables that can be shared in parallel regions are stored in global memory. In Cuda mode, local variables are not shared between threads, and it is the user's responsibility to share the required data between the threads in parallel regions.
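To make the distinction concrete, the sketch below shows the kind of variable the two modes treat differently. The function name `sum_on_device` and the variable `offset` are hypothetical. `offset` is declared in the target region and then read inside the parallel region, so it is a local variable "that can be shared in the parallel regions" in the sense used above. The compile command in the comment is an assumed invocation for an NVPTX target; adjust it to your toolchain.

```cpp
// Assumed build line for Cuda mode (not taken from this document):
//   clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode example.cpp
// Dropping -fopenmp-cuda-mode selects the default Generic mode.

// Adds a constant offset to each element and sums the results on the device.
int sum_on_device(const int* in, int n) {
  int total = 0;
#pragma omp target map(to: in[0:n]) map(tofrom: total)
  {
    // Local to the target region, but read by every thread of the
    // parallel region below. In Generic mode such a variable is stored
    // in global memory so that all threads can see it.
    int offset = 1;
#pragma omp parallel for reduction(+: total)
    for (int i = 0; i < n; ++i) {
      total += in[i] + offset;
    }
  }
  return total;
}
```

In Cuda mode, code like this relies on the user to make sure the data needed by the parallel region is actually available to its threads, as stated above.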

Features not supported or with limited support for Cuda devices

  • Cancellation constructs are not supported.

  • Doacross loop nest is not supported.

  • User-defined reductions are supported only for trivial types.

  • Nested parallelism: inner parallel regions are executed sequentially.

  • Static linking of libraries containing device code is not supported yet.

  • Automatic translation of math functions in target regions to device-specific math functions is not implemented yet.

  • Debug information for OpenMP target regions is supported, but it may sometimes be necessary to manually specify the address class of the inspected variables. In some cases local variables are actually allocated in global memory, but the debug info may not be aware of it.
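As an illustration of the user-defined reduction item in the list above, here is a minimal sketch of a reduction over a trivial type. The names `MinMax`, `merge`, and `range_of` are hypothetical, and the example assumes that a plain aggregate of two ints counts as a trivial type for this purpose and that the array is non-empty.

```cpp
// A trivial aggregate: no constructors, destructors, or virtual members.
struct MinMax {
  int lo;
  int hi;
};

// User-defined reduction that merges two partial ranges.
#pragma omp declare reduction(merge : MinMax : \
    omp_out = MinMax{omp_in.lo < omp_out.lo ? omp_in.lo : omp_out.lo, \
                     omp_in.hi > omp_out.hi ? omp_in.hi : omp_out.hi}) \
    initializer(omp_priv = omp_orig)

// Returns the smallest and largest value of a non-empty array.
MinMax range_of(const int* in, int n) {
  MinMax r{in[0], in[0]};
#pragma omp target teams distribute parallel for \
    map(to: in[0:n]) map(tofrom: r) reduction(merge : r)
  for (int i = 0; i < n; ++i) {
    if (in[i] < r.lo) r.lo = in[i];
    if (in[i] > r.hi) r.hi = in[i];
  }
  return r;
}
```

A reduction over a type with a non-trivial constructor or destructor would fall outside what the list describes as supported on Cuda devices.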

OpenMP 5.0 Implementation Details

The following table provides a quick overview of various OpenMP 5.0 features and their implementation status. Please contact openmp-dev at lists.llvm.org for more information or if you want to help with the implementation.

| Category | Feature | Status | Reviews |
|---|---|---|---|
| loop extension | support != in the canonical loop form | done | D54441 |
| loop extension | #pragma omp loop (directive) | worked on | |
| loop extension | collapse imperfectly nested loop | done | |
| loop extension | collapse non-rectangular nested loop | done | |
| loop extension | C++ range-based for loop | done | |
| loop extension | clause: if for SIMD directives | done | |
| loop extension | inclusive scan extension (matching C++17 PSTL) | done | |
| memory management | memory allocators | done | r341687, r357929 |
| memory management | allocate directive and allocate clause | done | r355614, r335952 |
| OMPD | OMPD interfaces | not upstream | https://github.com/OpenMPToolsInterface/LLVM-openmp/tree/ompd-tests |
| OMPT | OMPT interfaces | mostly done | |
| thread affinity extension | thread affinity extension | done | |
| task extension | taskloop reduction | done | |
| task extension | task affinity | not upstream | |
| task extension | clause: depend on the taskwait construct | worked on | |
| task extension | depend objects and detachable tasks | done | |
| task extension | mutexinoutset dependence-type for tasks | done | D53380, D57576 |
| task extension | combined taskloop constructs | done | |
| task extension | master taskloop | done | |
| task extension | parallel master taskloop | done | |
| task extension | master taskloop simd | done | |
| task extension | parallel master taskloop simd | done | |
| SIMD extension | atomic and simd constructs inside SIMD code | done | |
| SIMD extension | SIMD nontemporal | done | |
| device extension | infer target functions from initializers | worked on | |
| device extension | infer target variables from initializers | worked on | |
| device extension | OMP_TARGET_OFFLOAD environment variable | done | D50522 |
| device extension | support full ‘defaultmap’ functionality | done | D69204 |
| device extension | device specific functions | done | |
| device extension | clause: device_type | done | |
| device extension | clause: extended device | done | |
| device extension | clause: uses_allocators | done | |
| device extension | clause: in_reduction | worked on | r308768 |
| device extension | omp_get_device_num() | worked on | D54342 |
| device extension | structure mapping of references | unclaimed | |
| device extension | nested target declare | done | D51378 |
| device extension | implicitly map ‘this’ (this[:1]) | done | D55982 |
| device extension | allow access to the reference count (omp_target_is_present) | done | |
| device extension | requires directive | partial | |
| device extension | clause: unified_shared_memory | done | D52625, D52359 |
| device extension | clause: unified_address | partial | |
| device extension | clause: reverse_offload | unclaimed parts | D52780 |
| device extension | clause: atomic_default_mem_order | done | D53513 |
| device extension | clause: dynamic_allocators | unclaimed parts | D53079 |
| device extension | user-defined mappers | worked on | D56326, D58638, D58523, D58074, D60972, D59474 |
| device extension | mapping lambda expression | done | D51107 |
| device extension | clause: use_device_addr for target data | done | |
| device extension | support close modifier on map clause | done | D55719, D55892 |
| device extension | teams construct on the host device | done | r371553 |
| device extension | support non-contiguous array sections for target update | done | |
| device extension | pointer attachment | unclaimed | |
| device extension | map clause reordering based on map types | unclaimed | |
| atomic extension | hints for the atomic construct | done | D51233 |
| base language | C11 support | done | |
| base language | C++11/14/17 support | done | |
| base language | lambda support | done | |
| misc extension | array shaping | done | D74144 |
| misc extension | library shutdown (omp_pause_resource[_all]) | unclaimed parts | D55078 |
| misc extension | metadirectives | worked on | |
| misc extension | conditional modifier for lastprivate clause | done | |
| misc extension | iterator and multidependences | done | |
| misc extension | depobj directive and depobj dependency kind | done | |
| misc extension | user-defined function variants | worked on | D67294, D64095, D71847, D71830 |
| misc extension | pointer/reference to pointer based array reductions | unclaimed | |
| misc extension | prevent new type definitions in clauses | done | |
| memory model extension | memory model update (seq_cst, acq_rel, release, acquire,…) | done | |
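Two of the loop extensions listed as done, the `!=` loop test and the C++ range-based for loop, can be sketched as follows. The function names are hypothetical, and depending on the Clang release it may be necessary to select the OpenMP version explicitly (for example with -fopenmp-version=50); treat that flag value as an assumption to check against your compiler.

```cpp
#include <vector>

// Uses `!=` as the loop test, which OpenMP 5.0 accepts in the
// canonical loop form.
long sum_not_equal(const std::vector<int>& v) {
  long total = 0;
  const int n = static_cast<int>(v.size());
#pragma omp parallel for reduction(+: total)
  for (int i = 0; i != n; ++i) {
    total += v[i];
  }
  return total;
}

// Uses a C++ range-based for loop as the loop associated with the
// worksharing construct.
long sum_range_for(const std::vector<int>& v) {
  long total = 0;
#pragma omp parallel for reduction(+: total)
  for (int x : v) {
    total += x;
  }
  return total;
}
```

Both functions return the same sum; they differ only in the loop form handed to the directive.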

OpenMP 5.1 Implementation Details

The following table provides a quick overview of various OpenMP 5.1 features and their implementation status, as defined in Technical Report 8 (TR8). Please contact openmp-dev at lists.llvm.org for more information or if you want to help with the implementation.

| Category | Feature | Status | Reviews |
|---|---|---|---|
| atomic extension | ‘compare’ clause on atomic construct | worked on | |
| atomic extension | ‘fail’ clause on atomic construct | unclaimed | |
| base language | C++ attribute specifier syntax | done | D105648 |
| device extension | ‘present’ map type modifier | done | D83061, D83062, D84422 |
| device extension | ‘present’ motion modifier | done | D84711, D84712 |
| device extension | ‘present’ in defaultmap clause | done | D92427 |
| device extension | map clause reordering based on ‘present’ modifier | unclaimed | |
| device extension | device-specific environment variables | unclaimed | |
| device extension | omp_target_is_accessible routine | unclaimed | |
| device extension | omp_get_mapped_ptr routine | unclaimed | |
| device extension | new async target memory copy routines | unclaimed | |
| device extension | thread_limit clause on target construct | unclaimed | |
| device extension | has_device_addr clause on target construct | unclaimed | |
| device extension | iterators in map clause or motion clauses | unclaimed | |
| device extension | indirect clause on declare target directive | unclaimed | |
| device extension | allow virtual function calls for mapped object on device | unclaimed | |
| device extension | interop construct | partial | parsing/sema done: D98558, D98834, D98815 |
| device extension | assorted routines for querying interoperable properties | unclaimed | |
| loop extension | Loop tiling transformation | done | D76342 |
| loop extension | Loop unrolling transformation | worked on | |
| loop extension | ‘reproducible’/‘unconstrained’ modifiers in ‘order’ clause | unclaimed | |
| memory management | alignment extensions for allocate directive and clause | worked on | |
| memory management | new memory management routines | unclaimed | |
| memory management | changes to omp_alloctrait_key enum | unclaimed | |
| memory model extension | seq_cst clause on flush construct | unclaimed | |
| misc extension | ‘omp_all_memory’ keyword and use in ‘depend’ clause | unclaimed | |
| misc extension | error directive | unclaimed | |
| misc extension | scope construct | unclaimed | |
| misc extension | routines for controlling and querying team regions | unclaimed | |
| misc extension | changes to ompt_scope_endpoint_t enum | unclaimed | |
| misc extension | omp_display_env routine | unclaimed | |
| misc extension | extended OMP_PLACES syntax | unclaimed | |
| misc extension | OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars | unclaimed | |
| misc extension | ‘target_device’ selector in context specifier | unclaimed | |
| misc extension | begin/end declare variant | done | D71179 |
| misc extension | dispatch construct and function variant argument adjustment | worked on | D99537, D99679 |
| misc extension | assume and assumes directives | worked on | |
| misc extension | nothing directive | unclaimed | |
| misc extension | masked construct and related combined constructs | worked on | D99995, D100514 |
| misc extension | default(firstprivate) & default(private) | partial | firstprivate done: D75591 |
| other | deprecating master construct | unclaimed | |
| OMPT | new barrier types added to ompt_sync_region_t enum | unclaimed | |
| OMPT | async data transfers added to ompt_target_data_op_t enum | unclaimed | |
| OMPT | new barrier state values added to ompt_state_t enum | unclaimed | |
| OMPT | new ‘emi’ callbacks for external monitoring interfaces | unclaimed | |
| task extension | ‘strict’ modifier for taskloop construct | unclaimed | |
| task extension | inoutset in depend clause | unclaimed | |
| task extension | nowait clause on taskwait | unclaimed | |
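As a small illustration of the loop tiling transformation listed as done, the sketch below applies the OpenMP 5.1 `tile` directive to a perfectly nested loop pair. The function name `transpose_tiled` and the 4x4 tile size are arbitrary choices for the example, and whether a given Clang release needs an explicit -fopenmp-version=51 to accept the directive is an assumption to verify locally.

```cpp
#include <vector>

// Transposes an n x n row-major matrix. The tile directive asks the
// compiler to split the two loops into 4x4 blocks; the result is the
// same as for the untiled loop nest.
std::vector<int> transpose_tiled(const std::vector<int>& a, int n) {
  std::vector<int> out(a.size());
#pragma omp tile sizes(4, 4)
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      out[j * n + i] = a[i * n + j];
    }
  }
  return out;
}
```

Tiling only changes the iteration order, so it is safe here because every iteration writes a distinct element of `out`.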