OpenMP Support

Clang fully supports OpenMP 4.5. It supports offloading to X86_64, AArch64, and PPC64[LE] devices, and has basic support for Cuda devices.
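
As a rough illustration of what offloading looks like in source code, the sketch below adds two arrays inside a target region. The function name vector_add is hypothetical and chosen only for this example. One commonly used invocation for a Cuda device is clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda, but the exact flags depend on the toolchain and installation, so treat them as an assumption.

```c
/* Hypothetical example: element-wise addition inside an OpenMP target region.
 * The map clauses copy a and b to the device and copy out back to the host.
 * Compiled without OpenMP, the pragma is ignored and the loop runs on the host. */
void vector_add(const float *a, const float *b, float *out, int n) {
  #pragma omp target teams distribute parallel for \
      map(to: a[0:n], b[0:n]) map(from: out[0:n])
  for (int i = 0; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```

If no device is available, the region normally falls back to host execution, so the result should be the same either way.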

In addition, the LLVM OpenMP runtime libomp supports the OpenMP Tools Interface (OMPT) for x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.

For the list of supported OpenMP 5.0 features, see OpenMP 5.0 Implementation Details.

General improvements

Cuda devices support

Directives execution modes

Clang code generation for target regions supports two modes: SPMD and non-SPMD. Clang chooses between them automatically, based on how the directives and their clauses are used. SPMD mode uses a simplified set of runtime functions, which increases performance at the cost of not supporting some OpenMP features. Non-SPMD mode is the most generic mode and supports all currently available OpenMP features. The compiler always attempts to use SPMD mode wherever possible. SPMD mode will not be used if:

  • The target region contains user code (other than OpenMP-specific directives) in between the target and the parallel directives.
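
The rule above can be sketched with two small functions. The names scale_spmd and scale_generic are hypothetical. Which mode the compiler actually selects is decided internally, so the comments describe the expected outcome under the stated rule rather than a guarantee.

```c
/* Combined construct: only OpenMP directives sit between `target` and
 * `parallel`, so this region is a candidate for SPMD mode. */
void scale_spmd(float *x, int n, float factor) {
  #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
  for (int i = 0; i < n; ++i) {
    x[i] *= factor;
  }
}

/* User code (the computation of `factor`) runs between `target` and
 * `parallel`, so by the rule above this region is expected to fall back
 * to non-SPMD mode. */
void scale_generic(float *x, int n, float base) {
  #pragma omp target map(tofrom: x[0:n])
  {
    float factor = base * 2.0f; /* sequential user code inside the target region */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
      x[i] *= factor;
    }
  }
}
```

Both functions compute the same kind of result. The difference lies only in how the region is structured, which is what the mode selection reacts to.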

Data-sharing modes

Clang supports two data-sharing models for Cuda devices: Generic mode and Cuda mode. The default is Generic mode. Cuda mode can give additional performance and is activated with the -fopenmp-cuda-mode flag. In Generic mode, all local variables that can be shared in parallel regions are stored in global memory. In Cuda mode, local variables are not shared between threads, and it is the user's responsibility to share the required data between the threads in parallel regions.
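
A minimal sketch of the kind of code this distinction affects follows. The function name sum_scaled and the array scratch are hypothetical. The example assumes the default Generic mode, where the local array is visible to all threads of the parallel region. Under -fopenmp-cuda-mode the same pattern would, per the description above, require the user to arrange the sharing explicitly.

```c
/* Hypothetical example: a variable local to the target region is written by
 * the threads of an enclosed parallel region and read back afterwards. */
int sum_scaled(int n) {
  int result = 0;
  #pragma omp target map(tofrom: result)
  {
    int scratch[4] = {0, 0, 0, 0}; /* local to the target region */
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < 4; ++i) {
      scratch[i] = (i + 1) * n;    /* each iteration writes its own slot */
    }
    /* Back in sequential code: combine what the parallel threads wrote. */
    result = scratch[0] + scratch[1] + scratch[2] + scratch[3];
  }
  return result;
}
```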

Features not supported or with limited support for Cuda devices

  • Cancellation constructs are not supported.

  • Doacross loop nest is not supported.

  • User-defined reductions are supported only for trivial types.

  • Nested parallelism: inner parallel regions are executed sequentially.

  • Automatic translation of math functions in target regions to device-specific math functions is not implemented yet.

  • Debug information for OpenMP target regions is supported, but it may sometimes be necessary to manually specify the address class of the inspected variables. In some cases local variables are actually allocated in global memory, but the debug info may not be aware of it.
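
To make the nested-parallelism limitation concrete, here is a small sketch with a hypothetical function row_sums. The code is written so that its result does not depend on whether the inner parallel region runs with many threads or sequentially, which is the safe way to write such code for Cuda devices given the limitation listed above.

```c
/* Hypothetical example: sums each row of a rows x cols matrix stored
 * row-major in m and writes the sums to out. */
void row_sums(const int *m, int rows, int cols, int *out) {
  #pragma omp target teams distribute parallel for \
      map(to: m[0:rows * cols]) map(from: out[0:rows])
  for (int r = 0; r < rows; ++r) {
    int acc = 0;
    /* Nested parallel region: per the list above, on Cuda devices this
     * inner loop is executed sequentially by the encountering thread. */
    #pragma omp parallel for reduction(+ : acc)
    for (int c = 0; c < cols; ++c) {
      acc += m[r * cols + c];
    }
    out[r] = acc;
  }
}
```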

OpenMP 5.0 Implementation Details

The following table provides a quick overview of various OpenMP 5.0 features and their implementation status. Please post on the Discourse forums (Runtimes - OpenMP category) for more information or if you want to help with the implementation.

Category | Feature | Status | Reviews
loop | support != in the canonical loop form | done | D54441
loop | #pragma omp loop (directive) | worked on |
loop | #pragma omp loop bind | worked on |
loop | collapse imperfectly nested loop | done |
loop | collapse non-rectangular nested loop | done |
loop | C++ range-based for loop | done |
loop | clause: if for SIMD directives | done |
loop | inclusive scan (matching C++17 PSTL) | done |
memory management | memory allocators | done | r341687, r357929
memory management | allocate directive and allocate clause | done | r355614, r335952
OMPD | OMPD interfaces | not upstream | https://github.com/OpenMPToolsInterface/LLVM-openmp/tree/ompd-tests
OMPT | OMPT interfaces | mostly done |
thread affinity | thread affinity | done |
task | taskloop reduction | done |
task | task affinity | not upstream | https://github.com/jklinkenberg/openmp/tree/task-affinity
task | clause: depend on the taskwait construct | mostly done | D113540 (regular codegen only)
task | depend objects and detachable tasks | done |
task | mutexinoutset dependence-type for tasks | done | D53380, D57576
task | combined taskloop constructs | done |
task | master taskloop | done |
task | parallel master taskloop | done |
task | master taskloop simd | done |
task | parallel master taskloop simd | done |
SIMD | atomic and simd constructs inside SIMD code | done |
SIMD | SIMD nontemporal | done |
device | infer target functions from initializers | worked on |
device | infer target variables from initializers | done | D146418
device | OMP_TARGET_OFFLOAD environment variable | done | D50522
device | support full ‘defaultmap’ functionality | done | D69204
device | device specific functions | done |
device | clause: device_type | done |
device | clause: extended device | done |
device | clause: uses_allocators | done |
device | clause: in_reduction | worked on | r308768
device | omp_get_device_num() | done | D54342, D128347
device | structure mapping of references | unclaimed |
device | nested target declare | done | D51378
device | implicitly map ‘this’ (this[:1]) | done | D55982
device | allow access to the reference count (omp_target_is_present) | done |
device | requires directive | partial |
device | clause: unified_shared_memory | done | D52625, D52359
device | clause: unified_address | partial |
device | clause: reverse_offload | partial | D52780, D155003
device | clause: atomic_default_mem_order | done | D53513
device | clause: dynamic_allocators | unclaimed parts | D53079
device | user-defined mappers | worked on | D56326, D58638, D58523, D58074, D60972, D59474
device | mapping lambda expression | done | D51107
device | clause: use_device_addr for target data | done |
device | support close modifier on map clause | done | D55719, D55892
device | teams construct on the host device | done | r371553
device | support non-contiguous array sections for target update | done |
device | pointer attachment | unclaimed |
device | map clause reordering based on map types | unclaimed |
atomic | hints for the atomic construct | done | D51233
base language | C11 support | done |
base language | C++11/14/17 support | done |
base language | lambda support | done |
misc | array shaping | done | D74144
misc | library shutdown (omp_pause_resource[_all]) | unclaimed parts | D55078
misc | metadirectives | worked on | D91944
misc | conditional modifier for lastprivate clause | done |
misc | iterator and multidependences | done |
misc | depobj directive and depobj dependency kind | done |
misc | user-defined function variants | worked on | D67294, D64095, D71847, D71830, D109635
misc | pointer/reference to pointer based array reductions | unclaimed |
misc | prevent new type definitions in clauses | done |
memory model | memory model update (seq_cst, acq_rel, release, acquire,…) | done |
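
Two of the loop features listed as done in the table above can be sketched briefly: a != test in the canonical loop form and the collapse of a non-rectangular loop nest. The function names count_steps and upper_triangle_sum are hypothetical, and the snippet is an illustration of the syntax rather than a statement about how Clang lowers it.

```c
/* Hypothetical example: `!=` as the loop test, which OpenMP 5.0 admits in
 * the canonical loop form when the increment is +1 or -1.
 * Assumes start >= 0 so the countdown reaches zero. */
int count_steps(int start) {
  int steps = 0;
  #pragma omp parallel for reduction(+ : steps)
  for (int i = start; i != 0; --i) {
    steps += 1;
  }
  return steps;
}

/* Hypothetical example: a non-rectangular nest, where the inner lower
 * bound depends on the outer loop variable, collapsed into one
 * iteration space. */
long upper_triangle_sum(int n) {
  long total = 0;
  #pragma omp parallel for collapse(2) reduction(+ : total)
  for (int i = 0; i < n; ++i) {
    for (int j = i; j < n; ++j) {
      total += (long)i * n + j; /* linear index of element (i, j) */
    }
  }
  return total;
}
```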

OpenMP 5.1 Implementation Details

The following table provides a quick overview of various OpenMP 5.1 features and their implementation status, as defined in the technical report 8 (TR8). Please post on the Discourse forums (Runtimes - OpenMP category) for more information or if you want to help with the implementation.

Category | Feature | Status | Reviews
atomic | ‘compare’ clause on atomic construct | done | D120290, D120007, D118632, D120200, D116261, D118547, D116637
atomic | ‘fail’ clause on atomic construct | worked on |
base language | C++ attribute specifier syntax | done | D105648
device | ‘present’ map type modifier | done | D83061, D83062, D84422
device | ‘present’ motion modifier | done | D84711, D84712
device | ‘present’ in defaultmap clause | done | D92427
device | map clause reordering based on ‘present’ modifier | unclaimed |
device | device-specific environment variables | unclaimed |
device | omp_target_is_accessible routine | unclaimed |
device | omp_get_mapped_ptr routine | done | D141545
device | new async target memory copy routines | done | D136103
device | thread_limit clause on target construct | worked on |
device | has_device_addr clause on target construct | unclaimed |
device | iterators in map clause or motion clauses | unclaimed |
device | indirect clause on declare target directive | unclaimed |
device | allow virtual function calls for mapped object on device | unclaimed |
device | interop construct | partial | parsing/sema done: D98558, D98834, D98815
device | assorted routines for querying interoperable properties | unclaimed |
loop | Loop tiling transformation | done | D76342
loop | Loop unrolling transformation | done | D99459
loop | ‘reproducible’/‘unconstrained’ modifiers in ‘order’ clause | partial | D127855
memory management | alignment for allocate directive and clause | worked on |
memory management | new memory management routines | unclaimed |
memory management | changes to omp_alloctrait_key enum | unclaimed |
memory model | seq_cst clause on flush construct | unclaimed |
misc | ‘omp_all_memory’ keyword and use in ‘depend’ clause | done | D125828, D126321
misc | error directive | unclaimed |
misc | scope construct | unclaimed |
misc | routines for controlling and querying team regions | unclaimed |
misc | changes to ompt_scope_endpoint_t enum | unclaimed |
misc | omp_display_env routine | unclaimed |
misc | extended OMP_PLACES syntax | unclaimed |
misc | OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars | done | D138769
misc | ‘target_device’ selector in context specifier | unclaimed |
misc | begin/end declare variant | done | D71179
misc | dispatch construct and function variant argument adjustment | worked on | D99537, D99679
misc | assume and assumes directives | worked on |
misc | nothing directive | worked on |
misc | masked construct and related combined constructs | worked on | D99995, D100514
misc | default(firstprivate) & default(private) | partial | firstprivate done: D75591
other | deprecating master construct | unclaimed |
OMPT | new barrier types added to ompt_sync_region_t enum | unclaimed |
OMPT | async data transfers added to ompt_target_data_op_t enum | unclaimed |
OMPT | new barrier state values added to ompt_state_t enum | unclaimed |
OMPT | new ‘emi’ callbacks for external monitoring interfaces | unclaimed |
task | ‘strict’ modifier for taskloop construct | unclaimed |
task | inoutset in depend clause | unclaimed |
task | nowait clause on taskwait | worked on |
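
As an illustration of the ‘compare’ clause on the atomic construct, listed as done in the table above, the sketch below keeps a running maximum with an atomic conditional update. The function name array_max is hypothetical. Depending on the compiler release, selecting OpenMP 5.1 explicitly (for example with -fopenmp-version=51) may be required, so treat that flag as an assumption about the toolchain in use.

```c
/* Hypothetical example: returns the largest element of v, assuming n >= 1. */
int array_max(const int *v, int n) {
  int best = v[0];
  #pragma omp parallel for shared(best)
  for (int i = 1; i < n; ++i) {
    int candidate = v[i];
    /* Atomic conditional update: best becomes max(best, candidate). */
    #pragma omp atomic compare
    if (best < candidate) { best = candidate; }
  }
  return best;
}
```

Without the atomic construct, concurrent threads could overwrite a larger value with a smaller one. The conditional-update form makes the comparison and the store a single atomic operation.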

OpenMP Extensions

The following table provides a quick overview of various OpenMP extensions and their implementation status. These extensions are not currently defined by any standard, so links to associated LLVM documentation are provided. As these extensions mature, they will be considered for standardization. Please post on the Discourse forums (Runtimes - OpenMP category) to provide feedback.

Category | Feature | Status | Reviews
atomic extension | ‘atomic’ strictly nested within ‘teams’ | prototyped | D126323
device extension | ‘ompx_hold’ map type modifier | prototyped | D106509, D106510