OpenMP Support

Standalone directives

#pragma omp [for] simd: Complete.
#pragma omp declare simd: Partial. Parsing, semantic analysis, and generation of special attributes for the X86 target are supported, but the LLVM pass for vectorization is still missing.
#pragma omp taskloop [simd]: Complete.
#pragma omp target [enter|exit] data: Complete.
#pragma omp target update: Complete.
#pragma omp target: Complete.
#pragma omp declare target: Complete.
#pragma omp teams: Complete.
#pragma omp distribute [simd]: Complete.
#pragma omp distribute parallel for [simd]: Complete.
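
As a rough illustration of two of the directives listed above, the following sketch applies `#pragma omp declare simd` to a small helper and `#pragma omp simd` to the loop that calls it. The function names `scale_add` and `saxpy` are hypothetical and chosen only for this example; whether the loop is actually vectorized depends on the compiler and target.

```c
/* Hypothetical helper: declare simd asks the compiler to also provide a
   vector variant of this function. */
#pragma omp declare simd
static float scale_add(float x, float y, float a) {
    return a * x + y;
}

/* Hypothetical example: computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(int n, float a, const float *x, float *y) {
    /* simd marks the loop iterations as safe to execute in vector lanes. */
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = scale_add(x[i], y[i], a);
}
```

If OpenMP is not enabled at compile time, the pragmas are ignored and the loop runs as ordinary serial code with the same result.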

Combined directives

#pragma omp parallel for simd: Complete.
#pragma omp target parallel: Complete.
#pragma omp target parallel for [simd]: Complete.
#pragma omp target simd: Complete.
#pragma omp target teams: Complete.
#pragma omp teams distribute [simd]: Complete.
#pragma omp target teams distribute [simd]: Complete.
#pragma omp teams distribute parallel for [simd]: Complete.
#pragma omp target teams distribute parallel for [simd]: Complete.
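
A minimal sketch of the last combined directive in the list above, assuming an element-wise vector addition. The function name `vec_add` and the explicit `map` clauses are choices made for this example, not something the list prescribes.

```c
/* Hypothetical example: c[i] = a[i] + b[i], offloaded to the default device. */
void vec_add(int n, const double *a, const double *b, double *c) {
    /* Inputs are copied to the device, the result is copied back. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

When no offload device is available, an OpenMP implementation typically executes the target region on the host, so the function still produces the same values.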

Clang does not support any constructs or updates from the upcoming OpenMP 5.0 specification, except for reduction-based clauses in the task and target-based directives.
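
The reduction-based clauses on task directives mentioned above could look roughly like the following sketch, which uses `task_reduction` on a `taskgroup` and `in_reduction` on each task. The function name `sum_with_tasks` is hypothetical, and the example assumes a compiler that accepts these two clauses.

```c
/* Hypothetical example: sums v[0..n) using one task per element. */
int sum_with_tasks(int n, const int *v) {
    int total = 0;
    #pragma omp parallel
    #pragma omp single
    /* The taskgroup defines the scope of the reduction over total. */
    #pragma omp taskgroup task_reduction(+: total)
    {
        for (int i = 0; i < n; ++i) {
            /* Each task contributes to the enclosing taskgroup's reduction. */
            #pragma omp task in_reduction(+: total) firstprivate(i)
            total += v[i];
        }
    }
    return total;
}
```

One task per element is far too fine-grained for real workloads; it is used here only to keep the sketch short.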

In addition, the LLVM OpenMP runtime libomp supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.

CUDA devices support

Directives execution modes

Clang code generation for target regions supports two modes: SPMD and non-SPMD. Clang chooses between them automatically, based on how the directives and their clauses are used. The SPMD mode uses a simplified set of runtime functions, which increases performance at the cost of not supporting some OpenMP features. The non-SPMD mode is the most generic mode and supports all currently available OpenMP features. The compiler always attempts to use the SPMD mode wherever possible. SPMD mode will not be used if:

The target region contains an if() clause that refers to a parallel directive.

The target region contains a parallel directive with a num_threads() clause.

The target region contains user code (other than OpenMP-specific directives) in between the target and the parallel directives.
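
To make the third condition above concrete, the sketch below contrasts two target regions. In the first, the parallel directive follows the target directive directly, with no `if()` or `num_threads()` clause, so none of the listed conditions applies. In the second, a statement sits between the target and the parallel directives, which matches the third condition. The function names are hypothetical, and the mode Clang actually selects is an internal decision that this example cannot demonstrate.

```c
/* Hypothetical example: no user code between target and parallel. */
long sum_spmd_candidate(int n, const int *v) {
    long total = 0;
    #pragma omp target map(to: v[0:n]) map(tofrom: total)
    #pragma omp parallel for reduction(+: total)
    for (int i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Hypothetical example: user code precedes the parallel directive inside
   the target region. A leading negative element is skipped. */
long sum_generic(int n, const int *v) {
    long total = 0;
    #pragma omp target map(to: v[0:n]) map(tofrom: total)
    {
        /* User code between the target and parallel directives. */
        int start = (n > 0 && v[0] < 0) ? 1 : 0;
        #pragma omp parallel for reduction(+: total)
        for (int i = start; i < n; ++i)
            total += v[i];
    }
    return total;
}
```

Where the logic allows it, moving such setup code out of the target region keeps the region in the simpler form shown first.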

Features not supported or with limited support for CUDA devices

Reductions across the teams are not supported yet.
Cancellation constructs are not supported.
Doacross loop nests are not supported.
User-defined reductions are supported only for trivial types.
Nested parallelism: inner parallel regions are executed sequentially.
Static linking of libraries containing device code is not supported yet.
Automatic translation of math functions in target regions to device-specific math functions is not implemented yet.
Debug information for OpenMP target regions is not supported yet.
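
Because inner parallel regions are executed sequentially on these devices, one possible way to expose both levels of a loop nest is to collapse them into a single parallel loop instead of nesting parallel regions. The sketch below assumes a row-major matrix stored in a flat array; the function name `scale_matrix` is hypothetical, and `collapse(2)` is a suggestion for this situation rather than something the list above prescribes.

```c
/* Hypothetical example: multiplies every element of a rows x cols matrix
   (row-major, flat array) by factor. */
void scale_matrix(int rows, int cols, double *m, double factor) {
    /* collapse(2) merges both loops into one iteration space, so no nested
       parallel region is needed. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(tofrom: m[0:rows * cols])
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            m[i * cols + j] *= factor;
}
```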

Clang 7 documentation
