Clang Offload Wrapper

Introduction

This tool is used in OpenMP offloading toolchain to embed device code objects (usually ELF) into a wrapper host llvm IR (bitcode) file. The wrapper host IR is then assembled and linked with host code objects to generate the executable binary. See Image Binary Embedding and Execution for OpenMP for more details.

Usage

This tool can be used as follows:

$ clang-offload-wrapper -help
OVERVIEW: A tool to create a wrapper bitcode for offload target binaries.
Takes offload target binaries as input and produces bitcode file containing
target binaries packaged as data and initialization code which registers
target binaries in offload runtime.
USAGE: clang-offload-wrapper [options] <input files>
OPTIONS:
Generic Options:
  --help                             - Display available options (--help-hidden for more)
  --help-list                        - Display list of available options (--help-list-hidden for more)
  --version                          - Display the version of this program
clang-offload-wrapper options:
  -o <filename>                      - Output filename
  --target=<triple>                  - Target triple for the output module

Example

clang-offload-wrapper -target host-triple -o host-wrapper.bc gfx90a-binary.out

OpenMP Device Binary Embedding

Various structures and functions used in the wrapper host IR form the interface between the executable binary and the OpenMP runtime.

Enum Types

Offloading Declare Target Flags Enum lists different flag for offloading entries.

Offloading Declare Target Flags Enum

Name

Value

Description

OMP_DECLARE_TARGET_LINK

0x01

Mark the entry as having a ‘link’ attribute (w.r.t. link clause)

OMP_DECLARE_TARGET_CTOR

0x02

Mark the entry as being a global constructor

OMP_DECLARE_TARGET_DTOR

0x04

Mark the entry as being a global destructor

Structure Types

__tgt_offload_entry structure, __tgt_device_image structure, and __tgt_bin_desc structure are the structures used in the wrapper host IR.

__tgt_offload_entry structure

Type

Identifier

Description

void*

addr

Address of global symbol within device image (function or global)

char*

name

Name of the symbol

size_t

size

Size of the entry info (0 if it is a function)

int32_t

flags

Flags associated with the entry (see Offloading Declare Target Flags Enum)

int32_t

reserved

Reserved, to be used by the runtime library.

__tgt_device_image structure

Type

Identifier

Description

void*

ImageStart

Pointer to the target code start

void*

ImageEnd

Pointer to the target code end

__tgt_offload_entry*

EntriesBegin

Begin of table with all target entries

__tgt_offload_entry*

EntriesEnd

End of table (non inclusive)

__tgt_bin_desc structure

Type

Identifier

Description

int32_t

NumDeviceImages

Number of device types supported

__tgt_device_image*

DeviceImages

Array of device images (1 per dev. type)

__tgt_offload_entry*

HostEntriesBegin

Begin of table with all host entries

__tgt_offload_entry*

HostEntriesEnd

End of table (non inclusive)

Global Variables

Global Variables lists various global variables, along with their type and their explicit ELF sections, which are used to store device images and related symbols.

Global Variables

Variable

Type

ELF Section

Description

__start_omp_offloading_entries

__tgt_offload_entry

.omp_offloading_entries

Begin symbol for the offload entries table.

__stop_omp_offloading_entries

__tgt_offload_entry

.omp_offloading_entries

End symbol for the offload entries table.

__dummy.omp_offloading.entry

__tgt_offload_entry

.omp_offloading_entries

Dummy zero-sized object in the offload entries section to force linker to define begin/end symbols defined above.

.omp_offloading.device_image

__tgt_device_image

.omp_offloading_entries

ELF device code object of the first image.

.omp_offloading.device_image.N

__tgt_device_image

.omp_offloading_entries

ELF device code object of the (N+1)th image.

.omp_offloading.device_images

__tgt_device_image

.omp_offloading_entries

Array of images.

.omp_offloading.descriptor

__tgt_bin_desc

.omp_offloading_entries

Binary descriptor object (see details below).

Binary Descriptor for Device Images

This object is passed to the offloading runtime at program startup and it describes all device images available in the executable or shared library. It is defined as follows:

__attribute__((visibility("hidden")))
extern __tgt_offload_entry *__start_omp_offloading_entries;
__attribute__((visibility("hidden")))
extern __tgt_offload_entry *__stop_omp_offloading_entries;
static const char Image0[] = { <Bufs.front() contents> };
...
static const char ImageN[] = { <Bufs.back() contents> };
static const __tgt_device_image Images[] = {
  {
    Image0,                            /*ImageStart*/
    Image0 + sizeof(Image0),           /*ImageEnd*/
    __start_omp_offloading_entries,    /*EntriesBegin*/
    __stop_omp_offloading_entries      /*EntriesEnd*/
  },
  ...
  {
    ImageN,                            /*ImageStart*/
    ImageN + sizeof(ImageN),           /*ImageEnd*/
    __start_omp_offloading_entries,    /*EntriesBegin*/
    __stop_omp_offloading_entries      /*EntriesEnd*/
  }
};
static const __tgt_bin_desc BinDesc = {
  sizeof(Images) / sizeof(Images[0]),  /*NumDeviceImages*/
  Images,                              /*DeviceImages*/
  __start_omp_offloading_entries,      /*HostEntriesBegin*/
  __stop_omp_offloading_entries        /*HostEntriesEnd*/
};

Global Constructor and Destructor

Global constructor (.omp_offloading.descriptor_reg()) registers the library of images with the runtime by calling __tgt_register_lib() function. The cunstructor is explicitly defined in .text.startup section. Similarly, global destructor (.omp_offloading.descriptor_unreg()) calls __tgt_unregister_lib() for the unregistration and is also defined in .text.startup section.

Image Binary Embedding and Execution for OpenMP

For each offloading target, device ELF code objects are generated by clang, opt, llc, and lld pipeline. These code objects are passed to the clang-offload-wrapper.

  • At compile time, the clang-offload-wrapper tool takes the following actions:

  • At execution time:

    • The global constructor gets run and it registers the device image.