Hardware-assisted AddressSanitizer Design Documentation

This page is a design document for hardware-assisted AddressSanitizer (or HWASAN), a tool similar to AddressSanitizer, but based on partial hardware assistance.

Introduction

AddressSanitizer tags every 8 bytes of the application memory with a 1-byte tag (using shadow memory), uses redzones to find buffer overflows, and uses a quarantine to find use-after-free. The redzones, the quarantine, and, to a lesser extent, the shadow, are the sources of AddressSanitizer’s memory overhead. See the AddressSanitizer paper for details.

AArch64 has Address Tagging (or top-byte-ignore, TBI), a hardware feature that allows software to use the 8 most significant bits of a 64-bit pointer as a tag. HWASAN uses Address Tagging to implement a memory safety tool, similar to AddressSanitizer, but with smaller memory overhead and slightly different (mostly better) accuracy guarantees.

Algorithm

  • Every heap/stack/global memory object is forcibly aligned to TG bytes (TG is e.g. 16 or 64). We call TG the tagging granularity.

  • For every such object a random TS-bit tag T is chosen (TS, or tag size, is e.g. 4 or 8).

  • The pointer to the object is tagged with T.

  • The memory for the object is also tagged with T (using a TG=>1 shadow memory).

  • Every load and store is instrumented to read the memory tag and compare it with the pointer tag; an exception is raised on tag mismatch.

For a more detailed discussion of this approach see https://arxiv.org/pdf/1802.09517.pdf
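The steps above can be sketched in C over a toy memory model. This is an illustration only, not HWASAN's runtime: "pointers" are 64-bit integers holding an offset into a small arena, TG is fixed at 16 and TS at 8, the tag is supplied by the caller rather than drawn at random, and the names tag_object and access_ok are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TG 16u          /* tagging granularity in bytes (assumed) */
#define MEM_SIZE 256u   /* size of the toy application memory (assumed) */

/* One tag byte per TG bytes of memory: the TG=>1 shadow. */
static uint8_t shadow[MEM_SIZE / TG];

/* Tag the memory of an object and return a pointer tagged with the same
   value. In this simplified model offset and size are multiples of TG. */
static uint64_t tag_object(uint64_t offset, size_t size, uint8_t tag) {
    assert(offset % TG == 0 && size % TG == 0 && offset + size <= MEM_SIZE);
    for (uint64_t g = offset / TG; g < (offset + size) / TG; g++)
        shadow[g] = tag;
    return ((uint64_t)tag << 56) | offset;   /* tag lives in the top byte */
}

/* The check that instrumentation performs before a load or store:
   returns 1 if the pointer tag equals the memory tag, 0 on mismatch. */
static int access_ok(uint64_t ptr) {
    uint8_t ptr_tag = (uint8_t)(ptr >> 56);
    uint64_t addr = ptr & 0x00ffffffffffffffULL;
    assert(addr < MEM_SIZE);
    return shadow[addr / TG] == ptr_tag;
}
```

With two adjacent objects tagged differently, an access one byte past the end of the first object lands in a granule carrying the second object's tag, so access_ok returns 0 without any redzone between them.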

Short granules

A short granule is a granule of size between 1 and TG-1 bytes. The size of a short granule is stored at the location in shadow memory where the granule’s tag is normally stored, while the granule’s actual tag is stored in the last byte of the granule. This means that in order to verify that a pointer tag matches a memory tag, HWASAN must check for two possibilities:

  • the pointer tag is equal to the memory tag in shadow memory, or

  • the shadow memory tag is actually a short granule size, the value being loaded is in bounds of the granule and the pointer tag is equal to the last byte of the granule.

Pointer tags between 1 and TG-1 are possible and are as likely as any other tag. This means that these tags in memory have two interpretations: the full tag interpretation (where the pointer tag is between 1 and TG-1 and the last byte of the granule is ordinary data) and the short tag interpretation (where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and TG-1, it will show both the memory tag and the last byte of the granule. Currently, it is up to the user to disambiguate the two possibilities.
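As a rough illustration of the two-case check described in this section, the following C sketch tags a region whose size is not a multiple of TG and then validates accesses against it. It models memory as a small array and assumes TG = 16 and accesses that do not cross a granule boundary; tag_memory and access_ok are hypothetical names, not runtime entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TG 16u         /* tagging granularity in bytes (assumed) */
#define MEM_SIZE 64u   /* size of the toy application memory (assumed) */

static uint8_t mem[MEM_SIZE];            /* application memory */
static uint8_t shadow[MEM_SIZE / TG];    /* one shadow byte per granule */

/* Tag size bytes starting at a TG-aligned offset. A trailing partial
   granule becomes a short granule: its shadow byte holds the size and
   its last byte holds the real tag. */
static void tag_memory(size_t offset, size_t size, uint8_t tag) {
    assert(offset % TG == 0 && offset + size <= MEM_SIZE);
    size_t full = size / TG, rest = size % TG;
    for (size_t i = 0; i < full; i++)
        shadow[offset / TG + i] = tag;
    if (rest != 0) {
        size_t g = offset / TG + full;
        shadow[g] = (uint8_t)rest;
        mem[g * TG + TG - 1] = tag;
    }
}

/* Check an access of len bytes at addr made through a pointer whose tag
   is ptr_tag. Returns 1 if the access passes, 0 if it would trap. */
static int access_ok(size_t addr, size_t len, uint8_t ptr_tag) {
    assert(len >= 1 && addr < MEM_SIZE);
    uint8_t mem_tag = shadow[addr / TG];
    if (mem_tag == ptr_tag)
        return 1;                        /* first possibility: full tag match */
    if (mem_tag >= TG)
        return 0;                        /* not a short granule size */
    if (addr % TG + len - 1 >= mem_tag)
        return 0;                        /* last byte accessed is out of bounds */
    return mem[addr | (TG - 1)] == ptr_tag;   /* tag kept in last byte */
}
```

For a 20-byte object at offset 0 this writes the tag to the first shadow byte, the value 4 to the second, and the tag to byte 31 of memory. A 4-byte access at offset 16 passes, while the same access at offset 17 fails the bounds test.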

Instrumentation

Memory Accesses

All memory accesses are prefixed with an inline instruction sequence that verifies the tags. Currently, the following sequence is used:

// int foo(int *a) { return *a; }
// clang -O2 --target=aarch64-linux -fsanitize=hwaddress -fsanitize-recover=hwaddress -c load.c
foo:
     0:       90000008        adrp    x8, 0 <__hwasan_shadow>
     4:       f9400108        ldr     x8, [x8]         // shadow base (to be resolved by the loader)
     8:       d344dc09        ubfx    x9, x0, #4, #52  // shadow offset
     c:       38696909        ldrb    w9, [x8, x9]     // load shadow tag
    10:       d378fc08        lsr     x8, x0, #56      // extract address tag
    14:       6b09011f        cmp     w8, w9           // compare tags
    18:       54000061        b.ne    24 <foo+0x24>    // jump to short tag handler on mismatch
    1c:       b9400000        ldr     w0, [x0]         // original load
    20:       d65f03c0        ret
    24:       7100413f        cmp     w9, #0x10        // is this a short tag?
    28:       54000142        b.cs    50 <foo+0x50>    // if not, trap
    2c:       12000c0a        and     w10, w0, #0xf    // find the address's position in the short granule
    30:       11000d4a        add     w10, w10, #0x3   // adjust to the position of the last byte loaded
    34:       6b09015f        cmp     w10, w9          // check that position is in bounds
    38:       540000c2        b.cs    50 <foo+0x50>    // if not, trap
    3c:       9240dc09        and     x9, x0, #0xffffffffffffff // clear the pointer tag (top byte)
    40:       b2400d29        orr     x9, x9, #0xf     // compute address of last byte of granule
    44:       39400129        ldrb    w9, [x9]         // load tag from it
    48:       6b09011f        cmp     w8, w9           // compare with pointer tag
    4c:       54fffe80        b.eq    1c <foo+0x1c>    // if so, continue
    50:       d4212440        brk     #0x922           // otherwise trap
    54:       b9400000        ldr     w0, [x0]         // tail duplicated original load (to handle recovery)
    58:       d65f03c0        ret
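The address arithmetic in the listing can be restated in C, which may make the bit manipulation easier to follow. The helper names below are hypothetical and each one only mirrors the instruction named in its comment, assuming TG = 16 and an 8-bit tag as in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* lsr x8, x0, #56: the pointer tag is the top byte. */
static uint8_t pointer_tag(uint64_t p) {
    return (uint8_t)(p >> 56);
}

/* ubfx x9, x0, #4, #52: take bits 4..55, i.e. drop the tag and divide
   the address by the 16-byte granule size. */
static uint64_t shadow_offset(uint64_t p) {
    return (p >> 4) & ((1ULL << 52) - 1);
}

/* and w10, w0, #0xf ; add w10, w10, #(size - 1): position within the
   granule of the last byte touched by an access of the given size. */
static unsigned last_byte_position(uint64_t p, unsigned size) {
    assert(size >= 1 && size <= 16);
    return (unsigned)(p & 0xf) + size - 1;
}

/* and x9, x0, #0xffffffffffffff ; orr x9, x9, #0xf: untagged address of
   the last byte of the granule, where a short granule keeps its tag. */
static uint64_t granule_last_byte(uint64_t p) {
    return (p & 0x00ffffffffffffffULL) | 0xf;
}
```

For example, for the tagged pointer 0xab00007fdeadbee4 the tag is 0xab, the shadow offset is 0x7fdeadbee, and the short granule tag would be read from 0x7fdeadbeef.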

Alternatively, memory accesses are prefixed with a function call. On AArch64, a function call is used by default in trapping mode. The code size and performance overhead of the call is reduced by using a custom calling convention that preserves most registers and is specialized to the register containing the address and the type and size of the memory access.

Heap

Tagging the heap memory/pointers is done by malloc. This can be based on any malloc that forces all objects to be TG-aligned. free tags the memory with a different tag.
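A minimal sketch of this idea, assuming a toy bump allocator over a fixed arena: toy_malloc rounds sizes up to TG, tags the memory and the returned pointer, and toy_free retags the memory so that a dangling pointer no longer matches. All names are hypothetical, a counter stands in for the random tag source so the behaviour is deterministic, and short granules are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TG 16u             /* tagging granularity in bytes (assumed) */
#define ARENA_SIZE 1024u   /* size of the toy heap (assumed) */

static uint8_t shadow[ARENA_SIZE / TG];
static size_t next_free;        /* bump-allocator cursor */
static uint8_t next_tag = 1;    /* stand-in for a random tag source */

/* Return a nonzero tag different from avoid. */
static uint8_t fresh_tag(uint8_t avoid) {
    uint8_t t;
    do { t = next_tag++; } while (t == 0 || t == avoid);
    return t;
}

static size_t round_up(size_t size) { return (size + TG - 1) / TG * TG; }

static void set_tags(size_t offset, size_t size, uint8_t tag) {
    for (size_t g = offset / TG; g < (offset + size) / TG; g++)
        shadow[g] = tag;
}

/* Allocate size bytes; the result carries the tag in its top byte. */
static uint64_t toy_malloc(size_t size) {
    size_t rounded = round_up(size);
    assert(size > 0 && next_free + rounded <= ARENA_SIZE);
    size_t offset = next_free;
    next_free += rounded;
    uint8_t tag = fresh_tag(shadow[offset / TG]);
    set_tags(offset, rounded, tag);
    return ((uint64_t)tag << 56) | offset;
}

/* Free retags the memory with a tag that differs from the pointer's. */
static void toy_free(uint64_t ptr, size_t size) {
    size_t offset = (size_t)(ptr & 0x00ffffffffffffffULL);
    set_tags(offset, round_up(size), fresh_tag((uint8_t)(ptr >> 56)));
}

static int access_ok(uint64_t ptr) {
    size_t addr = (size_t)(ptr & 0x00ffffffffffffffULL);
    return shadow[addr / TG] == (uint8_t)(ptr >> 56);
}
```

After toy_free, the stale pointer still carries the old tag while the shadow holds the new one, so any later access through it fails the check. With random tags the mismatch is only probabilistic once the memory is reused.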

Stack

Stack frames are instrumented by aligning all non-promotable allocas by TG and tagging stack memory in function prologue and epilogue.

Tags for different allocas in one function are not generated independently; doing that in a function with M allocas would require maintaining M live stack pointers, significantly increasing register pressure. Instead we generate a single base tag value in the prologue, and build the tag for alloca number i as ReTag(BaseTag, i), where ReTag can be as simple as exclusive-or with the constant i.
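The exclusive-or form of ReTag mentioned above can be sketched as follows, assuming an 8-bit tag. The function name retag and the index bound are illustrative assumptions, not taken from the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Derive the tag of one alloca from the per-frame base tag. XOR with a
   distinct constant yields a distinct tag for every index below 2^TS. */
static uint8_t retag(uint8_t base_tag, unsigned alloca_index) {
    assert(alloca_index < 256);   /* TS = 8 is assumed here */
    return (uint8_t)(base_tag ^ alloca_index);
}
```

Only the base tag has to stay live in a register. Each alloca's tag is recomputed from it with a single instruction, and neighbouring allocas always receive different tags because their indices differ.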

Stack instrumentation is expected to be a major source of overhead, but could be optional.

Globals

TODO: details.

Error reporting

Errors are generated by a trap instruction (brk in the AArch64 listing above) and are handled by a signal handler.

Attribute

HWASAN uses its own LLVM IR Attribute sanitize_hwaddress and a matching C function attribute. An alternative would be to re-use ASAN’s attribute sanitize_address. The reasons to use a separate attribute are:

  • Users may need to disable ASAN but not HWASAN, or vice versa, because the tools have different trade-offs and compatibility issues.

  • LLVM (ideally) does not use flags to decide which pass is being used; whether ASAN or HWASAN is applied is determined by the function attributes.

This does mean that users of HWASAN may need to add the new attribute to the code that already uses the old attribute.
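As an example of what adding the new attribute can look like on the source side, the sketch below uses Clang's no_sanitize("hwaddress") function attribute and the hwaddress_sanitizer feature test to opt a single function out of HWASAN instrumentation. The macro name NO_HWASAN and the function raw_read are hypothetical; the guards make the macro expand to nothing on compilers or builds without HWASAN.

```c
#include <assert.h>
#include <stddef.h>

/* Expand to the opt-out attribute only when building with HWASAN. */
#if defined(__has_feature)
#  if __has_feature(hwaddress_sanitizer)
#    define NO_HWASAN __attribute__((no_sanitize("hwaddress")))
#  endif
#endif
#ifndef NO_HWASAN
#  define NO_HWASAN
#endif

/* This function is left uninstrumented by HWASAN. It is independent of
   any ASAN opt-out, which would need no_sanitize("address") instead. */
NO_HWASAN static int raw_read(const int *p) {
    assert(p != NULL);
    return *p;
}
```

Code that already opts out of ASAN with no_sanitize("address") would need this second attribute as well if the same function must also be excluded from HWASAN.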

Comparison with AddressSanitizer

HWASAN:
  • Is less portable than AddressSanitizer as it relies on hardware Address Tagging (AArch64). Address Tagging can be emulated with compiler instrumentation, but it will require the instrumentation to remove the tags before any load or store, which is infeasible in any realistic environment that contains non-instrumented code.

  • May have compatibility problems if the target code uses higher pointer bits for other purposes.

  • May require changes in the OS kernels (e.g. Linux seems to dislike tagged pointers passed from user space: https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).

  • Does not require redzones to detect buffer overflows, but the buffer overflow detection is probabilistic, with roughly 1/(2**TS) chance of missing a bug (6.25% or 0.39% with 4 and 8-bit TS respectively).

  • Does not require quarantine to detect heap-use-after-free, or stack-use-after-return. The detection is similarly probabilistic.

The memory overhead of HWASAN is expected to be much smaller than that of AddressSanitizer: 1/TG extra memory for the shadow and some overhead due to TG-aligning all objects.
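The two figures quoted above, the miss probability 1/(2**TS) and the 1/TG shadow overhead, can be reproduced with a few lines of C. The function names are hypothetical and the sketch ignores the additional cost of TG-aligning objects.

```c
#include <assert.h>
#include <stddef.h>

/* Chance that two independently chosen TS-bit tags collide, which is the
   chance that a bad access goes undetected. */
static double miss_probability(unsigned ts) {
    assert(ts < 64);
    return 1.0 / (double)(1ULL << ts);
}

/* Shadow memory needed for app_bytes of application memory: one byte
   per TG-byte granule, rounded up. */
static size_t shadow_bytes(size_t app_bytes, size_t tg) {
    assert(tg > 0);
    return (app_bytes + tg - 1) / tg;
}
```

miss_probability(4) is 0.0625 (6.25%) and miss_probability(8) is 0.00390625 (about 0.39%); shadow_bytes(1 << 20, 16) is 65536, i.e. 64 KiB of shadow per MiB of application memory.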

Supported architectures

HWASAN relies on Address Tagging which is only available on AArch64. For other 64-bit architectures it is possible to remove the address tags before every load and store by compiler instrumentation, but this variant will have limited deployability since not all of the code is typically instrumented.

HWASAN’s approach is not applicable to 32-bit architectures.