Implementation plans for -fbounds-safety

Gradual updates with experimental flag

The feature will be implemented as a series of smaller PRs and we will guard our implementation with an experimental flag -fexperimental-bounds-safety until the usable model is fully available. Once the model is ready for use, we will expose the flag -fbounds-safety.

Possible patch sets

  • External bounds annotations and the (late) parsing logic.

  • Internal bounds annotations (wide pointers) and their parsing logic.

  • Clang code generation for wide pointers with debug information.

  • Pointer cast semantics involving bounds annotations (this could be divided into multiple sub-PRs).

  • CFG analysis for pairs of related pointer and count assignments and the likes.

  • Bounds check expressions in AST and the Clang code generation (this could also be divided into multiple sub-PRs).

Proposed implementation

External bounds annotations

The bounds annotations are C type attributes appertaining to pointer types. If an attribute is added to the position of a declaration attribute, e.g., int *ptr __counted_by(size), the attribute appertains to the outermost pointer type of the declaration (int *).

New sugar types

An external bounds annotation creates a type sugar of the underlying pointer types. We will introduce a new sugar type, DynamicBoundsPointerType to represent __counted_by or __sized_by. Using AttributedType would not be sufficient because the type needs to hold the count or size expression as well as some metadata necessary for analysis, while this type may be implemented through inheritance from AttributedType. Treating the annotations as type sugars means two types with incompatible external bounds annotations may be considered canonically the same types. This is sometimes necessary, for example, to make the __counted_by and friends not participate in function overloading. However, this design requires a separate logic to walk through the entire type hierarchy to check type compatibility of bounds annotations.

Late parsing for C

A bounds annotation such as __counted_by(count) can be added to type of a struct field declaration where count is another field of the same struct declared later. Similarly, the annotation may apply to type of a function parameter declaration which precedes the parameter count in the same function. This means parsing the argument of bounds annotations must be done after the parser has the whole context of a struct or a function declaration. Clang has late parsing logic for C++ declaration attributes that require late parsing, while the C declaration attributes and C/C++ type attributes do not have the same logic. This requires introducing late parsing logic for C/C++ type attributes.

Internal bounds annotations

__indexable and __bidi_indexable alter pointer representations to be equivalent to a struct with the pointer and the corresponding bounds fields. Despite this difference in their representations, they are still pointers in terms of types of operations that are allowed and their semantics. For instance, a pointer dereference on a __bidi_indexable pointer will return the dereferenced value same as plain C pointers, modulo the extra bounds checks being performed before dereferencing the wide pointer. This means mapping the wide pointers to struct types with equivalent layout won’t be sufficient. To represent the wide pointers in Clang AST, we add an extra field in the PointerType class to indicate the internal bounds of the pointer. This ensures pointers of different representations are mapped to different canonical types while they are still treated as pointers.

In LLVM IR, wide pointers will be emitted as structs of equivalent representations. Clang CodeGen will handle them as Aggregate in TypeEvaluationKind (TEK). AggExprEmitter was extended to handle pointer operations returning wide pointers. Alternatively, a new TEK and an expression emitter dedicated to wide pointers could be introduced.

Default bounds annotations

The model may implicitly add __bidi_indexable or __single depending on the context of the declaration that has the pointer type. __bidi_indexable implicitly adds to local variables, while __single implicitly adds to pointer types specifying struct fields, function parameters, or global variables. This means the parser may first create the pointer type without any default pointer attribute and then recreate the type once the parser has the declaration context and determined the default attribute accordingly.

This also requires the parser to reset the type of the declaration with the newly created type with the right default attribute.

Promotion expression

A new expression will be introduced to represent the conversion from a pointer with an external bounds annotation, such as __counted_by, to __bidi_indexable. This type of conversion cannot be handled by normal CastExprs because it requires an extra subexpression(s) to provide the bounds information necessary to create a wide pointer.

Bounds check expression

Bounds checks are part of semantics defined in the -fbounds-safety language model. Hence, exposing the bounds checks and other semantic actions in the AST is desirable. A new expression for bounds checks has been added to the AST. The bounds check expression has a BoundsCheckKind to indicate the kind of checks and has the additional sub-expressions that are necessary to perform the check according to the kind.

Paired assignment check

-fbounds-safety enforces that variables or fields related with the same external bounds annotation (e.g., buf and count related with __counted_by in the example below) must be updated side by side within the same basic block and without side effect in between.

typedef struct {
   int *__counted_by(count) buf; size_t count;
} sized_buf_t;

void alloc_buf(sized_buf_t *sbuf, sized_t nelems) {
   sbuf->buf = (int *)malloc(sizeof(int) * nelems);
   sbuf->count = nelems;
}

To implement this rule, the compiler requires a linear representation of statements to understand the ordering and the adjacency between the two or more assignments. The Clang CFG is used to implement this analysis as Clang CFG provides a linear view of statements within each CFGBlock (Clang CFGBlock represents a single basic block in a source-level CFG).

Bounds check optimizations

In -fbounds-safety, the Clang frontend emits run-time checks for every memory dereference if the type system or analyses in the frontend couldn’t verify its bounds safety. The implementation relies on LLVM optimizations to remove redundant run-time checks. Using this optimization strategy, if the original source code already has bounds checks, the fewer additional checks -fbounds-safety will introduce. The LLVM ConstraintElimination pass is design to remove provable redundant checks (please check Florian Hahn’s presentation in 2021 LLVM Dev Meeting and the implementation to learn more). In the following example, -fbounds-safety implicitly adds the redundant bounds checks that the optimizer can remove:

void fill_array_with_indices(int *__counted_by(count) p, size_t count) {
   for (size_t i = 0; i < count; ++i) {
      // implicit bounds checks:
      //   if (p + i < p || p + i + 1 > p + count) trap();
      p[i] = i;
   }
}

ConstraintElimination collects the following facts and determines if the bounds checks can be safely removed:

  • Inside the for-loop, 0 <= i < count, hence 1 <= i + 1 <= count.

  • Pointer arithmetic p + count in the if-condition doesn’t wrap.

  • -fbounds-safety treats pointer arithmetic overflow as deterministically two’s complement computation, not an undefined behavior. Therefore, getelementptr does not typically have inbounds keyword. However, the compiler does emit inbounds for p + count in this case because __counted_by(count) has the invariant that p has at least as many as elements as count. Using this information, ConstraintElimination is able to determine p + count doesn’t wrap.

  • Accordingly, p + i and p + i + 1 also don’t wrap.

  • Therefore, p <= p + i and p + i + 1 <= p + count.

  • The if-condition simplifies to false and becomes dead code that the subsequent optimization passes can remove.

OptRemarks can be utilized to provide insights into performance tuning. It has the capability to report on checks that it cannot eliminate, possibly with reasons, allowing programmers to adjust their code to unlock further optimizations.

Debugging

Internal bounds annotations

Internal bounds annotations change a pointer into a wide pointer. The debugger needs to understand that wide pointers are essentially pointers with a struct layout. To handle this, a wide pointer is described as a record type in the debug info. The type name has a special name prefix (e.g., __bounds_safety$bidi_indexable) which can be recognized by a debug info consumer to provide support that goes beyond showing the internal structure of the wide pointer. There are no DWARF extensions needed to support wide pointers. In our implementation, LLDB recognizes wide pointer types by name and reconstructs them as wide pointer Clang AST types for use in the expression evaluator.

External bounds annotations

Similar to internal bounds annotations, external bound annotations are described as a typedef to their underlying pointer type in the debug info, and the bounds are encoded as strings in the typedef’s name (e.g., __bounds_safety$counted_by:N).

Recognizing -fbounds-safety traps

Clang emits debug info for -fbounds-safety traps as inlined functions, where the function name encodes the error message. LLDB implements a frame recognizer to surface a human-readable error cause to the end user. A debug info consumer that is unaware of this sees an inlined function whose name encodes an error message (e.g., : __bounds_safety$Bounds check failed).

Expression Parsing

In our implementation, LLDB’s expression evaluator does not enable the -fbounds-safety language option because it’s currently unable to fully reconstruct the pointers with external bounds annotations, and also because the evaluator operates in C++ mode, utilizing C++ reference types, while -fbounds-safety does not currently support C++. This means LLDB’s expression evaluator can only evaluate a subset of the -fbounds-safety language model. Specifically, it’s capable of evaluating the wide pointers that already exist in the source code. All other expressions are evaluated according to C/C++ semantics.

C++ support

C++ has multiple options to write code in a bounds-safe manner, such as following the bounds-safety core guidelines and/or using hardened libc++ along with the C++ Safe Buffer model. However, these techniques may require ABI changes and may not be applicable to code interoperating with C. When the ABI of an existing program needs to be preserved and for headers shared between C and C++, -fbounds-safety offers a potential solution.

-fbounds-safety is not currently supported in C++, but we believe the general approach would be applicable for future efforts.