LLVM 1.5 Release Notes

This document contains the release notes for the LLVM compiler infrastructure, release 1.5. Here we describe the status of LLVM, including any known problems and major improvements from the previous release. The most up-to-date version of this document can be found on the LLVM 1.5 web site. If you are not reading this on the LLVM web pages, you should probably go there because this document may be updated after the release.

For more information about LLVM, including information about the latest release, please check out the main LLVM web site. If you have questions or comments, the LLVM developer's mailing list is a good place to send them.

Note that if you are reading this file from CVS or the main LLVM web page, this document applies to the next release, not the current one. To see the release notes for the current or previous releases, see the releases page.

This is the sixth public release of the LLVM Compiler Infrastructure.

LLVM 1.5 is known to correctly compile a wide range of C and C++ programs, includes bug fixes for those problems found since the 1.4 release, and includes a large number of new features and enhancements, described below.

This release includes new native code generators for Alpha, IA-64, and SPARC-V8 (32-bit SPARC). These code generators are still beta quality, but are progressing rapidly. The Alpha backend is implemented with an eye towards being compatible with the widely used SimpleScalar simulator.

This release includes a new framework for building instruction selectors, which has long been the hardest part of building a new LLVM target. This framework handles a lot of the mundane (but easy to get wrong) details of writing the instruction selector, such as generating efficient code for getelementptr instructions, promoting small integer types to larger types (e.g. for RISC targets with one size of integer registers), expanding 64-bit integer operations for 32-bit targets, etc. Currently, the X86, PowerPC, Alpha, and IA-64 backends use this framework. The SPARC backends will be migrated when time permits.
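As a rough illustration of what "expanding 64-bit integer operations for 32-bit targets" involves, the following C sketch adds two 64-bit values using only 32-bit arithmetic. The `u64_parts` type and `add64_parts` function are hypothetical names invented for this example; they are not part of LLVM, and the framework performs this kind of expansion on its own internal representation rather than on C source.

```c
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value split into two 32-bit halves,
   the way a 32-bit target would hold it in a pair of registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_parts;

/* Add two split 64-bit values using only 32-bit operations. */
u64_parts add64_parts(u64_parts a, u64_parts b) {
    u64_parts r;
    uint32_t carry;
    r.lo = a.lo + b.lo;          /* low halves; unsigned add may wrap */
    carry = (r.lo < a.lo);       /* a wrapped result means a carry out */
    r.hi = a.hi + b.hi + carry;  /* high halves plus the carry */
    return r;
}
```

On a target with an add-with-carry instruction the comparison would typically disappear, but the two-step structure stays the same.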

LLVM 1.5 adds support for per-function calling conventions. Traditionally, the LLVM code generators match the native C calling conventions for a target. This is important for compatibility, but is not very flexible. This release allows custom calling conventions to be established for functions, and defines three target-independent conventions (C call, fast call, and cold call) which may be supported by code generators. When possible, the LLVM optimizer promotes C functions to use the "fastcc" convention, allowing the use of more efficient calling sequences (e.g., parameters are passed in registers in the X86 target).

Targets may now also define target-specific calling conventions, allowing LLVM to fully support calling convention altering options (e.g. GCC's -mregparm flag) and well-defined target conventions (e.g. stdcall and fastcall on X86).

The release now includes support for proper tail calls, as required to implement languages like Scheme. Tail calls make use of two features. The first is custom calling conventions (described above), which allow the code generator to use a convention where the caller deallocates its stack before it returns. The second is a flag on the call instruction, which indicates that the callee does not access the caller's stack frame (so it is acceptable to deallocate the caller's stack before invoking the callee). LLVM proper tail calls run on the system stack (as do normal calls) and support indirect tail calls, tail calls with arbitrary numbers of arguments, tail calls where the callee requires more argument space than the caller, etc. The only unsupported case is varargs calls, but that could be added if desired.

To ensure a call is interpreted as a tail call, a front-end must mark functions as "fastcc", mark calls with the 'tail' marker, and follow the call with a return of the called value (or void). The optimizer and code generator attempt to handle more general cases, but the simple case will always work if the code generator supports tail calls. Here is an example:

    fastcc int %bar(int %X, int(double, int)* %FP) {       ; fastcc
        %Y = tail call fastcc int %FP(double 0.0, int %X)  ; tail, fastcc
        ret int %Y
    }

In LLVM 1.5, the X86 code generator is the only target that has been enhanced to support proper tail calls (other targets will be enhanced in the future). Further, because this support was added very close to the release, it is disabled by default. Pass -enable-x86-fastcc to llc to enable it (this will be enabled by default in the next release). The example above compiles to:

    bar:
        sub ESP, 8                   # Callee uses more space than the caller
        mov ECX, DWORD PTR [ESP + 8] # Get the old return address
        mov DWORD PTR [ESP + 4], 0   # First half of 0.0
        mov DWORD PTR [ESP + 8], 0   # Second half of 0.0
        mov DWORD PTR [ESP], ECX     # Put the return address where it belongs
        jmp EDX                      # Tail call "FP"

With fastcc on X86, the first two integer arguments are passed in EAX/EDX, the callee pops its arguments off the stack, and the argument area is always a multiple of 8 bytes in size.
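For readers approaching this from the C side, here is the shape of source code that benefits from proper tail calls: the recursive call is the last thing the function does, and its result is returned unchanged. This is only a sketch. The names `sum_to` and `sum_first` are hypothetical, and these notes do not promise that llvm-gcc marks such a call with 'tail' and "fastcc" in this release.

```c
/* Sum the integers 1..n using an accumulator.
   The recursive call is in tail position: nothing remains to be done in
   this frame after it returns, so the frame could be reused. */
static unsigned long sum_to(unsigned long n, unsigned long acc) {
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* call followed directly by return */
}

/* Public wrapper that supplies the initial accumulator. */
unsigned long sum_first(unsigned long n) {
    return sum_to(n, 0);
}
```

Without tail calls, each level of recursion consumes a stack frame, so the maximum usable n is bounded by the stack size; with them, the recursion runs in constant stack space.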

LLVM now includes an Interprocedural Sparse Conditional Constant Propagation pass, named -ipsccp, which is run by default at link-time.
LLVM 1.5 is now about 15% faster than LLVM 1.4 and its core data structures use about 30% less memory.
Support for Microsoft Visual Studio is improved, and now documented. Most LLVM tools build natively with Visual C++ now.
Configuring LLVM to build a subset of the available targets is now implemented, via the --enable-targets= option.
LLVM can now create native shared libraries with 'llvm-gcc ... -shared -Wl,-native' (or with -Wl,-native-cbe).
LLVM now supports a new "llvm.prefetch" intrinsic, and llvm-gcc now supports __builtin_prefetch.
LLVM now supports intrinsics for bit counting and llvm-gcc now implements the GCC __builtin_popcount, __builtin_ctz, and __builtin_clz builtins.
LLVM now mostly builds on HP-UX with the HP aCC Compiler.
The LLVM X86 backend can now emit Cygwin-compatible .s files.
LLVM now includes workarounds in the code generator generator which reduce the likelihood of GCC hitting swap during optimized builds.
The LLVM Transformation Visualizer (llvm-tv) project has been updated to work with LLVM 1.5.
Nightly tester output is now archived on the llvm-testresults mailing list.
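The bit-counting and prefetch builtins mentioned in the list above can be exercised from C roughly as follows. The wrapper function names are hypothetical and chosen for this sketch; only the `__builtin_*` names come from GCC. Note that `__builtin_ctz` and `__builtin_clz` are undefined for a zero argument, so the wrappers handle that case themselves.

```c
#include <limits.h>

/* Number of set bits in x. */
int count_set_bits(unsigned x) {
    return __builtin_popcount(x);
}

/* Index of the lowest set bit, or -1 if x is zero (our own convention). */
int lowest_set_bit(unsigned x) {
    return x ? __builtin_ctz(x) : -1;
}

/* Index of the highest set bit, or -1 if x is zero (our own convention). */
int highest_set_bit(unsigned x) {
    int top = (int)(sizeof(unsigned) * CHAR_BIT) - 1;
    return x ? top - __builtin_clz(x) : -1;
}

/* Sum an array while hinting that elements a little further ahead
   will be read soon. The prefetch is a hint and never changes the result. */
long sum_with_prefetch(const int *a, int n) {
    long total = 0;
    int i;
    for (i = 0; i < n; ++i) {
        if (i + 8 < n)
            __builtin_prefetch(&a[i + 8]);
        total += a[i];
    }
    return total;
}
```

The prefetch distance of 8 elements is an arbitrary choice for illustration; a useful distance depends on the target and the access pattern.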

The new -simplify-libcalls pass improves code generated for well-known library calls. The pass optimizes calls to many of the string, memory, and standard I/O functions (e.g. replace the calls with simpler/faster calls) when possible, given information known statically about the arguments to the call.
The -globalopt pass now promotes non-address-taken static globals that are only accessed in main to SSA registers.
Loops with trip counts based on array pointer comparisons (e.g. "for (i = 0; &A[i] != &A[n]; ++i) ...") are optimized better than before, which primarily helps iterator-intensive C++ code.
The optimizer now eliminates simple cases where redundant conditions exist between neighboring blocks.
The reassociation pass (which turns (1+X+3) into (X+1+3) among other things) is more aggressive and intelligent.
The -prune-eh pass now detects no-return functions in addition to the no-unwind functions it did before.
The -globalsmodref alias analysis generates more precise results in some cases.
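To make the pointer-comparison item above concrete, the sketch below writes the same summation twice: once with an ordinary index test and once with the address-comparison exit test quoted in that item. Both functions are hypothetical examples written for these notes. They compute the same result; the difference is only in how much work the optimizer must do to recover the trip count.

```c
/* Index-style loop: the trip count is visibly n. */
long sum_indexed(const int *A, int n) {
    long total = 0;
    int i;
    for (i = 0; i < n; ++i)
        total += A[i];
    return total;
}

/* Iterator-style loop: the exit test compares element addresses.
   The trip count is still n, but it has to be derived from the
   pointer comparison. &A[n] is the one-past-the-end address, which
   may be formed and compared but not dereferenced. */
long sum_iterator(const int *A, int n) {
    long total = 0;
    int i;
    for (i = 0; &A[i] != &A[n]; ++i)
        total += A[i];
    return total;
}
```

C++ code that walks a container from begin() to end() tends to produce loops of the second shape, which is why the item singles out iterator-intensive code.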

The code generator now can provide and use information about commutative two-address instructions when performing register allocation.
The code generator now tracks function live-in registers explicitly, instead of requiring the target to generate 'implicit defs' at the entry to a function.
The code generator can lower integer division by a constant to multiplication by a magic constant and multiplication by a constant into shift/add sequences.
The code generator compiles fabs/fneg/sin/cos/sqrt to assembly instructions when possible.
The PowerPC backend generates better code in many cases, making use of FMA instructions and the recording ("dot") forms of various PowerPC instructions.
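The division and multiplication lowering mentioned in the list above can be illustrated in plain C. The sketch below shows one well-known instance of each rewrite: unsigned 32-bit division by 3, and multiplication by 10. The function names are hypothetical, and the constants a code generator picks for a given divisor and target may differ from the ones shown here.

```c
#include <stdint.h>

/* x / 3 for any 32-bit unsigned x, without a divide instruction.
   0xAAAAAAAB is ceil(2^33 / 3); taking the 64-bit product and shifting
   right by 33 yields the exact quotient. */
uint32_t div3_magic(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* x * 10 as two shifts and an add, since 10*x = 8*x + 2*x. */
uint32_t mul10_shift_add(uint32_t x) {
    return (x << 3) + (x << 1);
}
```

Both rewrites trade one slow or long-latency instruction for a few cheap ones, which pays off on most targets because integer division is usually much slower than multiplication.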

Bugs fixed in the LLVM Core:

[dse] DSE deletes stores that are partially overwritten by smaller stores
[instcombine] miscompilation of setcc or setcc in one case
Transition code for LLVM 1.0 style varargs was removed from the .ll file parser. LLVM 1.0 bytecode files are still supported.

Code Generator Bugs:

[cbackend] Logical constant expressions (and/or/xor) not implemented.
[cbackend] C backend does not respect 'volatile'.
The JIT sometimes miscompiled globals and constant pool entries for 64-bit integer constants on 32-bit hosts.
The C backend should no longer produce code that crashes ICC 8.1.

Bugs in the C/C++ front-end:

LLVM is known to work on the following platforms:

Intel and AMD machines running Red Hat Linux, Fedora Core and FreeBSD (and probably other unix-like systems).
Sun UltraSPARC workstations running Solaris 8.
Intel and AMD machines running on Win32 with the Cygwin libraries (limited support is available for native builds with Visual C++).
PowerPC-based Mac OS X systems, running 10.2 and above.
Alpha-based machines running Debian GNU/Linux.
Itanium-based machines running Linux and HP-UX.

The core LLVM infrastructure uses GNU autoconf to adapt itself to the machine and operating system on which it is built. However, minor porting may be required to get LLVM to work on new platforms. We welcome your portability patches and reports of successful builds or error messages.

This section contains all known problems with the LLVM system, listed by component. As new problems are discovered, they will be added to these sections. If you run into a problem, please check the LLVM bug database and submit a bug if there isn't already one.

The following components of this LLVM release are either untested, known to be broken or unreliable, or are in early development. These components should not be relied on, and bugs should not be filed against them, but they may be useful to some people. In particular, if you would like to work on one of these components, please contact us on the llvmdev list.

The following passes are incomplete or buggy, and may be removed in future releases: -cee, -branch-combine, -instloops, -paths, -pre
The llvm-db tool is in a very early stage of development, but can be used to step through programs and inspect the stack.
The "iterative scan" register allocator (enabled with -regalloc=iterativescan) is not stable.
The SparcV8, Alpha, and IA64 code generators are experimental.

In the JIT, dlsym() on a symbol compiled by the JIT will not work.
The JIT does not use mutexes to protect its internal data structures. As such, execution of a threaded program could cause these data structures to be corrupted.
The lower-invoke pass does not mark values live across a setjmp as volatile. This missing feature only affects targets whose setjmp/longjmp libraries do not save and restore the entire register file.
The simplify-libcalls pass generates ill-formed LLVM code.
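The hazard behind the lower-invoke item above has a direct analogue in hand-written C that uses setjmp: a local variable changed between setjmp and longjmp may live in a register that is not restored, so C requires it to be volatile if it is read afterwards. The sketch below shows that rule at the source level. It is an analogy, not a description of what the pass emits, and all names in it are hypothetical.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper that unwinds back to the setjmp below. */
static void fail(void) {
    longjmp(env, 1);
}

/* Returns the number of steps completed before fail() was called.
   'steps' is modified after setjmp and read after longjmp, so it is
   declared volatile to keep its value well defined. */
int steps_before_failure(void) {
    volatile int steps = 0;
    if (setjmp(env) == 0) {
        steps = 1;
        steps = 2;
        fail();
        steps = 3;   /* never reached */
    }
    return steps;
}
```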

C99 Variable sized arrays do not release stack memory when they go out of scope. Thus, the following program may run out of stack space:
```
    void foo(int *X);
    void test(int n) {
      int i;
      for (i = 0; i != 1000000; ++i) {
        int X[n];
        foo(X);
      }
    }
```
Initialization of global union variables can only be done with the largest union member.
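For the variable sized array problem described above, one possible workaround is to move the array into a helper function called from the loop, since the space is released when the function that declared the array returns. The sketch below assumes the helper is not inlined back into the loop, which these notes do not guarantee. All function names are hypothetical, and `use_array` merely stands in for whatever work the loop body does.

```c
/* Hypothetical stand-in for the work done on the array:
   fills it and returns the sum of its elements. */
static long use_array(int *X, int n) {
    long total = 0;
    int i;
    for (i = 0; i < n; ++i) {
        X[i] = i;
        total += X[i];
    }
    return total;
}

/* One iteration's worth of work. The variable sized array lives in this
   function's frame, so its space is released on every return. */
static long one_iteration(int n) {
    int X[n];
    return use_array(X, n);
}

/* The loop itself no longer declares a variable sized array. */
long run_iterations(int iterations, int n) {
    long total = 0;
    int i;
    for (i = 0; i != iterations; ++i)
        total += one_iteration(n);
    return total;
}
```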

Inline assembly is not yet supported.
"long double" is transformed by the front-end into "double". There is no support for floating point data types of any size other than 32 and 64 bits.
The following Unix system functionality has not been tested and may not work:
1. sigsetjmp, siglongjmp - These are not turned into the appropriate invoke/unwind instructions. Note that setjmp and longjmp are compiled correctly.
2. getcontext, setcontext, makecontext - These functions have not been tested.
Although many GCC extensions are supported, some are not. In particular, the following extensions are known to not be supported:
1. Local Labels: Labels local to a block.
2. Nested Functions: As in Algol and Pascal, lexical scoping of functions.
3. Constructing Calls: Dispatching a call to another function.
4. Extended Asm: Assembler instructions with C expressions as operands.
5. Constraints: Constraints for asm operands.
6. Asm Labels: Specifying the assembler name to use for a C symbol.
7. Explicit Reg Vars: Defining variables residing in specified registers.
8. Vector Extensions: Using vector instructions through built-in functions.
9. Target Builtins: Built-in functions specific to particular targets.
10. Thread-Local: Per-thread variables.
11. Pragmas: Pragmas accepted by GCC.
The following GCC extensions are partially supported. An ignored attribute means that the LLVM compiler ignores the presence of the attribute, but the code should still work. An unsupported attribute is one which is ignored by the LLVM compiler and will cause a different interpretation of the program.
1. Variable Length: Arrays whose length is computed at run time.
  Supported, but allocated stack space is not freed until the function returns (noted above).
2. Function Attributes: Declaring that functions have no side effects or that they can never return.
  Supported: format, format_arg, non_null, noreturn, constructor, destructor, unused, deprecated, warn_unused_result, weak
  Ignored: noinline, always_inline, pure, const, nothrow, malloc, no_instrument_function, cdecl
  Unsupported: used, section, alias, visibility, regparm, stdcall, fastcall, all other target specific attributes
3. Variable Attributes: Specifying attributes of variables.
  Supported: cleanup, common, nocommon, deprecated, transparent_union, unused, weak
  Unsupported: aligned, mode, packed, section, shared, tls_model, vector_size, dllimport, dllexport, all target specific attributes.
4. Type Attributes: Specifying attributes of types.
  Supported: transparent_union, unused, deprecated, may_alias
  Unsupported: aligned, packed, all target specific attributes.
5. Other Builtins: Other built-in functions.
  We support all builtins which have a C language equivalent (e.g., __builtin_cos), __builtin_alloca, __builtin_types_compatible_p, __builtin_choose_expr, __builtin_constant_p, and __builtin_expect (currently ignored). We also support builtins for ISO C99 floating point comparison macros (e.g., __builtin_islessequal), __builtin_prefetch, __builtin_popcount[ll], __builtin_clz[ll], and __builtin_ctz[ll].
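A few of the builtins named in the "Other Builtins" item can be tried with a short C sketch like the one below. The wrapper function names are hypothetical; only the `__builtin_*` names come from GCC. Since the item says `__builtin_expect` is currently ignored, the hint in `checked_div` documents intent and should not be expected to change the generated code in this release.

```c
/* Branch hint: the zero-divisor path is marked as unlikely. */
int checked_div(int a, int b) {
    if (__builtin_expect(b == 0, 0))
        return 0;   /* arbitrary fallback chosen for this sketch */
    return a / b;
}

/* Compile-time type comparison: int and long are distinct types,
   even on targets where they have the same size. */
int int_matches_long(void) {
    return __builtin_types_compatible_p(int, long);
}

int int_matches_int(void) {
    return __builtin_types_compatible_p(int, int);
}

/* Compile-time selection: only the chosen operand is evaluated. */
int pick_at_compile_time(void) {
    return __builtin_choose_expr(sizeof(char) == 1, 10, 20);
}

/* C99-style comparison that yields false, rather than raising a
   floating point exception, when either operand is a NaN. */
int le_quiet(double a, double b) {
    return __builtin_islessequal(a, b);
}
```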
The following extensions are known to be supported:
1. Labels as Values: Getting pointers to labels and computed gotos.
2. Statement Exprs: Putting statements and declarations inside expressions.
3. Typeof: typeof: referring to the type of an expression.
4. Lvalues: Using ?:, "," and casts in lvalues.
5. Conditionals: Omitting the middle operand of a ?: expression.
6. Long Long: Double-word integers.
7. Complex: Data types for complex numbers.
8. Hex Floats: Hexadecimal floating-point constants.
9. Zero Length: Zero-length arrays.
10. Empty Structures: Structures with no members.
11. Variadic Macros: Macros with a variable number of arguments.
12. Escaped Newlines: Slightly looser rules for escaped newlines.
13. Subscripting: Any array can be subscripted, even if not an lvalue.
14. Pointer Arith: Arithmetic on void-pointers and function pointers.
15. Initializers: Non-constant initializers.
16. Compound Literals: Compound literals give structures, unions, or arrays as values.
17. Designated Inits: Labeling elements of initializers.
18. Cast to Union: Casting to union type from any member of the union.
19. Case Ranges: `case 1 ... 9' and such.
20. Mixed Declarations: Mixing declarations and code.
21. Function Prototypes: Prototype declarations and old-style definitions.
22. C++ Comments: C++ comments are recognized.
23. Dollar Signs: Dollar sign is allowed in identifiers.
24. Character Escapes: \e stands for the character <ESC>.
25. Alignment: Inquiring about the alignment of a type or variable.
26. Inline: Defining inline functions (as fast as macros).
27. Alternate Keywords: __const__, __asm__, etc., for header files.
28. Incomplete Enums: enum foo;, with details to follow.
29. Function Names: Printable strings which are the name of the current function.
30. Return Address: Getting the return or frame address of a function.
31. Unnamed Fields: Unnamed struct/union fields within structs/unions.
32. Attribute Syntax: Formal syntax for attributes.
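As a quick way to try several of the supported extensions listed above, the following C sketch combines case ranges, statement expressions with typeof, the conditional with an omitted middle operand, and labels as values. All function and macro names are hypothetical and exist only for this example; the code relies on GNU C extensions and is not portable ISO C.

```c
/* Case Ranges: classify a character. */
int is_digit_char(int c) {
    switch (c) {
    case '0' ... '9':
        return 1;
    default:
        return 0;
    }
}

/* Statement Exprs and Typeof: a max macro that evaluates each argument once. */
#define MAX_ONCE(a, b) \
    ({ __typeof__(a) a_ = (a); __typeof__(b) b_ = (b); a_ > b_ ? a_ : b_; })

int max_int(int a, int b) {
    return MAX_ONCE(a, b);
}

/* Conditionals: "x ?: y" means "x ? x : y" with x evaluated once. */
int value_or_default(int value, int fallback) {
    return value ?: fallback;
}

/* Labels as Values: a tiny dispatch loop using computed goto.
   Opcode 0 increments the accumulator, 1 doubles it, 2 stops. */
int run_ops(const unsigned char *ops) {
    static void *const targets[] = { &&op_inc, &&op_dbl, &&op_halt };
    int acc = 0;
    goto *targets[*ops++];
op_inc:
    acc += 1;
    goto *targets[*ops++];
op_dbl:
    acc *= 2;
    goto *targets[*ops++];
op_halt:
    return acc;
}
```

The opcode stream passed to `run_ops` must end with opcode 2 and contain no values above 2, since the sketch does no bounds checking on the dispatch table.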

If you run into GCC extensions which have not been included in any of these lists, please let us know (also including whether or not they work).

For this release, the C++ front-end is considered to be fully tested and works for a number of non-trivial programs, including LLVM itself.

The C++ front-end inherits all problems afflicting the C front-end.

The C++ front-end is based on a pre-release of the GCC 3.4 C++ parser. This parser is significantly more standards compliant (and picky) than prior GCC versions. For more information, see the C++ section of the GCC 3.4 release notes.
Destructors for local objects are not always run when a longjmp is performed. In particular, destructors for objects in the longjmping function and in the setjmp receiver function may not be run. Objects in intervening stack frames will be destroyed, however (which is better than most compilers).
The LLVM C++ front-end follows the Itanium C++ ABI. This document, which is not Itanium specific, specifies a standard for name mangling, class layout, v-table layout, RTTI formats, and other C++ representation issues. Because we use this ABI, code generated by the LLVM compilers should be binary compatible with machine code generated by other Itanium ABI C++ compilers (such as G++, the Intel and HP compilers, etc). However, the exception handling mechanism used by LLVM is very different from the model used in the Itanium ABI, so exceptions will not interact correctly.

The C back-end produces code that violates the ANSI C Type-Based Alias Analysis rules. As such, special options may be necessary to compile the code (for example, GCC requires the -fno-strict-aliasing option). This problem probably cannot be fixed.
Zero arg vararg functions are not supported. This should not affect LLVM produced by the C or C++ frontends.

Memory Mapped I/O Intrinsics do not fence memory

None yet

[sparcv9] SparcV9 backend miscompiles several programs in the LLVM test suite

On 21164s, some rare FP arithmetic sequences which may trap do not have the appropriate nops inserted to ensure restartability.
Defining vararg functions is not supported (but calling them is ok).
Due to the vararg problems, C++ exceptions do not work. Small changes are required to the CFE (which break correctness in the exception handler) to compile the exception handling library (and thus the C++ standard library).

C++ programs are likely to fail on IA64, as calls to setjmp are made where the argument is not 16-byte aligned, as required on IA64. (Strictly speaking this is not a bug in the IA64 back-end; it will also be encountered when building C++ programs using the C back-end.)
The C++ front-end does not use IA64 ABI compliant layout of v-tables. In particular, it just stores function pointers instead of function descriptors in the vtable. This bug prevents mixing C++ code compiled with LLVM with C++ objects compiled by other C++ compilers.
There are a few ABI violations which will lead to problems when mixing LLVM output with code built with other compilers, particularly for floating-point programs.
Defining vararg functions is not supported (but calling them is ok).

Many features are still missing (e.g. support for 64-bit integer arithmetic).
This backend needs to be updated to use the SelectionDAG instruction selection framework.

A wide variety of additional information is available on the LLVM web page, including documentation and publications describing algorithms and components implemented in LLVM. The web page also contains versions of the API documentation that are up to date with the CVS version of the source code. You can access versions of these documents specific to this release by going into the "llvm/doc/" directory in the LLVM tree.

If you have any questions or comments about LLVM, please feel free to contact us via the mailing lists.