This document contains the release notes for the LLVM Compiler Infrastructure, release 3.7. Here we describe the status of LLVM, including major improvements from the previous release, improvements in various subprojects of LLVM, and some of the current users of the code. All LLVM releases may be downloaded from the LLVM releases web site.
For more information about LLVM, including information about the latest release, please check out the main LLVM web site. If you have questions or comments, the LLVM Developer’s Mailing List is a good place to send them.
Note that if you are reading this file from a Subversion checkout or the main LLVM web page, this document applies to the next release, not the current one. To see the release notes for a specific release, please see the releases page.
The minimum required Visual Studio version for building LLVM is now 2013 Update 4.
A new documentation page, Performance Tips for Frontend Authors, contains a collection of tips for frontend authors on how to generate IR which LLVM is able to effectively optimize.
The DataLayout is no longer optional. All IR-level optimizations expect it to be present, and the API has been changed to use a reference instead of a pointer to make this explicit. The Module owns the DataLayout, and it has to match the one attached to the TargetMachine for generating code.
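In 3.6, a pass was inserted into the pipeline to make the DataLayout accessible:
MyPassManager->add(new DataLayoutPass(MyTargetMachine->getDataLayout()));
In 3.7, no such pass is needed; instead, the DataLayout is set on the Module:
MyModule->setDataLayout(MyTargetMachine->createDataLayout());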
The LLVM C API LLVMGetTargetMachineData is deprecated to reflect the fact that it won’t be available anymore from TargetMachine in 3.8.
Comdats are now orthogonal to linkage. LLVM will not create comdats for weak-linkage globals; frontends are responsible for adding them explicitly.
On ELF we now support multiple sections with the same name and comdat. This allows for smaller object files, since multiple sections can share a simple name (.text, .rodata, etc.).
LLVM now lazily loads metadata in some cases. Creating archives with IR files with debug info is now 25X faster.
llvm-ar can create archives in the BSD format used by OS X.
LLVM received a backend for the extended Berkeley Packet Filter (eBPF) instruction set, whose programs can be dynamically loaded into the Linux kernel via the bpf(2) syscall.
Support for BPF has been present in the kernel for some time, but starting from Linux 3.18 it has been extended with features such as 64-bit registers, 8 additional registers, conditional backward jumps, a call instruction, shift instructions, maps (hash table, array, etc.), 1-8 byte loads/stores from the stack, and more.
Up until now, users of BPF had to write bytecode by hand, or use custom generators. This release adds a proper LLVM backend target for the BPF bytecode architecture.
The BPF target is now available by default, and options exist in both Clang (-target bpf) and llc (-march=bpf) to pick eBPF as a backend.
Switch-case lowering was rewritten to avoid generating unbalanced search trees (PR22262) and to exploit profile information when available. Some lowering strategies are now disabled when optimizations are turned off, to save compile time.
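The idea of a balanced, profile-aware case tree can be illustrated in isolation. The C++ sketch below is not LLVM's actual lowering code; `Case`, `Node`, `buildTree`, `dispatch`, and `depth` are hypothetical names, and the per-case weights stand in for profile data. Given cases sorted by value, it picks the split point that best balances the total weight on each side: with equal weights this is a median split (a balanced tree), and a heavily weighted case ends up close to the root.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// One switch case: its value, its destination, and an assumed profile weight.
struct Case {
  std::int64_t value;
  int target;
  std::uint64_t weight;
};

// Binary search tree node: leaves hold one case; inner nodes compare x < pivot.
struct Node {
  std::int64_t pivot = 0;
  const Case *leaf = nullptr;
  std::unique_ptr<Node> left, right;
};

// Builds a tree over cases[lo, hi) (sorted by value), splitting where the
// left and right total weights are as close as possible.
std::unique_ptr<Node> buildTree(const std::vector<Case> &cases,
                                std::size_t lo, std::size_t hi) {
  assert(lo < hi);
  auto node = std::make_unique<Node>();
  if (hi - lo == 1) {
    node->leaf = &cases[lo];
    return node;
  }
  std::uint64_t total = 0;
  for (std::size_t i = lo; i < hi; ++i) total += cases[i].weight;

  std::size_t mid = lo + 1;
  std::uint64_t leftWeight = 0;
  std::uint64_t bestDiff = UINT64_MAX;
  for (std::size_t i = lo + 1; i < hi; ++i) {
    leftWeight += cases[i - 1].weight;          // weight of cases[lo, i)
    std::uint64_t rightWeight = total - leftWeight;
    std::uint64_t diff = leftWeight > rightWeight ? leftWeight - rightWeight
                                                  : rightWeight - leftWeight;
    if (diff < bestDiff) { bestDiff = diff; mid = i; }
  }
  node->pivot = cases[mid].value;               // left: x < pivot
  node->left = buildTree(cases, lo, mid);
  node->right = buildTree(cases, mid, hi);
  return node;
}

// Walks the tree; returns the matching target or defaultTarget.
int dispatch(const Node *n, std::int64_t x, int defaultTarget) {
  while (!n->leaf) n = (x < n->pivot) ? n->left.get() : n->right.get();
  return n->leaf->value == x ? n->leaf->target : defaultTarget;
}

// Number of nodes on the longest root-to-leaf path.
int depth(const Node *n) {
  if (n->leaf) return 1;
  int l = depth(n->left.get()), r = depth(n->right.get());
  return 1 + (l > r ? l : r);
}
```

With eight equally weighted cases the tree has depth 4 rather than the 8 of a linear chain; giving one case a much larger weight moves it directly under the root. LLVM's rewritten lowering also forms jump tables and bit tests for dense clusters before building the tree, which this sketch leaves out.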
The debug info IR class hierarchy now inherits from Metadata and has its own bitcode records and assembly syntax (documented in LangRef). The debug info verifier has been merged with the main verifier.
LLVM IR and APIs are in a period of transition to aid in the removal of pointer types (the end goal being that pointers are typeless/opaque - void*, if you will). Some APIs and IR constructs have been modified to take explicit types that are currently checked to match the target type of their pre-existing pointer type operands. Further changes are still needed, but the more you can avoid using PointerType::getPointeeType, the easier the migration will be.
Argument-less TargetMachine::getSubtarget and TargetMachine::getSubtargetImpl have been removed from the tree. Updating out-of-tree ports is as simple as implementing a non-virtual version in the target, but implementing full Function-based TargetSubtargetInfo support is recommended.
This is expected to be the last major release of LLVM that supports being run on Windows XP and Windows Vista. For the next major release the minimum Windows version requirement will be Windows 7.
During this release the MIPS target has:
There are numerous improvements to the PowerPC target in this release:
Added a new C++ JIT API called On Request Compilation, or ORC.
ORC is a new JIT API inspired by MCJIT but designed to be more testable, and easier to extend with new features. A key new feature already in tree is lazy, function-at-a-time compilation for X86. Also included is a reimplementation of MCJIT’s API and behavior (OrcMCJITReplacement). MCJIT itself remains in tree, and continues to be the default JIT ExecutionEngine, though new users are encouraged to try ORC out for their projects. (A good place to start is the new ORC tutorials under llvm/examples/kaleidoscope/orc).
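Lazy, function-at-a-time compilation means a function body is compiled only when it is first called. The C++ sketch below models that control flow with a table of call stubs; it does not use the ORC API, and `LazyFunctionTable`, `addFunction`, `call`, and the `Compiler` callback are hypothetical names. Each stub starts on a "compile on first call" path that invokes the compiler callback, caches the result, and sends later calls straight to the compiled code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// A "compiled" function; in a real JIT this would be machine code.
using CompiledFn = std::function<int(int)>;
// Hypothetical compiler callback that produces code for one function.
using Compiler = std::function<CompiledFn()>;

class LazyFunctionTable {
public:
  // Registers a function without compiling it.
  void addFunction(const std::string &name, Compiler compile) {
    stubs_[name] = Stub{std::move(compile), nullptr};
  }

  // Calls `name`, compiling it first if this is its first call.
  int call(const std::string &name, int arg) {
    Stub &stub = stubs_.at(name);
    if (!stub.compiled) {
      stub.compiled = stub.compile();  // compile exactly once
      assert(stub.compiled && "compiler produced no code");
      ++compileCount_;
    }
    return stub.compiled(arg);
  }

  int compileCount() const { return compileCount_; }

private:
  struct Stub {
    Compiler compile;     // how to build the body
    CompiledFn compiled;  // empty until the first call
  };
  std::map<std::string, Stub> stubs_;
  int compileCount_ = 0;
};
```

Registering two functions compiles nothing; calling one of them twice compiles it once. In an actual JIT, the stub would be a machine-code trampoline that is patched after compilation, which the sketch replaces with a cached `std::function`.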
In addition to the core LLVM 3.7 distribution of production-quality compiler infrastructure, the LLVM project includes sub-projects that use the LLVM core and share the same distribution license. This section provides updates on these sub-projects.
Polly is a polyhedral loop optimization infrastructure that provides data-locality optimizations to LLVM-based compilers. When compiled as part of clang or loaded as a module into clang, it can perform loop optimizations such as tiling, loop fusion or outer-loop vectorization. As a generic loop optimization infrastructure it allows developers to get a per-loop-iteration model of a loop nest on which detailed analysis and transformations can be performed.
Changes since the last release:
isl imported into Polly distribution
isl, the math library Polly uses, has been imported into the source code repository of Polly and is now distributed as part of Polly. As this was the last external library dependency of Polly, Polly can now be compiled right after checking out the Polly source code without the need for any additional libraries to be pre-installed.
Small integer optimization of isl
The MIT-licensed imath backend used in isl for arbitrary-width integer computations has been optimized to use native integer operations in the common case where the operands of a computation fit into 32 bits, and to fall back to arbitrary-precision integers only for the remaining cases. This optimization has greatly improved the compile-time performance of Polly, both because native operations are faster and because malloc traffic and pointer indirections are reduced. Computations that use arbitrary-precision integers heavily have been sped up by almost 6x. As a result, the compile time of Polly on the Polybench test kernels in the LNT suite has been reduced by 20% on average, with individual reductions between 9% and 43%.
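The small-integer technique can be sketched in C++ as follows: if both operands fit in 32 bits, their exact sum or product always fits in 64 bits, so native arithmetic is used and nothing is allocated; only other values take the slow path. This is not isl's or imath's code. `fitsInt32`, `add`, `mul`, `bigAdd`, `bigMul`, and `g_slowPathCount` are hypothetical names, and the "arbitrary-precision" routines are only stand-ins that count how often they run (they use `int64_t` to keep the example short and do not handle larger results).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of operations that fell back to the stand-in slow path.
static int g_slowPathCount = 0;

// True if v fits in the native small (32-bit) representation.
inline bool fitsInt32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}

// Stand-ins for big-integer routines; a real implementation would allocate
// digit buffers. These only support results that fit in 64 bits.
std::int64_t bigAdd(std::int64_t a, std::int64_t b) {
  assert(!fitsInt32(a) || !fitsInt32(b));  // only used for large operands
  ++g_slowPathCount;
  return a + b;
}

std::int64_t bigMul(std::int64_t a, std::int64_t b) {
  assert(!fitsInt32(a) || !fitsInt32(b));
  ++g_slowPathCount;
  return a * b;
}

// Fast paths: 32-bit operands never overflow a 64-bit sum or product.
std::int64_t add(std::int64_t a, std::int64_t b) {
  if (fitsInt32(a) && fitsInt32(b)) return a + b;
  return bigAdd(a, b);
}

std::int64_t mul(std::int64_t a, std::int64_t b) {
  if (fitsInt32(a) && fitsInt32(b)) return a * b;
  return bigMul(a, b);
}
```

Even the product of two maximal 32-bit values stays on the fast path; a result that no longer fits in 32 bits sends later operations on it to the slow path, mirroring how a small value is promoted once it grows.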
Schedule Trees
Polly now internally uses so-called schedule trees to model the loop structure it optimizes. Schedule trees are an easy-to-understand tree structure that describes a loop nest using integer constraint sets to keep track of execution constraints. They allow the developer to modify the loop tree with per-tree-node operations. Programmatic analyses that work on the schedule tree (e.g., dependence analysis) also show a visible speedup, as they can exploit the tree structure of the schedule and need to fall back to ILP-based optimization problems less often. Section 6 of the paper "Polyhedral AST generation is more than scanning polyhedra" gives a detailed explanation of schedule trees.
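To make the tree structure concrete, here is a toy C++ model of a schedule tree. It borrows node kinds that exist in isl's schedule trees (domain, band, sequence, filter, leaf) but is not isl's API: `SchedNode`, `node`, `interchange`, and the use of plain loop names for band dimensions are hypothetical simplifications (isl stores affine schedule functions). The point is that a per-node operation such as loop interchange only touches one band node.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Domain, Band, Sequence, Filter, Leaf };

struct SchedNode {
  Kind kind;
  std::vector<std::string> dims;  // Band: loop dimensions, outermost first
  std::string label;              // Domain/Filter: statement set, as text
  std::vector<std::unique_ptr<SchedNode>> children;
};

// Convenience constructor for a node without children.
std::unique_ptr<SchedNode> node(Kind k, std::vector<std::string> dims = {},
                                std::string label = "") {
  auto n = std::make_unique<SchedNode>();
  n->kind = k;
  n->dims = std::move(dims);
  n->label = std::move(label);
  return n;
}

// Per-node operation: swap two dimensions of a band (loop interchange).
// The rest of the tree is left untouched.
void interchange(SchedNode &band, std::size_t i, std::size_t j) {
  assert(band.kind == Kind::Band && i < band.dims.size() && j < band.dims.size());
  std::swap(band.dims[i], band.dims[j]);
}
```

A domain node for statement `S[i, j]` with a two-dimensional band below it can thus be interchanged in place, without rebuilding the surrounding tree.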
Scalar and PHI node modeling - Polly as an analysis
Polly now requires almost no preprocessing to analyse LLVM-IR, which makes it easier to use Polly as a pure analysis pass, e.g. to provide more precise dependence information to non-polyhedral transformation passes. Originally, Polly required the input LLVM-IR to be preprocessed such that all scalar and PHI-node dependences were translated to in-memory operations. Since this release, Polly has full support for scalar and PHI-node dependences and requires no scalar-to-memory translation for these kinds of dependences.
Modeling of modulo and non-affine conditions
Polly now supports modulo operations such as A[t%2][i][j], which often appear in stencil computations, and also allows data-dependent conditional branches, such as those resulting from ternary conditions like A[i] > 255 ? 255 : A[i].
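For illustration, the C++ function below contains both patterns: a time-stepped stencil that alternates between two buffers through `A[t % 2]`, and a data-dependent clamp written as a ternary. It is a made-up kernel (the name `smoothAndClamp` and the one-dimensional spatial shape, rather than `A[t%2][i][j]`, are assumptions), meant only to show the kind of code this paragraph refers to, not Polly output.

```cpp
#include <cassert>
#include <vector>

// Double-buffered 3-point stencil: step t writes A[t % 2] and reads the
// values from step t - 1 in A[(t - 1) % 2]. Boundary elements keep their
// input values. Afterwards, values above 255 are clamped.
std::vector<int> smoothAndClamp(const std::vector<int> &input, int steps) {
  const int n = static_cast<int>(input.size());
  assert(n >= 3 && steps >= 0);
  std::vector<std::vector<int>> A(2, input);
  for (int t = 1; t <= steps; ++t)
    for (int i = 1; i < n - 1; ++i)  // modulo in the subscript: A[t % 2]
      A[t % 2][i] = A[(t - 1) % 2][i - 1] + A[(t - 1) % 2][i] + A[(t - 1) % 2][i + 1];
  std::vector<int> &out = A[steps % 2];
  for (int i = 0; i < n; ++i)
    out[i] = out[i] > 255 ? 255 : out[i];  // data-dependent condition
  return out;
}
```

For example, one step on `{1, 2, 3, 4}` yields `{1, 6, 9, 4}`, and one step on five 100s yields `{100, 255, 255, 255, 100}` after clamping.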
Delinearization
Polly now supports the analysis of manually linearized multi-dimensional arrays, as they result from macros such as “#define ARRAY2D(A,i,j) (A.data[(i) * A.size + (j)])”. Similar constructs appear in old C code written before C99, C++ code such as boost::ublas, LLVM IR exported from Julia, MATLAB-generated code, and many others. Our work titled Optimistic Delinearization of Parametrically Sized Arrays gives details.
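The following C++ example shows the kind of manually linearized access described above: a matrix stored in one flat buffer and indexed through a macro. The `Matrix` struct, the `ARRAY2D` macro, and `matmul` are illustrative assumptions. Each access uses a single subscript `(i) * A.size + (j)` whose stride `A.size` is only known at run time; delinearization is the analysis that recovers the separate subscripts `i` and `j` from such an expression, so the access can be treated as a two-dimensional array access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Square matrix stored row-major in one flat buffer; `size` is the
// run-time row length (a parametric array size).
struct Matrix {
  std::size_t size;
  std::vector<double> data;
  explicit Matrix(std::size_t n) : size(n), data(n * n, 0.0) {}
};

// Pre-C99-style manual linearization: one flat index i * size + j.
#define ARRAY2D(A, i, j) ((A).data[(i) * (A).size + (j)])

// C = A * B, using only linearized accesses.
void matmul(const Matrix &A, const Matrix &B, Matrix &C) {
  assert(A.size == B.size && B.size == C.size);
  const std::size_t n = A.size;
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < n; ++k)
        sum += ARRAY2D(A, i, k) * ARRAY2D(B, k, j);
      ARRAY2D(C, i, j) = sum;
    }
}
```

Without delinearization, an analysis sees only the products `i * size` and `k * size`, which are not affine in the loop indices when `size` is a parameter.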
Compile time improvements
Pratik Bahtu worked on compile-time performance tuning of Polly. His work, together with the support for schedule trees and the small integer optimization in isl, notably reduced the compile time.
Increased compute timeouts
As Polly’s compile time has been notably improved, we were able to raise the compute timeouts that Polly uses as compile-time safeguards. As a result, the default configuration of Polly can now analyze larger loop nests without running into compile-time restrictions.
Export Debug Locations via JSCoP file
Polly’s JSCoP import/export format gained support for debug locations, which show the user the source code location of detected SCoPs.
Improved Windows support
The compilation of Polly on Windows using CMake has been improved, and several Visual Studio build issues have been addressed.
Many bug fixes
An exciting aspect of LLVM is that it is used as an enabling technology for a lot of other language and tools projects. This section lists some of the projects that have already been updated to work with LLVM 3.7.
D is a language with C-like syntax and static typing. It pragmatically combines efficiency, control, and modeling power, with safety and programmer productivity. D supports powerful concepts like Compile-Time Function Execution (CTFE) and Template Meta-Programming, provides an innovative approach to concurrency and offers many classical paradigms.
LDC uses the frontend from the reference compiler combined with LLVM as backend to produce efficient native code. LDC targets x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on PowerPC (32/64 bit). Ports to other architectures like ARM, AArch64 and MIPS64 are underway.
In addition to producing an easily portable open source OpenCL implementation, another major goal of pocl is improving performance portability of OpenCL programs with compiler optimizations, reducing the need for target-dependent manual optimizations. An important part of pocl is a set of LLVM passes used to statically parallelize multiple work-items with the kernel compiler, even in the presence of work-group barriers.
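To illustrate what statically parallelizing work-items in the presence of barriers means, the C++ sketch below hand-applies the idea to a tiny kernel: the kernel body is split at its barrier into two regions, and each region becomes a loop over all work-items of the work-group, so every work-item finishes the first region before any work-item starts the second. The kernel, `runWorkGroup`, and the vector-based "local memory" are assumptions for illustration; pocl performs this kind of transformation automatically with LLVM passes, and its actual output differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work-group execution of a kernel that, per work-item `lid`, does:
//   tmp[lid] = in[lid];
//   barrier();                      // all writes to tmp must be visible
//   out[lid] = tmp[(lid + 1) % n];  // reads a neighbour's value
// Each barrier-free region is wrapped in its own loop over work-items,
// so the boundary between the loops plays the role of the barrier.
std::vector<int> runWorkGroup(const std::vector<int> &in) {
  const std::size_t n = in.size();  // work-group size
  assert(n > 0);
  std::vector<int> tmp(n), out(n);  // "local" and "global" memory

  // Region 1: code before the barrier, for every work-item.
  for (std::size_t lid = 0; lid < n; ++lid)
    tmp[lid] = in[lid];

  // Region 2: code after the barrier, for every work-item.
  for (std::size_t lid = 0; lid < n; ++lid)
    out[lid] = tmp[(lid + 1) % n];

  return out;
}
```

Running each work-item's whole body in a single loop would be wrong here, because work-item 0 would read `tmp[1]` before work-item 1 wrote it. Within each region the iterations are independent, which leaves a compiler free to vectorize them.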
TCE is a toolset for designing customized exposed datapath processors based on the Transport triggered architecture (TTA).
The toolset provides a complete co-design flow from C/C++ programs down to synthesizable VHDL/Verilog and parallel program binaries. Processor customization points include the register files, function units, supported operations, and the interconnection network.
TCE uses Clang and LLVM for C/C++/OpenCL C language support, target-independent optimizations, and also for parts of code generation. It generates new LLVM-based code generators “on the fly” for the designed processors and loads them into the compiler backend as runtime libraries to avoid per-target recompilation of larger parts of the compiler chain.
BCC is a Python + C framework for tracing and networking that uses the Clang rewriter, a second pass of Clang, and the BPF backend to generate eBPF and push it into the kernel.
LLVMSharp and ClangSharp are type-safe C# bindings for Microsoft.NET and Mono that Platform Invoke into the native libraries. ClangSharp is self-hosted and is used to generate LLVMSharp using the LLVM-C API.
LLVMSharp Kaleidoscope Tutorials are instructive examples of writing a compiler in C#, with certain improvements like using the visitor pattern to generate LLVM IR.
ClangSharp PInvoke Generator is the self-hosting mechanism for LLVM/ClangSharp and is demonstrative of using LibClang to generate Platform Invoke (PInvoke) signatures for C APIs.
A wide variety of additional information is available on the LLVM web page, in particular in the documentation section. The web page also contains versions of the API documentation that are up to date with the Subversion version of the source code. You can access versions of these documents specific to this release by going into the llvm/docs/ directory in the LLVM tree.
If you have any questions or comments about LLVM, please feel free to contact us via the mailing lists.