This document contains the release notes for the LLVM Compiler Infrastructure,
release 4.0.0. Here we describe the status of LLVM, including major improvements
from the previous release, improvements in various subprojects of LLVM, and
some of the current users of the code. All LLVM releases may be downloaded
from the LLVM releases web site.
For more information about LLVM, including information about the latest
release, please check out the main LLVM web site. If you
have questions or comments, the LLVM Developer’s Mailing List is a good place to send
them.
Starting with this release, LLVM is using a
new versioning scheme,
increasing the major version number with each major release. Stable updates to
this release will be versioned 4.0.x, and the next major release, six months
from now, will be version 5.0.0.
- The minimum compiler version required for building LLVM has been raised to
4.8 for GCC and 2015 for Visual Studio.
- The C API functions LLVMAddFunctionAttr, LLVMGetFunctionAttr,
LLVMRemoveFunctionAttr, LLVMAddAttribute, LLVMRemoveAttribute,
LLVMGetAttribute, LLVMAddInstrAttribute and
LLVMRemoveInstrAttribute have been removed.
- The C API enum LLVMAttribute has been deleted.
- The definition and uses of LLVM_ATTRIBUTE_UNUSED_RESULT in the LLVM source
were replaced with LLVM_NODISCARD, which matches the C++17 [[nodiscard]]
semantics rather than gcc’s __attribute__((warn_unused_result)).
- The Timer-related APIs now expect a Name and Description. When upgrading code,
the previously used names should become descriptions, and a short name in the
style of a programming language identifier should be added.
- LLVM now handles invariant.group across different basic blocks, which makes
it possible to devirtualize virtual calls inside loops.
- The aggressive dead code elimination phase (“adce”) now removes
branches which do not affect program behavior. Loops are retained by
default since they may be infinite, but they can also be removed
with the LLVM option -adce-remove-loops when the loop body otherwise has
no live operations.
- The llvm-cov tool can now export coverage data as JSON. Its HTML output mode
has also improved.
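To illustrate the LLVM_NODISCARD change listed above, the sketch below uses the C++17 [[nodiscard]] attribute directly instead of the LLVM macro, so that it builds without LLVM headers. The names Status, openResource, checksum and examples are hypothetical. It shows two properties of the C++17 semantics: the attribute may be placed on a type as well as on a function, and an explicit cast to void discards a result without a diagnostic (GCC's warn_unused_result still warns after such a cast).

```cpp
#include <cassert>

// Hypothetical result type. Marking the type [[nodiscard]] makes every
// function that returns it by value warn when the result is ignored.
struct [[nodiscard]] Status {
  bool Ok;
};

// Hypothetical function returning the nodiscard type.
inline Status openResource(int Handle) { return Status{Handle >= 0}; }

// The attribute can also be attached to a single function.
[[nodiscard]] inline int checksum(int A, int B) { return A ^ B; }

inline void examples() {
  // openResource(3);      // would draw a "discarded result" warning
  (void)openResource(3);   // explicit discard: no warning
  Status S = openResource(-1);
  assert(!S.Ok);           // the result is used, so nothing to diagnose
}
```

Code that previously relied on the GCC attribute and now needs to ignore a result deliberately can use the cast shown in examples().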
Improvements to ThinLTO (-flto=thin):
- Integration with profile data (PGO). When available, profile data
enables more accurate function importing decisions, as well as
cross-module indirect call promotion.
- Significant build-time and binary-size improvements when compiling with
debug info (-g).
Support was added for the __regcall calling convention.
Existing __vectorcall calling convention support was extended to include
correct handling of homogeneous vector aggregates (HVAs).
The __vectorcall calling convention was introduced by Microsoft to
enhance register usage when passing parameters.
For more information, please read the __vectorcall documentation.
The __regcall calling convention was introduced by Intel to
optimize parameter transfer on function calls.
This calling convention ensures that as many values as possible are
passed or returned in registers.
For more information, please read the __regcall documentation.
Passes that work on the machine instruction representation can be tested with
the .mir serialization format. llc supports the -run-pass,
-stop-after, -stop-before, -start-after and -start-before options to
run a single pass of the code generation pipeline, or to stop or start the code
generation pipeline at a given point.
Additional information can be found in the Machine IR (MIR) Format Reference Manual. The format is
used by the tests ending in .mir in the test/CodeGen directory.
This feature has been available since 2015. It has been used more often lately
but had not yet been mentioned in the release notes.
The intrusive list infrastructure was substantially rewritten over the last
couple of releases, primarily to excise undefined behaviour. The biggest
changes landed in this release.
- simple_ilist<T> is a lower-level intrusive list that never takes
ownership of its nodes. New intrusive-list clients should consider using it
instead of ilist<T>.
- ilist_tag<class> allows a single data type to be inserted into two
parallel intrusive lists. A type can inherit twice from ilist_node,
first using ilist_node<T,ilist_tag<A>> (enabling insertion into
simple_ilist<T,ilist_tag<A>>) and second using
ilist_node<T,ilist_tag<B>> (enabling insertion into
simple_ilist<T,ilist_tag<B>>), where A and B are arbitrary
types.
- ilist_sentinel_tracking<bool> controls whether an iterator knows
whether it’s pointing at the sentinel (end()). By default, sentinel
tracking is on when ABI-breaking checks are enabled, and off otherwise;
this is used for an assertion when dereferencing end() (this assertion
triggered often in practice, and many backend bugs were fixed). Explicitly
turning on sentinel tracking also enables iterator::isEnd(). This is
used by MachineInstrBundleIterator to iterate over bundles.
- ilist<T> is built on top of simple_ilist<T>, and supports the same
configuration options. As before (and unlike simple_ilist<T>),
ilist<T> takes ownership of its nodes. However, it no longer supports
allocating nodes, and is now equivalent to iplist<T>. iplist<T>
will likely be removed in the future.
- ilist<T> now always uses ilist_traits<T>. Instead of passing a
custom traits class in via a template parameter, clients that want to
customize the traits should specialize ilist_traits<T>. Clients that
want to avoid ownership can specialize ilist_alloc_traits<T> to inherit
from ilist_noalloc_traits<T> (or to do something funky); clients that
need callbacks can specialize ilist_callback_traits<T> directly.
- The underlying data structure is now a simple recursive linked list. The
sentinel node contains only a “next” (begin()) and “prev” (rbegin())
pointer and is stored in the same allocation as simple_ilist<T>.
Previously, it was malloc-allocated on-demand by default, although the
now-defunct ilist_sentinel_traits<T> was sometimes specialized to avoid
this.
- The reverse_iterator class no longer uses std::reverse_iterator.
Instead, it now has a handle to the same node that it dereferences to.
Reverse iterators now have the same iterator invalidation semantics as
forward iterators.
- iterator and reverse_iterator have explicit conversion constructors
that match std::reverse_iterator’s off-by-one semantics, so that
reversing the end points of an iterator range results in the same range
(albeit in reverse). I.e., reverse_iterator(begin()) equals
rend().
- iterator::getReverse() and reverse_iterator::getReverse() return an
iterator that dereferences to the same node. I.e.,
begin().getReverse() equals --rend().
- ilist_node<T>::getIterator() and
ilist_node<T>::getReverseIterator() return the forward and reverse
iterators that dereference to the current node. I.e.,
begin()->getIterator() equals begin() and
rbegin()->getReverseIterator() equals rbegin().
- iterator now stores an ilist_node_base* instead of a T*. The
implicit conversions between ilist<T>::iterator and T* have been
removed. Clients may use N->getIterator() (if not nullptr) or
&*I (if not end()); alternatively, clients may refactor to use
references for known-good nodes.
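Two of the points above can be sketched without LLVM headers. Part 1 models how a tag type lets one object sit in two non-owning intrusive lists at once; NodeBase, SimpleList, Job, ByPriority and ByOwner are hypothetical stand-ins and not LLVM's ilist_node or simple_ilist. Part 2 uses std::list to show the std::reverse_iterator off-by-one convention that the explicit conversion constructors are described as matching; reversedBeginIsRend and valueAfterConversion are likewise illustrative names.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Part 1: tag-based membership in two intrusive lists (hypothetical types).

// The Tag parameter exists only to make each base class a distinct type.
template <class Tag> struct NodeBase {
  NodeBase *Prev = this;
  NodeBase *Next = this;
};

// Non-owning circular list; the sentinel is stored inside the list object.
template <class T, class Tag> class SimpleList {
  NodeBase<Tag> Sentinel;

public:
  SimpleList() = default;
  SimpleList(const SimpleList &) = delete;
  SimpleList &operator=(const SimpleList &) = delete;

  void push_back(T &N) {
    NodeBase<Tag> &B = N; // pick the base subobject that belongs to this tag
    B.Prev = Sentinel.Prev;
    B.Next = &Sentinel;
    Sentinel.Prev->Next = &B;
    Sentinel.Prev = &B;
  }
  T &front() {
    assert(Sentinel.Next != &Sentinel && "front() on an empty list");
    return static_cast<T &>(*Sentinel.Next);
  }
  T &back() {
    assert(Sentinel.Prev != &Sentinel && "back() on an empty list");
    return static_cast<T &>(*Sentinel.Prev);
  }
};

struct ByPriority {};
struct ByOwner {};

// Inheriting once per tag gives Job one set of links per list.
struct Job : NodeBase<ByPriority>, NodeBase<ByOwner> {
  int Id;
  explicit Job(int Id) : Id(Id) {}
};

// Part 2: the off-by-one convention of std::reverse_iterator.

// Converting begin() yields rend(), so reversing the end points of a range
// gives the same range in reverse order.
inline bool reversedBeginIsRend(const std::list<int> &L) {
  return std::list<int>::const_reverse_iterator(L.begin()) == L.rend();
}

// A converted iterator dereferences to the element before the original one.
// Precondition: It is not begin().
inline int valueAfterConversion(std::list<int>::const_iterator It) {
  return *std::list<int>::const_reverse_iterator(It);
}
```

In this model a Job can be appended to a SimpleList<Job, ByPriority> and to a SimpleList<Job, ByOwner> independently, because each list only touches the links of its own tagged base. The getReverse() accessors described above differ from Part 2 in that they keep pointing at the same node instead of shifting by one.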
During this release the AArch64 target has:
- Gained support for ILP32 relocations.
- Gained support for XRay.
- Made even more progress on GlobalISel. There is still some work left before
it is production-ready though.
- Refined the support for Qualcomm’s Falkor and Samsung’s Exynos CPUs.
- Learned a few new tricks for lowering multiplications by constants, folding
spilled/refilled copies etc.
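The notes do not say which multiplication tricks were added, so the following is only a generic illustration of the idea of lowering a multiplication by a constant into shifts and adds or subtracts. The function names are hypothetical. Unsigned arithmetic is used so that both forms wrap identically modulo 2^64.

```cpp
#include <cassert>
#include <cstdint>

// x * 9 == (x << 3) + x. On AArch64 this shape fits a single ADD with a
// shifted register operand.
constexpr std::uint64_t mulBy9(std::uint64_t X) { return (X << 3) + X; }

// x * 7 == (x << 3) - x.
constexpr std::uint64_t mulBy7(std::uint64_t X) { return (X << 3) - X; }

static_assert(mulBy9(5) == 45, "shift-and-add form of x * 9");
static_assert(mulBy7(6) == 42, "shift-and-subtract form of x * 7");

// Runtime cross-check against the plain multiplication.
inline void checkAgainstMultiply(std::uint64_t X) {
  assert(mulBy9(X) == X * 9);
  assert(mulBy7(X) == X * 7);
}
```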
During this release the ARM target has:
- Gained support for ROPI (read-only position independence) and RWPI
(read-write position independence), which can be used to remove the need for
a dynamic linker.
- Gained support for execute-only code, which is placed in pages without read
permissions.
- Gained a machine scheduler for Cortex-R52.
- Gained support for XRay.
- Gained Thumb1 implementations for several compiler-rt builtins. It also
has some support for building the builtins for HF targets.
- Started using the generic bitreverse intrinsic instead of rbit.
- Gained very basic support for GlobalISel.
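For readers unfamiliar with the bitreverse item above: the generic intrinsic reverses the order of the bits in an integer, which is what the ARM rbit instruction does for a 32-bit register. The function below is a plain C++ reference model of that operation for 32-bit values, written for illustration only; bitReverse32 and checkBitReverse are hypothetical names, and this is not how LLVM lowers the intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Reference model: bit I of the input becomes bit (31 - I) of the result.
inline std::uint32_t bitReverse32(std::uint32_t V) {
  std::uint32_t R = 0;
  for (int I = 0; I < 32; ++I) {
    R = (R << 1) | (V & 1u); // move the lowest remaining bit of V into R
    V >>= 1;
  }
  return R;
}

// Reversing twice must give back the original value.
inline void checkBitReverse(std::uint32_t V) {
  assert(bitReverse32(bitReverse32(V)) == V);
}
```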
A lot of work has also been done in LLD for ARM, which now supports more
relocations and TLS.
Note: From the next release (5.0), the “vulcan” target will be renamed to
“thunderx2t99”, including command line options, assembly directives, etc. This
release (4.0) will be the last one to accept “vulcan” as its name.
This marks the first release where the AVR backend has been completely merged
from a fork into LLVM trunk. The backend is still marked experimental, but
is generally quite usable. All downstream development has halted on
GitHub, and changes now go directly into
LLVM trunk.
- Instruction selector and pseudo instruction expansion pass landed
- read_register and write_register intrinsics are now supported
- Support stack stores greater than 63 bytes from the bottom of the stack
- A number of assertion errors have been fixed
- Support stores to undef locations
- Very basic support for the target has been added to clang
- Small optimizations to some 16-bit boolean expressions
Most of the work behind the scenes has been on correctness of generated
assembly, and also fixing some assertions we would hit on some well-formed
inputs.
During this release the MIPS target has:
- Added support for the two operand form for many instructions.
- Added the following macros: unaligned load/store, seq, double word load/store for O32.
- Improved the parsing of complex memory offset expressions.
- Enabled the integrated assembler by default for Debian mips64el.
- Added a generic scheduler based on the interAptiv CPU.
- Added support for thread local relocations.
- Added recip, rsqrt, evp, dvp, synci instructions in IAS.
- Optimized the generation of constants in some cases.
The following issues have been fixed:
- Thread local debug information is correctly recorded.
- MSA intrinsics are now range checked.
- Fixed an issue with MSA and the no-odd-spreg abi.
- Fixed some corner cases in handling forbidden slots for MIPSR6.
- Fixed an issue with jumps not being converted to relative branches for assembly.
- Fixed the handling of local symbols and jal instruction.
- N32/N64 no longer have their relocation tables sorted as per their ABIs.
- Fixed a crash when half-precision floating point conversion MSA intrinsics are used.
- Fixed several crashes involving FastISel.
- Corrected the definitions for aui/daui/dahi/dati for MIPSR6.
During this release the X86 target has:
- Added support for AMD Ryzen (znver1) CPUs.
- Gained support for using VEX encoding on AVX-512 CPUs to reduce code size when possible.
- Improved AVX-512 codegen.
- In the OCaml bindings, the attribute API was completely overhauled,
following the changes to the C API.
D is a language with C-like syntax and static typing. It
pragmatically combines efficiency, control, and modeling power, with safety and
programmer productivity. D supports powerful concepts like Compile-Time Function
Execution (CTFE) and Template Meta-Programming, provides an innovative approach
to concurrency and offers many classical paradigms.
LDC uses the frontend from the reference compiler
combined with LLVM as backend to produce efficient native code. LDC targets
x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on ARM
and PowerPC (32/64 bit). Ports to other architectures like AArch64 and MIPS64
are underway.
In addition to producing an easily portable open source OpenCL
implementation, another major goal of pocl
is improving performance portability of OpenCL programs with
compiler optimizations, reducing the need for target-dependent manual
optimizations. An important part of pocl is a set of LLVM passes used to
statically parallelize multiple work-items with the kernel compiler, even in
the presence of work-group barriers. This enables static parallelization of
the fine-grained static concurrency in the work groups in multiple ways.
TCE is a toolset for designing customized
processors based on the Transport Triggered Architecture (TTA).
The toolset provides a complete co-design flow from C/C++
programs down to synthesizable VHDL/Verilog and parallel program binaries.
Processor customization points include register files, function units,
supported operations, and the interconnection network.
TCE uses Clang and LLVM for C/C++/OpenCL C language support, target independent
optimizations and also for parts of code generation. It generates new
LLVM-based code generators “on the fly” for the designed TTA processors and
loads them into the compiler backend as runtime libraries to avoid
per-target recompilation of larger parts of the compiler chain.