The vectorization transformation can be rather complicated, involving several
potential alternatives, especially for outer-loops but also possibly for
innermost loops. These alternatives may have significant performance impact,
both positive and negative. A cost model is therefore employed to identify the
best alternative, including the alternative of avoiding any transformation
altogether.
The Vectorization Plan is an explicit model for describing vectorization
candidates. It serves for both optimizing candidates including estimating their
cost reliably, and for performing their final translation into IR. This
facilitates dealing with multiple vectorization candidates.
VPlan-based vectorization involves three major steps, taking a “scenario-based
approach” to vectorization planning:
- Legal Step: check if a loop can be legally vectorized; encode constraints and
artifacts if so.
- Plan Step:
- Build initial VPlans following the constraints and decisions taken by
Legal Step 1, and compute their cost.
- Apply optimizations to the VPlans, possibly forking additional VPlans.
Prune sub-optimal VPlans having relatively high cost.
- Execute Step: materialize the best VPlan. Note that this is the only step
that modifies the IR.
In what follows, the term “input IR” refers to code that is fed into the
vectorizer whereas the term “output IR” refers to code that is generated by the
vectorizer. The output IR contains code that has been vectorized or “widened”
according to a loop Vectorization Factor (VF), and/or loop unroll-and-jammed
according to an Unroll Factor (UF).
The design of VPlan follows several high-level guidelines:
- Analysis-like: building and manipulating VPlans must not modify the input IR.
In particular, if the best option is not to vectorize at all, the
vectorization process terminates before reaching Step 3, and compilation
should proceed as if VPlans had not been built.
- Align Cost & Execute: each VPlan must support both estimating the cost and
generating the output IR code, such that the cost estimation evaluates the
to-be-generated code reliably.
- Support vectorizing additional constructs:
- Outer-loop vectorization. In particular, VPlan must be able to model the
control-flow of the output IR which may include multiple basic-blocks and
nested loops.
- SLP vectorization.
- Combinations of the above, including nested vectorization: vectorizing
both an inner loop and an outer-loop at the same time (each with its own
VF and UF), mixed vectorization: vectorizing a loop with SLP patterns
inside , (re)vectorizing input IR containing vector code.
- Function vectorization .
- Support multiple candidates efficiently. In particular, similar candidates
related to a range of possible VF’s and UF’s must be represented efficiently.
Potential versioning needs to be supported efficiently.
- Support vectorizing idioms, such as interleaved groups of strided loads or
stores. This is achieved by modeling a sequence of output instructions using
a “Recipe”, which is responsible for computing its cost and generating its
code.
- Encapsulate Single-Entry Single-Exit regions (SESE). During vectorization
such regions may need to be, for example, predicated and linearized, or
replicated VF*UF times to handle scalarized and predicated instructions.
Innerloops are also modelled as SESE regions.
- Support instruction-level analysis and transformation, as part of Planning
Step 2.b: During vectorization instructions may need to be traversed, moved,
replaced by other instructions or be created. For example, vector idiom
detection and formation involves searching for and optimizing instruction
patterns.
Transforming the Loop Vectorizer to use VPlan follows a staged approach. First,
VPlan is used to record the final vectorization decisions, and to execute them:
the Hierarchical CFG models the planned control-flow, and Recipes capture
decisions taken inside basic-blocks. Next, VPlan will be used also as the basis
for taking these decisions, effectively turning them into a series of
VPlan-to-VPlan algorithms. Finally, VPlan will support the planning process
itself including cost-based analyses for making these decisions, to fully
support compositional and iterative decision making.
Some decisions are local to an instruction in the loop, such as whether to widen
it into a vector instruction or replicate it, keeping the generated instructions
in place. Other decisions, however, involve moving instructions, replacing them
with other instructions, and/or introducing new instructions. For example, a
cast may sink past a later instruction and be widened to handle first-order
recurrence; an interleave group of strided gathers or scatters may effectively
move to one place where they are replaced with shuffles and a common wide vector
load or store; new instructions may be introduced to compute masks, shuffle the
elements of vectors, and pack scalar values into vectors or vice-versa.
In order for VPlan to support making instruction-level decisions and analyses,
it needs to model the relevant instructions along with their def/use relations.
This too follows a staged approach: first, the new instructions that compute
masks are modeled as VPInstructions, along with their induced def/use subgraph.
This effectively models masks in VPlan, facilitating VPlan-based predication.
Next, the logic embedded within each Recipe for generating its instructions at
VPlan execution time, will instead take part in the planning process by modeling
them as VPInstructions. Finally, only logic that applies to instructions as a
group will remain in Recipes, such as interleave groups and potentially other
idiom groups having synergistic cost.