wiki:Commentary/Compiler/Core2CorePipeline

Core-to-Core optimization pipeline

After the source program has been typechecked it is desugared into GHC's intermediate language Core. The Core representation of a program is then optimized by a series of correctness preserving Core-to-Core passes. This page describes the overall structure of the Core-to-Core optimization pipeline. Detailed descriptions of optimizations are available in the published papers. An overview of the whole compiler pipeline is available here.

Optimizations during desugaring

At the end of desugaring we run the simpleOptPgm function that performs some simple optimizations: eliminating dead bindings, and inlining non-recursive bindings that are used only once or where the RHS is trivial. The rest of Core optimisations is performed by the Core-to-Core pipeline.

The pipeline

The structure of the Core-to-Core pipeline is determined in the getCoreToDo function in the compiler/simplCore/SimplCore.hs module. Below is an ordered list of performed optimisations. These are enabled by default with -O1 and -O2 unless the description says a specific flag is required. The simplifier, which the pipeline description below often refers to, is described in detail in the next section.

  • Static Argument Transformation: tries to remove redundant arguments to recursive calls, turning them into free variables in those calls. Only enabled with -fstatic-argument-transformation. If run this pass is preceded with a "gentle" run of the simplifier.
  • Simplifier, gentle run
  • Full laziness, 1st pass: floats let-bindings outside of lambdas. This pass includes annotating bindings with level information and then running the float-out pass. In this first pass of the full laziness we don't float partial applications and bindings that contain free variables - this will be done by the second pass later in the pipeline. See "Further Reading" section below for pointers where to find the description of the full laziness algorithm.
  • Float in, 1st pass: the opposite of full laziness, this pass floats let-bindings as close to their use sites as possible. It will not undo the full laziness by sinking bindings inside a lambda, unless the lambda is one-shot. At this stage we have not yet run the demand analysis, so we only have demand information for things that we imported.
  • Simplifier, main run: run the main passes of the simplifier (phases 2, 1 and 0). Phase 0 is run with at least 3 iterations.
  • Call arity: attempts to eta-expand local functions based on how they are used. If run, this pass is followed by a 0 phase of the simplifier. See Notes in compiler/simplCore/CallArity.hs and the relevant paper.
  • Demand analysis, 1st pass (a.k.a. strictness analysis): runs the demand analyser followed by worker-wrapper transformation and 0 phase of the simplifier. This pass tries to determine if some expressions are certain to be used and whether they will be used once or many times (cardinality analysis). We currently don't have means of saying that a binding is certain to be used many times. We can only determine that it is certain to be one-shot (ie. used only once) or probable to be one shot. Demand analysis pass only annotates Core with strictness information. This information is later used by worker/wrapper pass to perform transformations. CPR analysis is also done during demand analysis.
  • Full laziness, 2nd pass: another full-laziness pass. This time partial applications and functions with free variables are floated out.
  • Common Sub-expression-elimination: eliminates expressions that are identical.
  • Float in, 2nd pass
  • Check rules, 1st pass: this pass is not for optimisation but for troubleshooting the rules. It is only enabled with -frule-check flag that accepts a string pattern. This pass looks for rules beginning with that string pattern that could have fired but didn't and prints them to stdout.
  • Liberate case: unrolls recursive functions once in their own RHS, to avoid repeated case analysis of free variables. It's a bit like the call-pattern specialisation but for free variables rather than arguments. Followed by a phase 0 simplifier run. Only enabled with -fliberate-case flag.
  • Check rules, 2nd pass
  • Simplifier, final: final 0 phase of the simplifier.
  • Damand analysis, 2nd pass (a.k.a. late demand analysis): this pass consists of demand analysis followed by worker-wrapper transformation and phase 0 of the simplifier. The reason for this pass is that some opportunities for discovering strictness were not visible earlier; and optimisations like call-pattern specialisation can create functions with unused arguments which are eliminated by late demand analysis. Only run with -flate-dmd-anal. FIXME but the cardinality paper says something else, namely that the late pass is meant to detect single entry thunks. Is it still the case in the current implementation?
  • Check rules, 3rd pass

The plugin mechanism allows to modify the above pipeline dynamically.

Simplifier

Simplifier is the workhorse of the Core-to-Core optimisation pipeline. It performs all the local transformations: (TODO this list is most likely not comprehensive)

  • constant folding
  • applying the rewrite rules
  • inlining
  • case of case
  • case of known constructor
  • eta expansion and eta reduction
  • combining adjacent casts
  • pushing a cast out of the way of an application e.g.  (f |> k) a  ==>  f (a |> k1) |> k2 for suitable k1, k2.

Simplifier phases

Each run of the simplifier is assigned with a phase number: 2, 1 or 0. Phase numbers are used for control of interaction between the rules and INLINE/NOINLINE pragmas - see sections 9.31.6.5 and 9.32.3 of the user guide. There are many 0 phases because we use the simplifier to propagate the effects of other passes and once we've gone through the main runs of the simplifier we no longer have to worry about rules and inlinings - these have been already dealt with. There is also a special initial phase, which is used at the beginning of the pipeline by the "gentle" simplifier runs.

Simplifier iterations

Each single run of the simplifier consists of many iterations. Each iteration tries to apply all the optimisations. The simplifier run ends when a fixpoint is reached or when a predesignated number of iterations has been performed (the default is 4 and it can be controlled via the -fmax-simplifier-iterations flag.

Simplifier settings

Simplifier settings can be further tweaked by modifying the fields of SimplifierMode record. It is possible to turn on and off the following optimisations:

  • rewrite rules
  • inlining
  • case-of-case transformation
  • eta-expansion

The so-called "gentle" run disables the case-of-case transformation and inlining but enables the rules and eta-expansion. The reason it disables case-of-case is that it makes full laziness work better. FIXME that's what the comment says, and yet we run the full-laziness after the main run of the simplifier.

Other documentation

  • Commentary/Compiler/OptOrdering discusses the motivation for ordering the optimisations in the pipeline. This may or may not be up to date.
  • You can use -dverbose-core2core flag to dump the Core after almost all passes in the pipeline.

Further reading

  • Compilation by Transformation in Non-Strict Functional Languages (a.k.a. the Santos' thesis): basic source of information about core-to-core transformations. Describes local transformations, full laziness, static argument transformation and many others. Note: Santos' description of inlining is superseeded by "Secrets of the GHC inliner" (see below).
  • A transformation-based optimiser for Haskell, SL Peyton Jones and A Santos, Science of Computer Programming 32(1-3), pp3-47, September 1998. Gives a very simplified overview of Core and then goes into discussing: inlining, veta reduction, optimizing conditionals, unboxed data types and strictness analysis, and let floating (including full laziness). Again, this paper repeats the Santos' thesis, except for the unboxed data types description, where it repeats "Unboxed values as first class citizens in a non-strict functional language".
  • Secrets of the GHC inliner, Simon Peyton Jones and Simon Marlow, Journal of Functional Programming 12(4), July 2002, pp393-434. As the name of the paper rightly implies, this paper gives full details of the inliner.
  • Constructed Product Result Analysis for Haskell, Clem Baker-Finch, Kevin Glynn, and Simon Peyton Jones, Journal of Functional Programming 14(2), 211–245, March 2004. Describes optimisation that allows to return tuple components in registers (for functions that return tuples).

TODO

  • Is there any paper that focuses on worker-wrapper transformation?
  • Does handling of unboxed values described in "Unboxed values as first class citizens in a non-strict functional language" fit in the scope of this page?
Last modified 9 months ago Last modified on Feb 14, 2017 2:58:48 PM