Version 3 (modified by simonpj, 10 years ago)



Compiling one module: HscMain

Here we look at the compilation of a single module. A picture accompanies this description. It appears at the bottom of this page, but you will probably find it easier to open it in another window, so that you can see it while reading the text.

Look at the picture first. The yellow boxes are compiler passes, while the blue stuff on the left gives the data type that moves from one phase to the next. The entire pipeline for a single module is run by a module called HscMain (in compiler/main/HscMain). Here are the steps it goes through:

  • The program is initially parsed into the HsSyn types (in the compiler/hsSyn directory), a collection of data types that describe the full abstract syntax of Haskell. HsSyn is a pretty big collection of types: there were 52 data types when I last counted. Many are pretty trivial, but a few have a lot of constructors (HsExpr has 40). HsSyn represents Haskell in its full glory, complete with all syntactic sugar.
  • HsSyn is parameterised over the types of the variables it contains. The first three passes of the compiler work like this:
    • The parser produces HsSyn parameterised by RdrName. To a first approximation, a RdrName is just a string.
    • The renamer transforms this to HsSyn parameterised by Name. To a first approximation, a Name is a string plus a Unique (number) that uniquely identifies it.
    • The typechecker transforms this further, to HsSyn parameterised by Id. To a first approximation, an Id is a Name plus a type.

These three data types are very important, and have their own pages.
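
As a rough illustration of what "parameterised over the types of the variables" means, here is a minimal sketch. It is not GHC's code: the three-constructor Expr type, the Type type, and the rename function are hypothetical stand-ins, and the toy RdrName, Name and Id only follow the "first approximation" descriptions above.

```haskell
-- Toy stand-ins for GHC's variable types (hypothetical, greatly simplified).
newtype RdrName = RdrName String deriving (Eq, Show)  -- just a string
data Name = Name String Int deriving (Eq, Show)       -- a string plus a unique
data Type = TyInt | TyFun Type Type deriving (Eq, Show)
data Id = Id Name Type deriving (Eq, Show)            -- a Name plus a type

-- A tiny expression type, parameterised over its variables as HsSyn is.
data Expr id
  = Var id
  | Lam id (Expr id)
  | App (Expr id) (Expr id)
  deriving (Eq, Show)

-- The same syntax tree type, instantiated once per pass.
type ParsedExpr = Expr RdrName
type RenamedExpr = Expr Name
type TypecheckedExpr = Expr Id

-- Toy renamer: gives every binder a fresh unique and resolves each
-- occurrence to its innermost enclosing binder. Returns Nothing if a
-- variable is not in scope.
rename :: ParsedExpr -> Maybe RenamedExpr
rename e = fst <$> go [] 0 e
  where
    go :: [(RdrName, Name)] -> Int -> Expr RdrName -> Maybe (Expr Name, Int)
    go env u (Var v) = do
      n <- lookup v env
      pure (Var n, u)
    go env u (Lam v@(RdrName s) body) = do
      let n = Name s u
      (body', u') <- go ((v, n) : env) (u + 1) body
      pure (Lam n body', u')
    go env u (App f a) = do
      (f', u1) <- go env u f
      (a', u2) <- go env u1 a
      pure (App f' a', u2)
```

In this sketch, renaming `Lam x (Lam x (Var x))` with `x = RdrName "x"` gives the two binders the uniques 0 and 1, and the occurrence refers to the inner one. The string alone no longer has to disambiguate them.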

  • The desugarer converts from the massive HsSyn type to GHC's intermediate language, CoreSyn (in the compiler/coreSyn directory). This data type is relatively tiny: just eight constructors; again it has its own page.
  • The SimplCore pass (simplCore/SimplCore.lhs) is a bunch of Core-to-Core passes that optimise the program. The main passes are:
    • The Simplifier, which applies lots of small, local optimisations to the program. The simplifier is big and complicated, because it implements a lot of transformations and tries to make them cascade nicely.
    • The float-out and float-in transformations, which move let-bindings outwards and inwards respectively.
    • The strictness analyser. This actually comprises two passes: the analyser itself and the worker/wrapper transformation that uses the results of the analysis to transform the program.
    • The liberate-case transformation.
    • The constructor-specialisation transformation.
    • The common sub-expression elimination (CSE) transformation.
  • Then the CoreTidy pass gets the code into a form in which it can be imported into subsequent modules (when using --make) and/or put into an interface file. There are good notes at the top of the file compiler/main/TidyPgm.lhs; the main function is tidyProgram, for some reason documented as "Plan B".

The serialisation does (pretty much) nothing except serialise. All the intelligence is in the Core-to-IfaceSyn conversion; or, rather, in the reverse of that step.
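
To make the Core-to-Core passes listed earlier a little more concrete, here is a minimal sketch of a simplifier-style pass over a toy Core-like language. Everything in it is an assumption made for illustration: the five-constructor Expr type is much smaller than CoreSyn, and the two rewrites (beta-reduction into a let, and dropping an unused let) are only examples of "small, local optimisations", not a description of what GHC's Simplifier actually does.

```haskell
-- A toy Core-like language (hypothetical; the real CoreSyn differs).
data Expr
  = Var String
  | Lit Int
  | App Expr Expr
  | Lam String Expr
  | Let String Expr Expr   -- non-recursive let
  deriving (Eq, Show)

-- Does variable x occur free in the expression?
occurs :: String -> Expr -> Bool
occurs x (Var y) = x == y
occurs _ (Lit _) = False
occurs x (App f a) = occurs x f || occurs x a
occurs x (Lam y b) = x /= y && occurs x b
occurs x (Let y r b) = occurs x r || (x /= y && occurs x b)

-- One bottom-up sweep applying two local rewrites:
--   beta:      (\x -> b) a      ==>  let x = a in b
--   dead let:  let x = r in b   ==>  b    (when x does not occur in b)
simplOnce :: Expr -> Expr
simplOnce expr = case expr of
  App f a -> rewrite (App (simplOnce f) (simplOnce a))
  Lam x b -> Lam x (simplOnce b)
  Let x r b -> rewrite (Let x (simplOnce r) (simplOnce b))
  _ -> expr
  where
    rewrite (App (Lam x b) a) = Let x a b
    rewrite (Let x _ b) | not (occurs x b) = b
    rewrite e = e

-- Repeat sweeps until nothing changes, so that one rewrite can
-- expose opportunities for another.
simplify :: Expr -> Expr
simplify e =
  let e' = simplOnce e
  in if e' == e then e else simplify e'
```

For example, `simplify (App (Lam "x" (Lit 1)) (Var "y"))` first turns the application into `Let "x" (Var "y") (Lit 1)`, and the next sweep notices that `x` is unused and returns `Lit 1`. This is a very small instance of transformations cascading.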

  • The same, tidied Core program is now fed to the Back End. First there is a two-stage conversion from CoreSyn to StgSyn.
    • The first step is called CorePrep, a Core-to-Core pass that puts the program into A-normal form (ANF). In ANF, the argument of every application is a variable or literal; more complicated arguments are let-bound. Actually CorePrep does quite a bit more: there is a detailed list at the top of the file compiler/coreSyn/CorePrep.lhs.
    • The second step, CoreToStg, moves to the StgSyn data type (the code is in stgSyn/CoreToStg.lhs). The output of CorePrep is carefully arranged to exactly match what StgSyn allows (notably ANF), so there is very little work to do. However, StgSyn is decorated with lots of redundant information (free variables, let-no-escape indicators), which is generated on-the-fly by CoreToStg.
  • Next, the code generator converts the STG program to a C-- program. The code generator is a Big Mother, and lives in directory compiler/codeGen.
  • Now the path forks again:
    • If we are generating GHC's stylised C code, we can just pretty-print the C-- code as stylised C (compiler/cmm/PprC.hs).
    • If we are generating native code, we invoke the native code generator. This is another Big Mother, and lives in compiler/nativeGen.
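
The A-normal form that CorePrep establishes, as described in the Back End steps above, can be sketched on a toy language. This is only an illustration under stated assumptions: the Expr type, the anf and inAnf functions, and the "sat" prefix for fresh variables are hypothetical names chosen here, fresh names are assumed not to clash with names in the input, and the real CorePrep does considerably more.

```haskell
-- A toy Core-like language (hypothetical; the real CoreSyn differs).
data Expr
  = Var String
  | Lit Int
  | App Expr Expr
  | Lam String Expr
  | Let String Expr Expr
  deriving (Eq, Show)

-- Atoms are the only things allowed as arguments in ANF.
isAtom :: Expr -> Bool
isAtom (Var _) = True
isAtom (Lit _) = True
isAtom _ = False

-- Convert to ANF, threading an Int counter to make fresh names.
-- Assumption: no variable in the input starts with "sat".
anf :: Int -> Expr -> (Expr, Int)
anf n e = case e of
  Var _ -> (e, n)
  Lit _ -> (e, n)
  Lam x b ->
    let (b', n1) = anf n b
    in (Lam x b', n1)
  Let x r b ->
    let (r', n1) = anf n r
        (b', n2) = anf n1 b
    in (Let x r' b', n2)
  App f a ->
    let (f', n1) = anf n f
        (a', n2) = anf n1 a
    in if isAtom a'
         then (App f' a', n2)
         else
           -- A complicated argument gets let-bound to a fresh variable.
           let v = "sat" ++ show n2
           in (Let v a' (App f' (Var v)), n2 + 1)

-- Check the ANF property: every application argument is an atom.
inAnf :: Expr -> Bool
inAnf (App f a) = inAnf f && isAtom a
inAnf (Lam _ b) = inAnf b
inAnf (Let _ r b) = inAnf r && inAnf b
inAnf _ = True
```

With this sketch, `fst (anf 0 (App (Var "f") (App (Var "g") (Var "x"))))` yields `Let "sat0" (App (Var "g") (Var "x")) (App (Var "f") (Var "sat0"))`; that is, `f (g x)` becomes `let sat0 = g x in f sat0`.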

The Diagram

This diagram is also located here, so that you can open it in a separate window.
