wiki:Commentary/Pipeline

Version 33 (modified by jstolarek, 18 months ago) (diff)

--

Overview

GHC is structured into two parts:

  • The ghc package (in subdirectory compiler), which implements almost all GHC's functionality. It is an ordinary Haskell library, and can be imported into a Haskell program by saying import GHC.
  • The ghc binary (in subdirectory ghc) which imports the ghc package, and implements the I/O for the ghci interactive loop.

Here's an overview of the module structure of the top levels of GHC library. (Note: more precisly, this is the plan. Currently the module Make below is glommed into the giant module GHC.)

          |---------------------------------|
          |              GHC                |
          | The root module for the GHC API |
          | Very little code;               |
          | just simple wrappers            |
          |---------------------------------|
                     /                \
                    /                  \
                   /                    \
 |------------------------|    |------------------------|
 |        GhcMake         |    |    InteractiveEval     |
 | Implements --make      |    | Stuff to support the   |
 | Deals with compiling   |    | GHCi interactive envt  |
 |    multiple modules    |    |                        |
 |------------------------|    |------------------------|
           |                                |
           |                                |
           |      --------------------      |
- - - - - -| - - -|     GhcMonad     |- - - | - - - - - - - -
           |      --------------------      |
           |                                |
           |                                |
 |-------------------------|                |
 |   DriverPipeline        |                |
 | Deals with compiling    |                |
 |  *a single module*      |                |
 | through all its stages  |                |
 | (cpp, unlit, compile,   |                |
 |  assemble, link etc)    |                |
 |-------------------------|                |
              \                             |
               \                            |
                \                           |  
         |----------------------------------------------|
         |                    HscMain                   |
         | Compiling a single module (or expression or  |
         | stmt) to bytecode, or to a M.hc or M.s file  |
         |----------------------------------------------|
              |      |       |         |       |
            Parse Rename Typecheck Optimise CodeGen

The driver pipeline

The driver pipeline consist of a couple of phases that call other programs and generate a series of intermediate files. Code responsible for managing the order of phases is in compiler/main/DriverPhases.hs, while managing the driver pipeline as a whole is coded in compiler/main/DriverPipeline.hs. Note that driver pipeline is not the same thing as compilation pipeline: the latter is part of the former.

Let's take a look at the overall structure of the driver pipeline. When we compile Foo.hs or Foo.lhs ("lhs" extension means that Literate Haskell is being used) the following phases are being called (some of them depending on additional conditions like file extensions or enabled flags):

  • Run the unlit pre-processor, unlit, to remove the literate markup, generating Foo.lpp. The unlit processor is a C program kept in utils/unlit.
  • Run the C preprocessor, cpp, (if -cpp is specified), generating Foo.hspp.
  • Run the compiler itself. This does not start a separate process; it's just a call to a Haskell function. This step always generates an 'interface file' Foo.hi, and depending on what flags you give, it also generates a compiled file. As GHC supports three backend code generators currently (a native code generator, a C code generator and an llvm code generator) the possible range of outputs depends on the backend used. All three support assembly output:
    • Object code: no flags required, file Foo.o (supported by all three backends)
    • Assembly code: flag -S, file Foo.s (supported by all three backends)
    • C code: flags -C, file Foo.hc (only supported by C backend)
  • In the -fvia-C case:
    • Run the C compiler on Foo.hc, to generate Foo.s.
  • If -split-objs is in force, run the splitter on Foo.s. This splits Foo.s into lots of small files. The idea is that the static linker will thereby avoid linking dead code.
  • Run the assembler on Foo.s, or if -split-objs is in force, on each individual assembly file.

The compiler pipeline

The compiler itself, independent of the external tools, is also structured as a pipeline. For details (and a diagram), see Commentary/Compiler/HscMain

Video

Video of compilation pipeline explanation from 2006: Compilation Pipeline and interface files (17'30")