Changes between Version 9 and Version 10 of Building/Architecture

Mar 31, 2009 10:04:46 AM (9 years ago)



  • Building/Architecture

    v9 v10  
    72 72 first, and then get on to the specifics of how we build GHC.
    74 === Idiom: non-recursive make ===
    76 Build systems for large projects often use the technique commonly
    77 known as "recursive make", where there is a separate `Makefile` in
    78 each directory that is capable of building that part of the system.
    79 The `Makefile`s may share some common infrastructure and configuration
    80 by using GNU '''make''''s `include` directive; this is exactly what the
    81 previous GHC build system did.  However, this design has a number of
    82 flaws, as described in Peter Miller's
    83 ''Recursive Make Considered Harmful''.
    85 The GHC build system adopts the non-recursive '''make''' idiom.  That is, we
    86 never invoke '''make''' from inside a `Makefile`, and the whole build system
    87 is effectively a single giant `Makefile`.
    89 This gives us the following advantages:
    91  * Specifying dependencies between different parts of the tree is
    92    easy.  In this way, we can accurately specify many dependencies
    93    that we could not in the old recursive-make system.  This makes it much more likely that when you say "make"
    94    after modifying parts of the tree or pulling new patches,
    95    the build system will bring everything up-to-date in the correct order, and leave you with a working
    96    system.
    98  * More parallelism: dependencies are more fine-grained, and there
    99    is no need to build separate parts of the system in sequence, so
    100    the overall effect is that we have more parallelism in the build.
    102 Doesn't this sacrifice modularity?  No - we can still split the build
    103 system into separate files, using GNU '''make''''s `include`.
    105 Specific notes related to this idiom:
    107  * Individual directories usually have a `` file which
    108    contains the build instructions for that directory.
    110  * Other parts of the build system are in `mk/*.mk` and `rules/*.mk`.
    112  * The top-level `` file includes all the other `*.mk` files in
    113    the tree.  The top-level `Makefile` invokes '''make''' on ``
    114    (this is the only recursive invocation of '''make'''; see the "phase
    115    ordering" idiom below).
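
As a minimal sketch of why a single dependency graph helps, the fragment below lets a target in one directory depend directly on a target in another. The directories `lib/` and `app/`, the file names, and the `echo` recipes are hypothetical. In a real tree each group of rules would sit in that directory's own included file rather than inline. Recipes use the `target : prerequisites ; command` form so the sketch does not rely on tab characters.

{{{
# Sketch only: lib/ and app/ are made-up directories, and nothing is really built.
.PHONY: all
all : app/prog

# Rules that would come from the included file for lib/
lib/libdemo.a : ; @echo "building $@"

# Rules that would come from the included file for app/.
# The prerequisite crosses a directory boundary, which a per-directory
# recursive make cannot express directly.
app/prog : lib/libdemo.a ; @echo "linking $@ against $<"
}}}

Because all rules are read by one '''make''' process, `make -j` is free to schedule anything that the dependencies allow, regardless of directory.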
    117 === Idiom: stub makefiles ===
    119 It's all very well having a single giant `Makefile` that knows how to
    120 build everything in the right order, but sometimes you want to build
    121 just part of the system.  When working on GHC itself, we might want to
    122 build just the compiler, for example.  In the recursive '''make''' system we
    123 would do `cd ghc` and then `make`.  In the non-recursive system we can
    124 still achieve this by specifying the target with something like `make
    125 ghc/stage1/build/ghc`, but that's not so convenient.
    127 Our second idiom therefore supports the `cd ghc; make` idiom, just as
    128 with recursive make. To achieve this we put a tiny stub `Makefile` in each
    129 directory, whose job it is to invoke the main `Makefile` specifying the
    130 appropriate target(s) for that directory.  These stub `Makefile`s
    131 follow a simple pattern:
    133 {{{
    134 dir = libraries/base
    135 TOP = ../..
    136 include $(TOP)/mk/
    137 }}}
    139 where `mk/` knows how to recursively invoke the giant top-level '''make'''.
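
One way the included file could do this is sketched below. This is an assumption for illustration, not the contents of the real GHC file: it relies only on the `dir` and `TOP` variables that the stub sets, and on the per-directory `all_`''directory'' targets described in the next idiom.

{{{
# Hypothetical sketch of a file that a stub Makefile includes.
# Assumes the stub has already set:
#   dir = path of this directory relative to the top of the tree
#   TOP = relative path from this directory back to the top
.PHONY: default
default : ; $(MAKE) -C $(TOP) all_$(dir)
}}}

With the stub shown above, running `make` in `libraries/base` would then run the top-level build with the target `all_libraries/base`.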
    141 === Idiom: standard targets (all, clean, etc.) ===
    143 We want an `all` target that builds everything, but we also want a way to build individual components (say, everything in `rts/`).  This is achieved by having a separate "all" target for each directory, named `all_`''directory''.  For example in `rts/` we might have this:
    145 {{{
    146 all : all_rts
    147 .PHONY: all_rts
    148 all_rts : ...dependencies...
    149 }}}
    150 When the top level '''make''' includes all these `` files, it will see that target `all` depends on `all_rts, all_ghc, ...etc...`; so `make all` will make all of these.  But the individual targets are still available.  In particular, you can say
    151   * `make all_rts` (anywhere) to build everything in the RTS directory
    152   * `make all` (anywhere) to build everything
    153   * `make`, with no explicit target, makes the default target in the current directory's stub `Makefile`, which in turn makes the target `all_`''dir'', where ''dir'' is the current directory.
    155 Other standard targets such as `clean`, `install`, and so on use the same technique.  There are pre-canned macros to define your "all" and "clean" targets; take a look in `rules/` and `rules/`.
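
The following self-contained sketch shows how the per-directory targets accumulate. The two groups of rules stand in for what two separate directories might contribute. The `echo` recipes and the `clean_`''directory'' names are assumptions made for illustration.

{{{
# Contribution from rts/ (sketch)
all : all_rts
clean : clean_rts
.PHONY: all_rts clean_rts
all_rts : ; @echo "building everything in rts"
clean_rts : ; @echo "cleaning rts"

# Contribution from ghc/ (sketch)
all : all_ghc
clean : clean_ghc
.PHONY: all_ghc clean_ghc
all_ghc : ; @echo "building everything in ghc"
clean_ghc : ; @echo "cleaning ghc"

.PHONY: all clean
}}}

Each `all : ...` line has no recipe, so '''make''' simply merges the prerequisites. `make all` runs both directory targets, while `make all_rts` runs only the first.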
    157 === Idiom: stages ===
    159 What do we use to compile GHC?  GHC itself, of course.  In a complete build we actually build GHC twice: once using the GHC version that is installed, and then again using the GHC we just built.  To be clear about which GHC we are talking about, we number them:
    161  * '''Stage 0''' is the GHC you have installed.  The "GHC you have installed" is also called "the bootstrap compiler".
    162  * '''Stage 1''' is the first GHC we build, using stage 0.  Stage 1 is then used to build the packages.
    163  * '''Stage 2''' is the second GHC we build, using stage 1.  This is the one we normally install when you say `make install`.
    164  * '''Stage 3''' is optional, but is sometimes built to test stage 2.
    166 Stage 1 does not support interactive execution (GHCi) or Template Haskell.  The reason is that when running byte code we must dynamically link the packages, and only in stage 2 and later can we guarantee that the packages we dynamically link are compatible with those that GHC was built against (because they are the very same packages).
    169 === Idiom: distdir ===
    171 Often we want to build a component multiple times in different ways.  For example:
    173  * certain libraries (e.g. Cabal) are required by GHC, so we build them once with the
    174    bootstrapping compiler, and again with stage 1 once that is built.
    176  * GHC itself is built multiple times (stage 1, stage 2, maybe stage 3)
    178  * some tools (e.g. ghc-pkg) are also built once with the bootstrapping compiler,
    179    and then again using stage 1 later.
    181 In order to support multiple builds in a directory, we place all generated files in a subdirectory, called the "distdir".  The distdir can be anything at all; for example in `compiler/` we name our distdirs after the stage (`stage1`, `stage2` etc.).  When there is only a single build in a directory, by convention we usually call the distdir simply "dist".
    183 There is a related concept called ''ways'', which includes profiling and dynamic-linking.  Multiple ways are currently part of the same "build" and use the same distdir, but in the future we might unify these concepts and give each way its own distdir.
    185 === Idiom: interaction with Cabal ===
    187 Many of the components of the GHC build system are also Cabal
    188 packages, with package metadata defined in a `foo.cabal` file. For the
    189 GHC build system we need to extract that metadata and use it to build
    190 the package. This is done by the program `ghc-cabal` (in `utils/ghc-cabal`
    191 in the GHC source tree). This program reads `foo.cabal` and produces
    192 `` containing the package metadata in the form of
    193 makefile bindings that we can use directly.
    195 We adhere to the following rule: '''`ghc-cabal` generates only
    196 makefile variable bindings''', such as
    197 {{{
    198   HS_SRCS = Foo.hs Bar.hs
    199 }}}
    200 `ghc-cabal` never generates makefile rules, macros, macro invocations, etc.
    201 All the makefile code is therefore contained in fixed, editable
    202 `.mk` files.
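
A small sketch of this division of labour follows. The first binding plays the part of generated output, and everything after it is the kind of code that would live in a fixed, hand-written file. The names `HS_OBJS` and `show` are hypothetical and exist only for the illustration.

{{{
# Generated part (sketch): variable bindings only, no rules.
HS_SRCS = Foo.hs Bar.hs

# Hand-written part (sketch): all logic lives here, driven by the bindings.
HS_OBJS = $(patsubst %.hs,%.o,$(HS_SRCS))

.PHONY: show
show : ; @echo "objects: $(HS_OBJS)"
}}}

Running `make show` prints `objects: Foo.o Bar.o`. Keeping the generated part free of rules means the build logic can be read and edited in one place.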
    204 === Idiom: variable names ===
    206 Now that our build system is one giant `Makefile`, all our variables
    207 share the same namespace.  Where previously we might have had a
    208 variable that contained a list of the Haskell source files called
    209 `HS_SRCS`, now we have one of these for each directory (and indeed each build, or distdir) in the source tree,
    210 so we have to give them all different names.
    212 The idiom that we use for distinguishing variable names is to prepend
    213 the directory name and the distdir to the variable.  So for example the list of
    214 Haskell sources in the directory `utils/hsc2hs` would be in the
    215 variable `utils/hsc2hs_dist_HS_SRCS` ('''make''' doesn't mind slashes in variable
    216 names).  The pattern is: ''directory''_''distdir''_''variable''.
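
The sketch below illustrates the naming pattern with made-up source lists. Only the ''directory''_''distdir''_''variable'' pattern comes from the text above. The module names, the `dir` and `distdir` helper variables, and the `show` target are assumptions.

{{{
# One HS_SRCS per directory/distdir pair, all in the same namespace.
utils/hsc2hs_dist_HS_SRCS = Main.hs
compiler_stage1_HS_SRCS = Main.hs Parser.hs
compiler_stage2_HS_SRCS = Main.hs Parser.hs Interactive.hs

# A variable name can itself be computed from other variables.
dir = compiler
distdir = stage2

.PHONY: show
show : ; @echo "$(dir)/$(distdir): $($(dir)_$(distdir)_HS_SRCS)"
}}}

Running `make show` prints `compiler/stage2: Main.hs Parser.hs Interactive.hs`.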
    218 === Idiom: macros ===
    219 The build system makes extensive use of GNU '''make''' '''macros'''.  A macro is defined in
    220 GNU '''make''' using `define`, e.g.
    222 {{{
    223 define build-package
    224 # args: $1 = directory, $2 = distdir
    225 ... makefile code to build a package ...
    226 endef
    227 }}}
    229 (for example, see `rules/build-package`), and is invoked like this:
    232 {{{
    233 $(eval $(call build-package,libraries/base,dist))
    234 }}}
    236 (this invocation would be in `libraries/base/`).
    238 Note that `eval` works like this: its argument is expanded as normal,
    239 and then the result is interpreted by '''make''' as makefile code.  This
    240 means the body of the `define` gets expanded ''twice''.  Typically
    241 this means we need to use `$$` instead of `$` everywhere in the body of
    242 `define`.
    244 Now, the `build-package` macro may need to define '''local variables'''.
    245 There is no support for local variables in macros, but we can define
    246 variables which are guaranteed to not clash with other variables by
    247 preceding their names with a string that is unique to this macro call.
    248 A convenient unique string to use is ''directory''_''distdir''_; this is unique as long as we only call each macro with a given directory/build pair once.  Most macros in
    249 the GHC build system take the directory and build as the first two
    250 arguments for exactly this reason.  For example, here's an excerpt
    251 from the `build-prog` macro:
    253 {{{
    254 define build-prog
    255 # $1 = dir
    256 # $2 = distdir
    257 # $3 = GHC stage to use (0 == bootstrapping compiler)
    259 $1_$2_INPLACE = $$(INPLACE_BIN)/$$($1_$2_PROG)
    260 ...
    261 }}}
    263 So if `build-prog` is called with `utils/hsc2hs` and `dist` for the
    264 first two arguments, after expansion '''make''' would see this:
    266 {{{
    267 utils/hsc2hs_dist_INPLACE = $(INPLACE_BIN)/$(utils/hsc2hs_dist_PROG)
    268 }}}
    270 The idiom of `$$($1_$2_VAR)` is very common throughout the build
    271 system - get used to reading it!  Note that the only time we use a
    272 single `$` in the body of `define` is to refer to the parameters `$1`,
    273 `$2`, and so on.
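
To see the double expansion end to end, here is a complete sketch that can be run on its own. The macro `build-prog-sketch`, the `show_` target, and the values given to `INPLACE_BIN` and `utils/hsc2hs_dist_PROG` are hypothetical. Only the `$1_$2_INPLACE` line mirrors the excerpt above.

{{{
# Assumed values, for the sketch only.
INPLACE_BIN = inplace/bin
utils/hsc2hs_dist_PROG = hsc2hs

define build-prog-sketch
# $1 = dir
# $2 = distdir
$1_$2_INPLACE = $$(INPLACE_BIN)/$$($1_$2_PROG)
.PHONY: show_$1
show_$1 : ; @echo "$1_$2_INPLACE expands to $$($1_$2_INPLACE)"
endef

# `call` substitutes $1 and $2 and turns each $$ into $;
# `eval` then parses the result as ordinary makefile code.
$(eval $(call build-prog-sketch,utils/hsc2hs,dist))
}}}

Running `make show_utils/hsc2hs` prints `utils/hsc2hs_dist_INPLACE expands to inplace/bin/hsc2hs`. Writing a single `$` before `(INPLACE_BIN)` would instead expand the variable during `call`, before the definitions the macro relies on are necessarily in place.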
    275 === Idiom: phase ordering ===
    277 NB. you need to understand this section if either (a) you are modifying parts of the build system that include automatically-generated `Makefile` code, or (b) you need to understand why we have a top-level `Makefile` that recursively invokes '''make'''.
    279 The main hitch with non-recursive '''make''' arises when parts of the build
    280 system are automatically-generated.  The automatically-generated parts
    281 of our build system fall into two main categories:
    283  * Dependencies: we use `ghc -M` to generate make-dependencies for
    284    Haskell source files, and similarly `gcc -M` to do the same for
    285    C files.  The dependencies are normally generated into a file
    286    `.depend`, which is included as normal.
    288  * Makefile bindings generated from `.cabal` package descriptions.  See
    289    "Idiom: interaction with Cabal".
    291 Now, we also want to be able to use `make` to build these files, since
    292 they have complex dependencies themselves.  For example, in order to build
    293 `` we need to first build `ghc-cabal` etc.; similarly,
    294 a `.depend` file needs to be re-generated if any of the source files have changed.
    296 GNU '''make''' has a clever strategy for handling this kind of scenario.  It
    297 first reads all the included Makefiles, and then tries to build each
    298 one if it is out-of-date, using the rules in the Makefiles themselves.
    299 When it has brought all the included Makefiles up-to-date, it restarts itself
    300 to read the newly-generated Makefiles.
    302 This works fine, unless there are dependencies ''between'' the
    303 Makefiles.  For example in the GHC build, the `.depend` file for a
    304 package cannot be generated until `` has been generated
    305 and '''make''' has been restarted to read in its contents, because it is the
    306 `` file that tells us which modules are in the package.
    307 But '''make''' always makes '''all''' the included `Makefiles` before restarting - it
    308 doesn't know how to restart itself earlier when there is a dependency
    309 between included `Makefiles`.
    311 Consider the following Makefile:
    313 {{{
    314 all :
    316 include
    318 : Makefile
    319         echo "X = C" >$@
    321 include
    323 :
    324         echo "Y = $(X)" >$@
    325 }}}
    327 Now try it:
    329 {{{
    330 $ make -f
    331 No such file or directory
    332 No such file or directory
    333 echo "X = C" >
    334 echo "Y = " >
    335 make: Nothing to be done for `all'.
    336 }}}
    338 '''make''' built both `` and `` without restarting itself
    339 between the two (even though we added a dependency on `` from
    340 ``).
    342 The solution we adopt in the GHC build system is as follows.  We have
    343 two Makefiles, the first a wrapper around the second.
    345 {{{
    346 # top-level Makefile
    347 % :
    348         $(MAKE) -f PHASE=0 just-makefiles
    349         $(MAKE) -f $@
    350 }}}
    352 {{{
    353 #
    355 include
    357 ifeq "$(PHASE)" "0"
    359 :
    360         echo "X = C" >$@
    362 else
    364 include
    366 :
    367         echo "Y = $(X)" >$@
    369 endif
    371 just-makefiles:
    372         @: # do nothing
    374 clean :
    375         rm -f
    376 }}}
    377 Each time '''make''' is invoked, we recursively invoke '''make''' in several
    378 ''phases'':
    379  * '''Phase 0''': invoke `` with `PHASE=0`.  This brings ``
    380    up-to-date (and ''only'' ``). 
    382  * '''Final phase''': invoke `` again (with `PHASE` unset).  Now we can be sure
    383    that `` is up-to-date and proceed to generate ``. 
    384    If this changes ``, then '''make''' automatically re-invokes itself,
    385    repeating the final phase.
    386 We could instead have abandoned '''make''''s automatic re-invocation mechanism altogether,
    387 and used three explicit phases (0, 1, and final), but in practice it's very convenient to use the automatic
    388 re-invocation when there are no problematic dependencies.
    390 Note that the `` rule is ''only'' enabled in phase 0, so that if we accidentally call `` without first performing phase 0, we will either get a failure (if `` doesn't exist), or otherwise '''make''' will not update `` if it is out-of-date.
    392 In the case of the GHC build system we need four such phases; see the
    393 comments in the top-level `` for details.
    395 This approach is not at all pretty, and
    396 re-invoking '''make''' every time is slow, but we don't know of a better
    397 workaround for this problem.
    402 === Idiom: no double-colon rules ===
    404 '''Make''' has a special type of rule of the form `target :: prerequisites`,
    405 with the behaviour that all double-colon rules for a given target are
    406 executed if the target needs to be rebuilt.  This style was popular
    407 for things like "all" and "clean" targets in the past, but it's not
    408 really necessary - see the "all" idiom above - and this means there's one fewer makeism you need to know about.
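
For comparison, the sketch below shows a double-colon target next to the single-colon equivalent in the style of the "all" idiom. The target names and `echo` recipes are hypothetical.

{{{
# Double-colon style (avoided): each rule carries its own recipe,
# and both recipes run for the one target.
clean_dc :: ; @echo "cleaning rts"
clean_dc :: ; @echo "cleaning compiler"

# Single-colon style: one aggregate target that depends on a
# separately named target per directory.
.PHONY: clean clean_rts clean_compiler
clean : clean_rts clean_compiler
clean_rts : ; @echo "cleaning rts"
clean_compiler : ; @echo "cleaning compiler"
}}}

Both `make clean_dc` and `make clean` run the two recipes, but only the second form also lets you clean a single directory with `make clean_rts`.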
    410 === Idiom: the vanilla way ===
    412 Libraries can be built in several different "ways", for example
    413 "profiling" and "dynamic" are two ways.  Each way has a short tag
    414 associated with it; "p" and "dyn" are the tags for profiling and
    415 dynamic respectively.  In previous GHC build systems, the "normal" way
    416 didn't have a name; it was just always built.  Now we explicitly call
    417 it the "vanilla" way and use the tag "v" to refer to it. 
    419 This means that the `GhcLibWays` variable, which lists the ways in
    420 which the libraries are built, must include "v" if you want the
    421 vanilla way to be built (this is included in the default setup, of
    422 course).
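
As a sketch, a setting that builds the vanilla, profiling, and dynamic ways could look like the fragment below. The particular value is an example rather than the documented default, and the sanity check and `show-ways` target are hypothetical additions.

{{{
# Example value only: vanilla, profiling and dynamic ways.
GhcLibWays = v p dyn

# Hypothetical check: warn if the vanilla way has been left out.
ifeq "$(filter v,$(GhcLibWays))" ""
$(warning GhcLibWays does not include v, so the vanilla way will not be built)
endif

.PHONY: show-ways
show-ways : ; @echo "library ways: $(GhcLibWays)"
}}}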
    424 === Idiom: whitespace ===
    426 '''Make''' has a rather ad-hoc approach to whitespace. Most of the time it ignores it, e.g.
    427 {{{
    428 FOO = bar
    429 }}}
    430 sets `FOO` to `"bar"`, not `" bar"`. However, sometimes whitespace is significant,
    431 and calling macros is one example. For example, we used to have a call
    432 {{{
    433 $(call all-target, $$($1_$2_INPLACE))
    434 }}}
    435 and this passed `" $$($1_$2_INPLACE)"` as the argument to `all-target`. This in turn generated
    436 {{{
    437 .PHONY: all_ inplace/bin/ghc-asm
    438 }}}
    439 which caused an infinite loop, as make continually thought that `ghc-asm` was out-of-date, rebuilt it,
    440 reinvoked make, and then thought it was out of date again.
    442 The moral of the story is: avoid whitespace unless you're sure it'll be OK!
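
The effect is easy to reproduce in isolation. In this sketch the macro `show` is hypothetical and simply wraps its argument in brackets so that any stray space becomes visible.

{{{
show = [$1]

# Prints [x]
$(info $(call show,x))

# Prints [ x]: the space after the comma is part of the argument.
$(info $(call show, x))

# Prints [x]: strip removes the leading whitespace explicitly.
$(info $(call show,$(strip  x)))

.PHONY: all
all : ;
}}}

When an argument has to be written with surrounding space for readability, wrapping it in `$(strip ...)` is one way to keep the space out of the macro body.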
    444 === Idiom: platform names ===
    446 There are three platforms of interest when building GHC:
    448  * `$(BUILDPLATFORM)`: The ''build'' platform.[[br]]
    449    The platform on which we are doing this build.
    451  * `$(HOSTPLATFORM)`: The ''host'' platform.[[br]]
    452    The platform on which these binaries will run.
    454  * `$(TARGETPLATFORM)`: The ''target'' platform.[[br]]
    455    The platform for which this compiler will generate code.
    457 These platforms are set when running the
    458 {{{configure}}} script, using the
    459 {{{--build}}}, {{{--host}}}, and
    460 {{{--target}}} options.  The {{{mk/}}}
    461 file, which is generated by `configure` from [], defines several symbols related to the platform settings.
    463 We don't currently support build and host being different, because
    464 the build process creates binaries that are both run during the build,
    465 and also installed.
    467 If host and target are different, then we are building a
    468 cross-compiler.  For GHC, this means a compiler
    469 which will generate intermediate .hc files to port to the target
    470 architecture for bootstrapping.  The libraries and stage 2 compiler
    471 will be built as HC files for the target system (see [wiki:Building/Porting Porting GHC] for details).
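
The relationships between the three platforms can be sketched as simple checks. The platform values below are examples only (in a real build they come from `configure`), and `IS_CROSS` and `show` are hypothetical names.

{{{
# Example values only.
BUILDPLATFORM = x86_64-unknown-linux
HOSTPLATFORM = x86_64-unknown-linux
TARGETPLATFORM = arm-unknown-linux

# Build and host must currently be the same platform.
ifneq "$(BUILDPLATFORM)" "$(HOSTPLATFORM)"
$(error build and host platforms differ, which is not supported)
endif

# Host and target differing means we are building a cross-compiler.
ifneq "$(HOSTPLATFORM)" "$(TARGETPLATFORM)"
IS_CROSS = YES
else
IS_CROSS = NO
endif

.PHONY: show
show : ; @echo "cross-compiler: $(IS_CROSS)"
}}}

With the example values, `make show` prints `cross-compiler: YES`.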
    473 More details on when to use BUILD, HOST or TARGET can be found in
    474 the comments in [].
     74 * [wiki:Building/Architecture/Idiom/NonRecursiveMake Non-recursive make]
     75 * [wiki:Building/Architecture/Idiom/StubMakefiles Stub makefiles]
     76 * [wiki:Building/Architecture/Idiom/StandardTargets Standard targets (all, clean etc.)]
     77 * [wiki:Building/Architecture/Idiom/Stages Stages]
     78 * [wiki:Building/Architecture/Idiom/Distdir Distdir]
     79 * [wiki:Building/Architecture/Idiom/Cabal Interaction with Cabal]
     80 * [wiki:Building/Architecture/Idiom/VariableNames Variable names]
     81 * [wiki:Building/Architecture/Idiom/Macros Macros]
     82 * [wiki:Building/Architecture/Idiom/PhaseOrdering Phase ordering]
     83 * [wiki:Building/Architecture/Idiom/DoubleColon No double-colon rules]
     84 * [wiki:Building/Architecture/Idiom/VanillaWay The vanilla way]
     85 * [wiki:Building/Architecture/Idiom/Whitespace Whitespace]
     86 * [wiki:Building/Architecture/Idiom/PlatformNames Platform names (build, host, target)]