Changes between Version 9 and Version 10 of Building/Architecture


Timestamp:
Mar 31, 2009 10:04:46 AM (6 years ago)
Author:
simonmar

  • Building/Architecture

    v9 v10  
    72 72 first, and then get on to the specifics of how we build GHC.
    73 73
    74 === Idiom: non-recursive make ===
    75 
    76 Build systems for large projects often use the technique commonly
    77 known as "recursive make", where there is a separate `Makefile` in
    78 each directory that is capable of building that part of the system.
    79 The `Makefile`s may share some common infrastructure and configuration
    80 by using GNU '''make''''s `include` directive; this is exactly what the
    81 previous GHC build system did.  However, this design has a number of
    82 flaws, as described in Peter Miller's
    83 [http://miller.emu.id.au/pmiller/books/rmch/ Recursive Make Considered Harmful]. 
    84 
    85 The GHC build system adopts the non-recursive '''make''' idiom.  That is, we
    86 never invoke '''make''' from inside a `Makefile`, and the whole build system
    87 is effectively a single giant `Makefile`.
    88 
    89 This gives us the following advantages:
    90 
    91  * Specifying dependencies between different parts of the tree is
    92    easy.  In this way, we can accurately specify many dependencies
    93    that we could not in the old recursive-make system.  This makes it much more likely that when you say "make"
    94    after modifying parts of the tree or pulling new patches,
    95    the build system will bring everything up-to-date in the correct order, and leave you with a working
    96    system.
    97 
    98  * More parallelism: dependencies are more fine-grained, and there
    99    is no need to build separate parts of the system in sequence, so
    100    the overall effect is that we have more parallelism in the build.
    101 
    102 Doesn't this sacrifice modularity?  No - we can still split the build
    103 system into separate files, using GNU '''make''''s `include`.
    104 
    105 Specific notes related to this idiom:
    106 
    107  * Individual directories usually have a `ghc.mk` file which
    108    contains the build instructions for that directory.
    109 
    110  * Other parts of the build system are in `mk/*.mk` and `rules/*.mk`.
    111 
    112  * The top-level `ghc.mk` file includes all the other `*.mk` files in
    113    the tree.  The top-level `Makefile` invokes '''make''' on `ghc.mk`
    114    (this is the only recursive invocation of '''make'''; see the "phase
    115    ordering" idiom below).
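
As a rough illustration of the layout described in these notes, a drastically reduced top-level `ghc.mk` might look like the sketch below.  The set of included files and the cross-directory dependency are assumptions made for the example; they are not copied from the real build system.

```make
# ghc.mk (top level): illustrative sketch only.
# The directories listed here are examples; the real file includes many more.
include rts/ghc.mk
include ghc/ghc.mk
include libraries/base/ghc.mk

# Because everything is read by a single make instance, a target in one
# directory can depend directly on a file built in another directory.
# (Both paths below are hypothetical.)
ghc/stage1/build/ghc : rts/dist/build/libHSrts.a
```

The point of the last line is that such a dependency is just an ordinary prerequisite; in a recursive-make design it would have to be encoded in the order in which sub-makes are run.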
    116 
    117 === Idiom: stub makefiles ===
    118 
    119 It's all very well having a single giant `Makefile` that knows how to
    120 build everything in the right order, but sometimes you want to build
    121 just part of the system.  When working on GHC itself, we might want to
    122 build just the compiler, for example.  In the recursive '''make''' system we
    123 would do `cd ghc` and then `make`.  In the non-recursive system we can
    124 still achieve this by specifying the target with something like `make
    125 ghc/stage1/build/ghc`, but that's not so convenient.
    126 
    127 Our second idiom therefore supports the `cd ghc; make` idiom, just as
    128 with recursive '''make'''.  To achieve this we put a tiny stub `Makefile` in each
    129 directory, whose job is to invoke the main `Makefile`, specifying the
    130 appropriate target(s) for that directory.  These stub `Makefile`s
    131 follow a simple pattern:
    132 
    133 {{{
    134 dir = libraries/base
    135 TOP = ../..
    136 include $(TOP)/mk/sub-makefile.mk
    137 }}}
    138 
    139 where `mk/sub-makefile.mk` knows how to recursively invoke the giant top-level '''make'''.
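
To make the mechanism concrete, here is a minimal sketch of what a file like `mk/sub-makefile.mk` could contain.  It is a guess at the shape, not the real file: the only things taken from the text are the `dir` and `TOP` variables set by the stub and the per-directory target names `all_`''dir'' and `clean_`''dir''.

```make
# mk/sub-makefile.mk: hypothetical sketch, not the contents of the real file.
# The stub Makefile that includes this must already have set:
#   dir = path of the stub's directory relative to the top of the tree
#   TOP = relative path from that directory back to the top

.PHONY: default all clean
default : all

# Re-run make at the top of the tree, asking for the per-directory target.
all   : ; $(MAKE) -C $(TOP) all_$(dir)
clean : ; $(MAKE) -C $(TOP) clean_$(dir)
```

With the stub shown above (`dir = libraries/base`, `TOP = ../..`), typing `make` in `libraries/base` would run `make -C ../.. all_libraries/base` under this sketch.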
    140 
    141 === Idiom: standard targets (all, clean, etc.) ===
    142 
    143 We want an `all` target that builds everything, but we also want a way to build individual components (say, everything in `rts/`).  This is achieved by having a separate "all" target for each directory, named `all_`''directory''.  For example in `rts/ghc.mk` we might have this:
    144 
    145 {{{
    146 all : all_rts
    147 .PHONY: all_rts
    148 all_rts : ...dependencies...
    149 }}}
    150 When the top level '''make''' includes all these `ghc.mk` files, it will see that target `all` depends on `all_rts, all_ghc, ...etc...`; so `make all` will make all of these.  But the individual targets are still available.  In particular, you can say
    151   * `make all_rts` (anywhere) to build everything in the RTS directory
    152   * `make all` (anywhere) to build everything
    153   * `make`, with no explicit target, makes the default target in the current directory's stub `Makefile`, which in turn makes the target `all_`''dir'', where ''dir'' is the current directory.
    154 
    155 Other standard targets such as `clean`, `install`, and so on use the same technique.  There are pre-canned macros to define your "all" and "clean" targets; take a look in `rules/all-target.mk` and `rules/clean-target.mk`.
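
A macro of this kind might be written roughly as follows.  This is a sketch under the assumption that the macro takes a directory and a list of dependencies; consult `rules/all-target.mk` for the real definition.  The library path in the invocation is made up.

```make
# Sketch of an "all" target macro in the style described above.
#   $1 = directory
#   $2 = files that "all" should build for that directory
define all-target
all : all_$1
.PHONY: all_$1
all_$1 : $2
endef

# Hypothetical use from rts/ghc.mk (the library path is invented):
$(eval $(call all-target,rts,rts/dist/build/libHSrts.a))
```

After expansion '''make''' sees the same three lines as in the hand-written `rts/ghc.mk` example, with `rts` substituted for `$1`.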
    156 
    157 === Idiom: stages ===
    158 
    159 What do we use to compile GHC?  GHC itself, of course.  In a complete build we actually build GHC twice: once using the GHC version that is installed, and then again using the GHC we just built.  To be clear about which GHC we are talking about, we number them:
    160 
    161  * '''Stage 0''' is the GHC you have installed.  The "GHC you have installed" is also called "the bootstrap compiler".
    162  * '''Stage 1''' is the first GHC we build, using stage 0.  Stage 1 is then used to build the packages.
    163  * '''Stage 2''' is the second GHC we build, using stage 1.  This is the one we normally install when you say `make install`.
    164  * '''Stage 3''' is optional, but is sometimes built to test stage 2.
    165 
    166 Stage 1 does not support interactive execution (GHCi) or Template Haskell.  The reason is that when running byte code we must dynamically link the packages, and only in stage 2 and later can we guarantee that the packages we dynamically link are compatible with those that GHC was built against (because they are the very same packages).
    167 
    168 
    169 === Idiom: distdir ===
    170 
    171 Often we want to build a component multiple times in different ways.  For example:
    172 
    173  * certain libraries (e.g. Cabal) are required by GHC, so we build them once with the
    174    bootstrapping compiler, and again with stage 1 once that is built.
    175 
    176  * GHC itself is built multiple times (stage 1, stage 2, maybe stage 3)
    177 
    178  * some tools (e.g. ghc-pkg) are also built once with the bootstrapping compiler,
    179    and then again using stage 1 later.
    180 
    181 In order to support multiple builds in a directory, we place all generated files in a subdirectory, called the "distdir".  The distdir can be anything at all; for example in `compiler/` we name our distdirs after the stage (`stage1`, `stage2` etc.).  When there is only a single build in a directory, by convention we usually call the distdir simply "dist".
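
For instance, a `ghc.mk` that builds the same program twice could invoke a build macro once per distdir.  The fragment below is only a sketch: it assumes the three-argument `build-prog` macro described in the macros idiom (directory, distdir, stage), and the distdir names `dist-boot` and `dist-stage1` are invented for the example.

```make
# utils/ghc-pkg/ghc.mk (sketch).  Arguments: directory, distdir, stage.
# Build once with the bootstrapping compiler (stage 0) ...
$(eval $(call build-prog,utils/ghc-pkg,dist-boot,0))
# ... and again with the stage 1 compiler, into a different distdir.
$(eval $(call build-prog,utils/ghc-pkg,dist-stage1,1))
```

Because the generated files of each build live under their own distdir, the two builds never overwrite each other's outputs.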
    182 
    183 There is a related concept called ''ways'', which includes profiling and dynamic-linking.  Multiple ways are currently part of the same "build" and use the same distdir, but in the future we might unify these concepts and give each way its own distdir.
    184 
    185 === Idiom: interaction with Cabal ===
    186 
    187 Many of the components of the GHC build system are also Cabal
    188 packages, with package metadata defined in a `foo.cabal` file. For the
    189 GHC build system we need to extract that metadata and use it to build
    190 the package. This is done by the program `ghc-cabal` (in `utils/ghc-cabal`
    191 in the GHC source tree). This program reads `foo.cabal` and produces
    192 `package-data.mk` containing the package metadata in the form of
    193 makefile bindings that we can use directly.
    194 
    195 We adhere to the following rule: '''`ghc-cabal` generates only
    196 makefile variable bindings''', such as
    197 {{{
    198   HS_SRCS = Foo.hs Bar.hs
    199 }}}
    200 `ghc-cabal` never generates makefile rules, macros, macro invocations etc.
    201 All the makefile code is therefore contained in fixed, editable
    202 `.mk` files.
    203 
    204 === Idiom: variable names ===
    205 
    206 Now that our build system is one giant `Makefile`, all our variables
    207 share the same namespace.  Where previously we might have had a
    208 variable that contained a list of the Haskell source files called
    209 `HS_SRCS`, now we have one of these for each directory (and indeed each build, or distdir) in the source tree,
    210 so we have to give them all different names.
    211 
    212 The idiom that we use for distinguishing variable names is to prepend
    213 the directory name and the distdir to the variable.  So for example the list of
    214 Haskell sources in the directory `utils/hsc2hs` would be in the
    215 variable `utils/hsc2hs_dist_HS_SRCS` ('''make''' doesn't mind slashes in variable
    216 names).  The pattern is: ''directory''_''distdir''_''variable''.
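
A few bindings following this pattern are sketched below.  The source file names are invented; only the naming scheme comes from the text.

```make
# Source lists for three builds; the file names are invented for the example.
utils/hsc2hs_dist_HS_SRCS = Main.hs
compiler_stage1_HS_SRCS   = Foo.hs Bar.hs
compiler_stage2_HS_SRCS   = $(compiler_stage1_HS_SRCS)

# Print one of them to confirm that make accepts the slash in the name.
show : ; @echo $(utils/hsc2hs_dist_HS_SRCS)
```

Running `make show` on this fragment prints `Main.hs`.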
    217 
    218 === Idiom: macros ===
    219 The build system makes extensive use of GNU '''make''' '''macros'''.  A macro is defined in
    220 GNU '''make''' using `define`, e.g.
    221 
    222 {{{
    223 define build-package
    224 # args: $1 = directory, $2 = distdir
    225 ... makefile code to build a package ...
    226 endef
    227 }}}
    228 
    229 (for example, see `rules/build-package.mk`), and is invoked like this:
    230 
    231 
    232 {{{
    233 $(eval $(call build-package,libraries/base,dist))
    234 }}}
    235 
    236 (this invocation would be in `libraries/base/ghc.mk`).
    237 
    238 Note that `eval` works like this: its argument is expanded as normal,
    239 and then the result is interpreted by '''make''' as makefile code.  This
    240 means the body of the `define` gets expanded ''twice''.  Typically
    241 this means we need to use `$$` instead of `$` everywhere in the body of
    242 `define`.
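
The difference between `$` and `$$` is easiest to see in a small experiment.  The makefile below is a self-contained sketch; the macro `show-expansion` and the variables it defines are hypothetical and exist only for this demonstration.

```make
# $1 = prefix for the generated variable names
define show-expansion
$1_NAME = demo
$1_EARLY := [$($1_NAME)]
$1_LATE  := [$$($1_NAME)]
endef

$(eval $(call show-expansion,pkg))

all : ; @echo "early=$(pkg_EARLY) late=$(pkg_LATE)"
```

Running `make` on this prints `early=[] late=[demo]`.  The single-`$` reference is expanded while the argument of `eval` is being built, before `pkg_NAME` has been defined, so it yields the empty string.  The `$$` reference survives that first expansion as `$(pkg_NAME)` and is only expanded when '''make''' interprets the generated text, by which time `pkg_NAME` is set.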
    243 
    244 Now, the `build-package` macro may need to define '''local variables'''.
    245 There is no support for local variables in macros, but we can define
    246 variables which are guaranteed to not clash with other variables by
    247 preceding their names with a string that is unique to this macro call.
    248 A convenient unique string to use is ''directory''_''distdir''_; this is unique as long as we only call each macro with a given directory/build pair once.  Most macros in
    249 the GHC build system take the directory and build as the first two
    250 arguments for exactly this reason.  For example, here's an excerpt
    251 from the `build-prog` macro:
    252 
    253 {{{
    254 define build-prog
    255 # $1 = dir
    256 # $2 = distdir
    257 # $3 = GHC stage to use (0 == bootstrapping compiler)
    258 
    259 $1_$2_INPLACE = $$(INPLACE_BIN)/$$($1_$2_PROG)
    260 ...
    261 }}}
    262 
    263 So if `build-prog` is called with `utils/hsc2hs` and `dist` for the
    264 first two arguments, after expansion '''make''' would see this:
    265 
    266 {{{
    267 utils/hsc2hs_dist_INPLACE = $(INPLACE_BIN)/$(utils/hsc2hs_dist_PROG)
    268 }}}
    269 
    270 The idiom of `$$($1_$2_VAR)` is very common throughout the build
    271 system - get used to reading it!  Note that the only time we use a
    272 single `$` in the body of `define` is to refer to the parameters `$1`,
    273 `$2`, and so on.
    274 
    275 === Idiom: phase ordering ===
    276 
    277 NB. you need to understand this section if either (a) you are modifying parts of the build system that include automatically-generated `Makefile` code, or (b) you need to understand why we have a top-level `Makefile` that recursively invokes '''make'''.
    278 
    279 The main hitch with non-recursive '''make''' arises when parts of the build
    280 system are automatically-generated.  The automatically-generated parts
    281 of our build system fall into two main categories:
    282 
    283  * Dependencies: we use `ghc -M` to generate make-dependencies for
    284    Haskell source files, and similarly `gcc -M` to do the same for
    285    C files.  The dependencies are normally generated into a file
    286    `.depend`, which is included as normal.
    287 
    288  * Makefile bindings generated from `.cabal` package descriptions.  See
    289    "Idiom: interaction with Cabal".
    290 
    291 Now, we also want to be able to use `make` to build these files, since
    292 they have complex dependencies themselves.  For example, in order to build
    293 `package-data.mk` we need to first build `ghc-cabal` etc.; similarly,
    294 a `.depend` file needs to be re-generated if any of the source files have changed.
    295 
    296 GNU '''make''' has a clever strategy for handling this kind of scenario.  It
    297 first reads all the included Makefiles, and then tries to build each
    298 one if it is out-of-date, using the rules in the Makefiles themselves.
    299 When it has brought all the included Makefiles up-to-date, it restarts itself
    300 to read the newly-generated Makefiles.
    301 
    302 This works fine, unless there are dependencies ''between'' the
    303 Makefiles.  For example in the GHC build, the `.depend` file for a
    304 package cannot be generated until `package-data.mk` has been generated
    305 and '''make''' has been restarted to read in its contents, because it is the
    306 `package-data.mk` file that tells us which modules are in the package.
    307 But '''make''' always makes '''all''' the included `Makefiles` before restarting - it
    308 doesn't know how to restart itself earlier when there is a dependency
    309 between included `Makefiles`.
    310 
    311 Consider the following Makefile (saved as `fail.mk`):
    312 
    313 {{{
    314 all :
    315 
    316 include inc1.mk
    317 
    318 inc1.mk : Makefile
    319         echo "X = C" >$@
    320 
    321 include inc2.mk
    322 
    323 inc2.mk : inc1.mk
    324         echo "Y = $(X)" >$@
    325 }}}
    326 
    327 Now try it:
    328 
    329 {{{
    330 $ make -f fail.mk
    331 fail.mk:3: inc1.mk: No such file or directory
    332 fail.mk:8: inc2.mk: No such file or directory
    333 echo "X = C" >inc1.mk
    334 echo "Y = " >inc2.mk
    335 make: Nothing to be done for `all'.
    336 }}}
    337 
    338 '''make''' built both `inc1.mk` and `inc2.mk` without restarting itself
    339 between the two (even though we added a dependency on `inc1.mk` from
    340 `inc2.mk`).
    341 
    342 The solution we adopt in the GHC build system is as follows.  We have
    343 two Makefiles, the first a wrapper around the second.
    344 
    345 {{{
    346 # top-level Makefile
    347 % :
    348         $(MAKE) -f inc.mk PHASE=0 just-makefiles
    349         $(MAKE) -f inc.mk $@
    350 }}}
    351 
    352 {{{
    353 # inc.mk
    354 
    355 include inc1.mk
    356 
    357 ifeq "$(PHASE)" "0"
    358 
    359 inc1.mk : inc.mk
    360         echo "X = C" >$@
    361 
    362 else
    363 
    364 include inc2.mk
    365 
    366 inc2.mk : inc1.mk
    367         echo "Y = $(X)" >$@
    368 
    369 endif
    370 
    371 just-makefiles:
    372         @: # do nothing
    373 
    374 clean :
    375         rm -f inc1.mk inc2.mk
    376 }}}
    377 Each time '''make''' is invoked, we recursively invoke '''make''' in several
    378 ''phases'':
    379  * '''Phase 0''': invoke `inc.mk` with `PHASE=0`.  This brings `inc1.mk`
    380    up-to-date (and ''only'' `inc1.mk`). 
    381 
    382  * '''Final phase''': invoke `inc.mk` again (with `PHASE` unset).  Now we can be sure
    383    that `inc1.mk` is up-to-date and proceed to generate `inc2.mk`. 
    384    If this changes `inc2.mk`, then '''make''' automatically re-invokes itself,
    385    repeating the final phase.
    386 We could instead have abandoned '''make''''s automatic re-invocation mechanism altogether,
    387 and used three explicit phases (0, 1, and final), but in practice it's very convenient to use the automatic
    388 re-invocation when there are no problematic dependencies.
    389 
    390 Note that the `inc1.mk` rule is ''only'' enabled in phase 0, so that if we accidentally call `inc.mk` without first performing phase 0, we will either get a failure (if `inc1.mk` doesn't exist), or otherwise '''make''' will not update `inc1.mk` if it is out-of-date.
    391 
    392 In the case of the GHC build system we need 4 such phases; see the
    393 comments in the top-level `ghc.mk` for details.
    394 
    395 This approach is not at all pretty, and
    396 re-invoking '''make''' every time is slow, but we don't know of a better
    397 workaround for this problem.
    398 
    399 
    400 
    401 
    402 === Idiom: no double-colon rules ===
    403 
    404 '''Make''' has a special type of rule of the form `target :: prerequisites`,
    405 with the behaviour that all double-colon rules for a given target are
    406 executed if the target needs to be rebuilt.  This style was popular
    407 for things like "all" and "clean" targets in the past, but it's not
    408 really necessary - see the "all" idiom above - and this means there's one fewer makeism you need to know about.
    409 
    410 === Idiom: the vanilla way ===
    411 
    412 Libraries can be built in several different "ways", for example
    413 "profiling" and "dynamic" are two ways.  Each way has a short tag
    414 associated with it; "p" and "dyn" are the tags for profiling and
    415 dynamic respectively.  In previous GHC build systems, the "normal" way
    416 didn't have a name, it was just always built.  Now we explicitly call
    417 it the "vanilla" way and use the tag "v" to refer to it. 
    418 
    419 This means that the `GhcLibWays` variable, which lists the ways in
    420 which the libraries are built, must include "v" if you want the
    421 vanilla way to be built (this is included in the default setup, of
    422 course).
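
As a small illustration, a local settings file could select the vanilla and profiling ways like this.  The sketch assumes that such overrides are placed in `mk/build.mk`; the tags `v` and `p` are the ones named above.

```make
# mk/build.mk (sketch): build the libraries the vanilla and profiling ways.
GhcLibWays = v p
```

Leaving out `v` here would mean the vanilla libraries are not built at all, which is rarely what you want.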
    423 
    424 === Idiom: whitespace ===
    425 
    426 '''make''' has a rather ad-hoc approach to whitespace.  Most of the time it ignores it, e.g.
    427 {{{
    428 FOO = bar
    429 }}}
    430 sets `FOO` to `"bar"`, not `" bar"`. However, sometimes whitespace is significant,
    431 and calling macros is one example. For example, we used to have a call
    432 {{{
    433 $(call all-target, $$($1_$2_INPLACE))
    434 }}}
    435 and this passed `" $$($1_$2_INPLACE)"` as the argument to `all-target`. This in turn generated
    436 {{{
    437 .PHONY: all_ inplace/bin/ghc-asm
    438 }}}
    439 which caused an infinite loop, as make continually thought that `ghc-asm` was out-of-date, rebuilt it,
    440 reinvoked make, and then thought it was out of date again.
    441 
    442 The moral of the story is: avoid whitespace unless you're sure it'll be OK!
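
The effect is easy to reproduce in isolation.  In the sketch below, `show-arg` is a hypothetical helper that prints its argument between brackets using GNU '''make''''s `info` function; it is not part of the GHC build system.

```make
# show-arg prints its first argument in brackets while the makefile is read.
show-arg = $(info arg=[$1])

$(call show-arg,foo)
$(call show-arg, foo)
$(call show-arg,$(strip  foo ))

all : ;
```

While reading this file '''make''' prints `arg=[foo]`, then `arg=[ foo]`, then `arg=[foo]`.  The second call shows that the space after the comma becomes part of the argument; the third shows that wrapping an argument in `strip` is one way to defend against it.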
    443 
    444 === Idiom: platform names ===
    445 
    446 There are three platforms of interest when building GHC:
    447 
    448  * `$(BUILDPLATFORM)`: The ''build'' platform.[[br]]
    449    The platform on which we are doing this build.
    450 
    451  * `$(HOSTPLATFORM)`: The ''host'' platform.[[br]]
    452    The platform on which these binaries will run.
    453 
    454  * `$(TARGETPLATFORM)`: The ''target'' platform.[[br]]
    455    The platform for which this compiler will generate code.
    456      
    457 These platforms are set when running the
    458 {{{configure}}} script, using the
    459 {{{--build}}}, {{{--host}}}, and
    460 {{{--target}}} options.  The {{{mk/project.mk}}}
    461 file, which is generated by `configure` from [http://darcs.haskell.org/mk/project.mk.in project.mk.in], defines several symbols related to the platform settings.
    462 
    463 We don't currently support build and host being different, because
    464 the build process creates binaries that are both run during the build,
    465 and also installed.
    466 
    467 If host and target are different, then we are building a
    468 cross-compiler.  For GHC, this means a compiler
    469 which will generate intermediate .hc files to port to the target
    470 architecture for bootstrapping.  The libraries and stage 2 compiler
    471 will be built as HC files for the target system (see [wiki:Building/Porting Porting GHC] for details).
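
The two rules above could be expressed in makefile code along the following lines.  This is a sketch, assuming `mk/project.mk` has already been included so that the three platform variables are set; the variable name `CrossCompiling` is hypothetical and used only for the example.

```make
# Build and host must match, since binaries built here are run during the build.
ifneq "$(BUILDPLATFORM)" "$(HOSTPLATFORM)"
$(error build and host platforms must be the same)
endif

# Host and target differing means we are building a cross-compiler.
ifneq "$(HOSTPLATFORM)" "$(TARGETPLATFORM)"
CrossCompiling = YES
else
CrossCompiling = NO
endif
```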
    472 
    473 More details on when to use BUILD, HOST or TARGET can be found in
    474 the comments in [http://darcs.haskell.org/mk/project.mk.in project.mk.in].
     74 * [wiki:Building/Architecture/Idiom/NonRecursiveMake Non-recursive make]
     75 * [wiki:Building/Architecture/Idiom/StubMakefiles Stub makefiles]
     76 * [wiki:Building/Architecture/Idiom/StandardTargets Standard targets (all, clean etc.)]
     77 * [wiki:Building/Architecture/Idiom/Stages Stages]
     78 * [wiki:Building/Architecture/Idiom/Distdir Distdir]
     79 * [wiki:Building/Architecture/Idiom/Cabal Interaction with Cabal]
     80 * [wiki:Building/Architecture/Idiom/VariableNames Variable names]
     81 * [wiki:Building/Architecture/Idiom/Macros Macros]
     82 * [wiki:Building/Architecture/Idiom/PhaseOrdering Phase ordering]
     83 * [wiki:Building/Architecture/Idiom/DoubleColon No double-colon rules]
     84 * [wiki:Building/Architecture/Idiom/VanillaWay The vanilla way]
     85 * [wiki:Building/Architecture/Idiom/Whitespace Whitespace]
     86 * [wiki:Building/Architecture/Idiom/PlatformNames Platform names (build, host, target)]