Changes between Version 9 and Version 10 of Building/Architecture


Timestamp: Mar 31, 2009 10:04:46 AM
Author: simonmar
  • Building/Architecture

    v9 v10  
    72 72 first, and then get on to the specifics of how we build GHC.
    73 73
    74 === Idiom: non-recursive make === 
    75  
    76 Build systems for large projects often use the technique commonly 
    77 known as "recursive make", where there is a separate `Makefile` in 
    78 each directory that is capable of building that part of the system. 
    79 The `Makefile`s may share some common infrastructure and configuration 
    80 by using GNU '''make''''s `include` directive; this is exactly what the 
    81 previous GHC build system did.  However, this design has a number of 
    82 flaws, as described in Peter Miller's 
    83 [http://miller.emu.id.au/pmiller/books/rmch/ Recursive Make Considered Harmful].   
    84  
    85 The GHC build system adopts the non-recursive '''make''' idiom.  That is, we 
    86 never invoke '''make''' from inside a `Makefile`, and the whole build system 
    87 is effectively a single giant `Makefile`. 
    88  
    89 This gives us the following advantages: 
    90  
    91  * Specifying dependencies between different parts of the tree is 
    92    easy.  In this way, we can accurately specify many dependencies 
    93    that we could not in the old recursive-make system.  This makes it much more likely that when you say "make" 
    94    after modifying parts of the tree or pulling new patches, 
    95    the build system will bring everything up-to-date in the correct order, and leave you with a working 
    96    system. 
    97  
    98  * More parallelism: dependencies are more fine-grained, and there 
    99    is no need to build separate parts of the system in sequence, so 
    100    the overall effect is that we have more parallelism in the build. 
    101  
    102 Doesn't this sacrifice modularity?  No - we can still split the build 
    103 system into separate files, using GNU '''make''''s `include`. 
    104  
    105 Specific notes related to this idiom: 
    106  
    107  * Individual directories usually have a `ghc.mk` file which 
    108    contains the build instructions for that directory. 
    109  
    110  * Other parts of the build system are in `mk/*.mk` and `rules/*.mk`. 
    111  
    112  * The top-level `ghc.mk` file includes all the other `*.mk` files in 
    113    the tree.  The top-level `Makefile` invokes '''make''' on `ghc.mk` 
    114    (this is the only recursive invocation of '''make'''; see the "phase 
    115    ordering" idiom below). 
    116  
    117 === Idiom: stub makefiles === 
    118  
    119 It's all very well having a single giant `Makefile` that knows how to 
    120 build everything in the right order, but sometimes you want to build 
    121 just part of the system.  When working on GHC itself, we might want to 
    122 build just the compiler, for example.  In the recursive '''make''' system we 
    123 would do `cd ghc` and then `make`.  In the non-recursive system we can 
    124 still achieve this by specifying the target with something like `make 
    125 ghc/stage1/build/ghc`, but that's not so convenient. 
    126  
    127 Our second idiom therefore supports the `cd ghc; make` idiom, just as
    128 with recursive '''make'''. To achieve this we put a tiny stub `Makefile` in each
    129 directory whose job it is to invoke the main `Makefile` specifying the
    130 appropriate target(s) for that directory.  These stub `Makefile`s
    131 follow a simple pattern:
    132  
    133 {{{ 
    134 dir = libraries/base 
    135 TOP = ../.. 
    136 include $(TOP)/mk/sub-makefile.mk 
    137 }}} 
    138  
    139 where `mk/sub-makefile.mk` knows how to recursively invoke the giant top-level '''make'''. 
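
As a rough illustration of what such an included helper might contain, here is a minimal sketch. It is not the contents of the real `mk/sub-makefile.mk`; it only assumes that the stub has set `dir` and `TOP` as in the pattern above, and that the top-level build understands targets named `all_`''directory''.

```make
# Hypothetical helper, included by a stub Makefile that has already set
#   dir = path of this directory relative to the top of the tree
#   TOP = relative path from this directory back to the top
# The one-line "target : ; command" form is used so no tab characters are needed.

# With no explicit goal, build just this directory's "all" target
# by handing control to the top-level build.
.PHONY : default
default : ; $(MAKE) -C $(TOP) all_$(dir)

# "make all" from a subdirectory still means "build everything".
.PHONY : all
all : ; $(MAKE) -C $(TOP) all
```

With a stub that sets `dir = libraries/base`, a plain `make` in that directory would ask the top-level build for `all_libraries/base`.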
    140  
    141 === Idiom: standard targets (all, clean, etc.) === 
    142  
    143 We want an `all` target that builds everything, but we also want a way to build individual components (say, everything in `rts/`).  This is achieved by having a separate "all" target for each directory, named `all_`''directory''.  For example in `rts/ghc.mk` we might have this: 
    144  
    145 {{{ 
    146 all : all_rts 
    147 .PHONY : all_rts
    148 all_rts : ...dependencies... 
    149 }}} 
    150 When the top level '''make''' includes all these `ghc.mk` files, it will see that target `all` depends on `all_rts, all_ghc, ...etc...`; so `make all` will make all of these.  But the individual targets are still available.  In particular, you can say 
    151   * `make all_rts` (anywhere) to build everything in the RTS directory 
    152   * `make all` (anywhere) to build everything 
    153   * `make`, with no explicit target, makes the default target in the current directory's stub `Makefile`, which in turn makes the target `all_`''dir'', where ''dir'' is the current directory. 
    154  
    155 Other standard targets such as `clean`, `install`, and so on use the same technique.  There are pre-canned macros to define your "all" and "clean" targets; take a look in `rules/all-target.mk` and `rules/clean-target.mk`.
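
To make the per-directory "all" pattern concrete, the following self-contained sketch defines a small macro that wires a directory into `all`. The macro name `demo-all-target` and the stand-in targets `rts-lib` and `ghc-bin` are invented for illustration; the real macro in `rules/all-target.mk` may look different.

```make
# demo.mk: try `make -f demo.mk all` or `make -f demo.mk all_rts`.
# The macro takes: first argument = directory name,
#                  second argument = prerequisites of that directory's "all".
define demo-all-target
all : all_$1
.PHONY : all_$1
all_$1 : $2
endef

# Declare "all" first so that it is the default goal.
.PHONY : all
all :

# Note: no spaces after the commas in the calls.
$(eval $(call demo-all-target,rts,rts-lib))
$(eval $(call demo-all-target,ghc,ghc-bin))

# Stand-in build steps so the sketch runs without any source files.
.PHONY : rts-lib ghc-bin
rts-lib : ; @echo "building the RTS"
ghc-bin : ; @echo "building the compiler"
```

Asking for `all_rts` runs only the RTS step, while `all` runs both, because each macro call adds one more prerequisite to `all`.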
    156  
    157 === Idiom: stages === 
    158  
    159 What do we use to compile GHC?  GHC itself, of course.  In a complete build we actually build GHC twice: once using the GHC version that is installed, and then again using the GHC we just built.  To be clear about which GHC we are talking about, we number them: 
    160  
    161  * '''Stage 0''' is the GHC you have installed.  The "GHC you have installed" is also called "the bootstrap compiler". 
    162  * '''Stage 1''' is the first GHC we build, using stage 0.  Stage 1 is then used to build the packages. 
    163  * '''Stage 2''' is the second GHC we build, using stage 1.  This is the one we normally install when you say `make install`. 
    164  * '''Stage 3''' is optional, but is sometimes built to test stage 2. 
    165  
    166 Stage 1 does not support interactive execution (GHCi) or Template Haskell.  This is because when running byte code we must dynamically link the packages, and only in stage 2 and later can we guarantee that the packages we dynamically link are compatible with those that GHC was built against (because they are the very same packages).
    167  
    168  
    169 === Idiom: distdir === 
    170  
    171 Often we want to build a component multiple times in different ways.  For example: 
    172  
    173  * certain libraries (e.g. Cabal) are required by GHC, so we build them once with the 
    174    bootstrapping compiler, and again with stage 1 once that is built. 
    175  
    176  * GHC itself is built multiple times (stage 1, stage 2, maybe stage 3) 
    177  
    178  * some tools (e.g. ghc-pkg) are also built once with the bootstrapping compiler, 
    179    and then again using stage 1 later. 
    180  
    181 In order to support multiple builds in a directory, we place all generated files in a subdirectory, called the "distdir".  The distdir can be anything at all; for example in `compiler/` we name our distdirs after the stage (`stage1`, `stage2` etc.).  When there is only a single build in a directory, by convention we usually call the distdir simply "dist". 
    182  
    183 There is a related concept called ''ways'', which includes profiling and dynamic-linking.  Multiple ways are currently part of the same "build" and use the same distdir, but in the future we might unify these concepts and give each way its own distdir. 
    184  
    185 === Idiom: interaction with Cabal === 
    186  
    187 Many of the components of the GHC build system are also Cabal 
    188 packages, with package metadata defined in a `foo.cabal` file. For the 
    189 GHC build system we need to extract that metadata and use it to build 
    190 the package. This is done by the program `ghc-cabal` (in `utils/ghc-cabal` 
    191 in the GHC source tree). This program reads `foo.cabal` and produces 
    192 `package-data.mk` containing the package metadata in the form of 
    193 makefile bindings that we can use directly. 
    194  
    195 We adhere to the following rule: '''`ghc-cabal` generates only 
    196 makefile variable bindings''', such as 
    197 {{{ 
    198   HS_SRCS = Foo.hs Bar.hs 
    199 }}} 
    200 `ghc-cabal` never generates makefile rules, macros, macro invocations etc.
    201 All the makefile code is therefore contained in fixed, editable  
    202 `.mk` files. 
    203  
    204 === Idiom: variable names === 
    205  
    206 Now that our build system is one giant `Makefile`, all our variables 
    207 share the same namespace.  Where previously we might have had a 
    208 variable that contained a list of the Haskell source files called 
    209 `HS_SRCS`, now we have one of these for each directory (and indeed each build, or distdir) in the source tree, 
    210 so we have to give them all different names. 
    211  
    212 The idiom that we use for distinguishing variable names is to prepend 
    213 the directory name and the distdir to the variable.  So for example the list of 
    214 Haskell sources in the directory `utils/hsc2hs` would be in the 
    215 variable `utils/hsc2hs_dist_HS_SRCS` ('''make''' doesn't mind slashes in variable 
    216 names).  The pattern is: ''directory''_''distdir''_''variable''. 
    217  
    218 === Idiom: macros === 
    219 The build system makes extensive use of GNU '''make''' '''macros'''.  A macro is defined in
    220 GNU '''make''' using `define`, e.g.
    221  
    222 {{{ 
    223 define build-package 
    224 # args: $1 = directory, $2 = distdir 
    225 ... makefile code to build a package ... 
    226 endef 
    227 }}} 
    228  
    229 (for example, see `rules/build-package`), and is invoked like this: 
    230  
    231  
    232 {{{ 
    233 $(eval $(call build-package,libraries/base,dist)) 
    234 }}} 
    235  
    236 (this invocation would be in `libraries/base/ghc.mk`). 
    237  
    238 Note that `eval` works like this: its argument is expanded as normal,
    239 and then the result is interpreted by '''make''' as makefile code.  This 
    240 means the body of the `define` gets expanded ''twice''.  Typically 
    241 this means we need to use `$$` instead of `$` everywhere in the body of 
    242 `define`. 
    243  
    244 Now, the `build-package` macro may need to define '''local variables'''. 
    245 There is no support for local variables in macros, but we can define 
    246 variables which are guaranteed to not clash with other variables by 
    247 preceding their names with a string that is unique to this macro call. 
    248 A convenient unique string to use is ''directory''_''distdir''_; this is unique as long as we only call each macro with a given directory/build pair once.  Most macros in 
    249 the GHC build system take the directory and build as the first two 
    250 arguments for exactly this reason.  For example, here's an excerpt 
    251 from the `build-prog` macro: 
    252  
    253 {{{ 
    254 define build-prog 
    255 # $1 = dir 
    256 # $2 = distdir 
    257 # $3 = GHC stage to use (0 == bootstrapping compiler) 
    258  
    259 $1_$2_INPLACE = $$(INPLACE_BIN)/$$($1_$2_PROG) 
    260 ... 
    261 }}} 
    262  
    263 So if `build-prog` is called with `utils/hsc2hs` and `dist` for the 
    264 first two arguments, after expansion '''make''' would see this: 
    265  
    266 {{{ 
    267 utils/hsc2hs_dist_INPLACE = $(INPLACE_BIN)/$(utils/hsc2hs_dist_PROG) 
    268 }}} 
    269  
    270 The idiom of `$$($1_$2_VAR)` is very common throughout the build 
    271 system - get used to reading it!  Note that the only time we use a 
    272 single `$` in the body of `define` is to refer to the parameters `$1`, 
    273 `$2`, and so on. 
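
The double expansion can be checked with a small stand-alone file. This is only a sketch: the macro name `build-demo-prog`, the value given to `INPLACE_BIN`, and the value of the `_PROG` variable are made up for the demonstration.

```make
# macro-demo.mk: run with `make -f macro-demo.mk`.
INPLACE_BIN = inplace/bin

# The macro takes: first argument = directory, second argument = distdir.
# Single $ is used only for the parameters; everything else is $$ so that
# it survives the first expansion (by call) and is expanded on use.
define build-demo-prog
$1_$2_INPLACE = $$(INPLACE_BIN)/$$($1_$2_PROG)
endef

utils/hsc2hs_dist_PROG = hsc2hs
$(eval $(call build-demo-prog,utils/hsc2hs,dist))

# Prints: inplace/bin/hsc2hs
$(info $(utils/hsc2hs_dist_INPLACE))

# An empty default target so make has a goal.
all : ;
```

Writing `$(INPLACE_BIN)` with a single `$` inside the `define` would also work in this tiny case, but it would freeze the value at the time of the `call`, which is usually not what the build system wants.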
    274  
    275 === Idiom: phase ordering === 
    276  
    277 NB. you need to understand this section if either (a) you are modifying parts of the build system that include automatically-generated `Makefile` code, or (b) you need to understand why we have a top-level `Makefile` that recursively invokes '''make'''. 
    278  
    279 The main hitch with non-recursive '''make''' arises when parts of the build 
    280 system are automatically-generated.  The automatically-generated parts 
    281 of our build system fall into two main categories: 
    282  
    283  * Dependencies: we use `ghc -M` to generate make-dependencies for  
    284    Haskell source files, and similarly `gcc -M` to do the same for 
    285    C files.  The dependencies are normally generated into a file 
    286    `.depend`, which is included as normal. 
    287  
    288  * Makefile bindings generated from `.cabal` package descriptions.  See
    289    "Idiom: interaction with Cabal". 
    290  
    291 Now, we also want to be able to use `make` to build these files, since 
    292 they have complex dependencies themselves.  For example, in order to build 
    293 `package-data.mk` we need to first build `ghc-cabal` etc.; similarly, 
    294 a `.depend` file needs to be re-generated if any of the source files have changed. 
    295  
    296 GNU '''make''' has a clever strategy for handling this kind of scenario.  It 
    297 first reads all the included Makefiles, and then tries to build each 
    298 one if it is out-of-date, using the rules in the Makefiles themselves. 
    299 When it has brought all the included Makefiles up-to-date, it restarts itself 
    300 to read the newly-generated Makefiles. 
    301  
    302 This works fine, unless there are dependencies ''between'' the 
    303 Makefiles.  For example in the GHC build, the `.depend` file for a 
    304 package cannot be generated until `package-data.mk` has been generated 
    305 and '''make''' has been restarted to read in its contents, because it is the 
    306 `package-data.mk` file that tells us which modules are in the package. 
    307 But '''make''' always makes '''all''' the included `Makefiles` before restarting - it 
    308 doesn't know how to restart itself earlier when there is a dependency 
    309 between included `Makefiles`. 
    310  
    311 Consider the following Makefile: 
    312  
    313 {{{ 
    314 all : 
    315  
    316 include inc1.mk 
    317  
    318 inc1.mk : Makefile 
    319         echo "X = C" >$@ 
    320  
    321 include inc2.mk 
    322  
    323 inc2.mk : inc1.mk 
    324         echo "Y = $(X)" >$@ 
    325 }}} 
    326  
    327 Now try it: 
    328  
    329 {{{ 
    330 $ make -f fail.mk 
    331 fail.mk:3: inc1.mk: No such file or directory 
    332 fail.mk:8: inc2.mk: No such file or directory 
    333 echo "X = C" >inc1.mk 
    334 echo "Y = " >inc2.mk 
    335 make: Nothing to be done for `all'. 
    336 }}} 
    337  
    338 '''make''' built both `inc1.mk` and `inc2.mk` without restarting itself 
    339 between the two (even though we added a dependency on `inc1.mk` from 
    340 `inc2.mk`). 
    341  
    342 The solution we adopt in the GHC build system is as follows.  We have 
    343 two Makefiles, the first a wrapper around the second. 
    344  
    345 {{{ 
    346 # top-level Makefile 
    347 % : 
    348         $(MAKE) -f inc.mk PHASE=0 just-makefiles 
    349         $(MAKE) -f inc.mk $@
    350 }}} 
    351  
    352 {{{ 
    353 # inc.mk 
    354  
    355 include inc1.mk 
    356  
    357 ifeq "$(PHASE)" "0" 
    358  
    359 inc1.mk : inc.mk 
    360         echo "X = C" >$@ 
    361  
    362 else 
    363  
    364 include inc2.mk 
    365  
    366 inc2.mk : inc1.mk 
    367         echo "Y = $(X)" >$@ 
    368  
    369 endif 
    370  
    371 just-makefiles: 
    372         @: # do nothing 
    373  
    374 clean : 
    375         rm -f inc1.mk inc2.mk 
    376 }}} 
    377 Each time '''make''' is invoked, we recursively invoke '''make''' in several 
    378 ''phases'': 
    379  * '''Phase 0''': invoke `inc.mk` with `PHASE=0`.  This brings `inc1.mk`  
    380    up-to-date (and ''only'' `inc1.mk`).   
    381  
    382  * '''Final phase''': invoke `inc.mk` again (with `PHASE` unset).  Now we can be sure  
    383    that `inc1.mk` is up-to-date and proceed to generate `inc2.mk`.   
    384    If this changes `inc2.mk`, then '''make''' automatically re-invokes itself, 
    385    repeating the final phase. 
    386 We could instead have abandoned '''make''''s automatic re-invocation mechanism altogether, 
    387 and used three explicit phases (0, 1, and final), but in practice it's very convenient to use the automatic 
    388 re-invocation when there are no problematic dependencies. 
    389  
    390 Note that the `inc1.mk` rule is ''only'' enabled in phase 0, so that if we accidentally call `inc.mk` without first performing phase 0, we will either get a failure (if `inc1.mk` doesn't exist), or otherwise '''make''' will not update `inc1.mk` if it is out-of-date. 
    391  
    392 In the case of the GHC build system we need 4 such phases; see the
    393 comments in the top-level `ghc.mk` for details. 
    394  
    395 This approach is not at all pretty, and 
    396 re-invoking '''make''' every time is slow, but we don't know of a better 
    397 workaround for this problem. 
    398  
    399  
    400  
    401  
    402 === Idiom: no double-colon rules === 
    403  
    404 '''Make''' has a special type of rule of the form `target :: prerequisites`, 
    405 with the behaviour that all double-colon rules for a given target are 
    406 executed if the target needs to be rebuilt.  This style was popular 
    407 for things like "all" and "clean" targets in the past, but it's not 
    408 really necessary - see the "all" idiom above - and this means there's one fewer makeism you need to know about. 
    409  
    410 === Idiom: the vanilla way === 
    411  
    412 Libraries can be built in several different "ways", for example 
    413 "profiling" and "dynamic" are two ways.  Each way has a short tag 
    414 associated with it; "p" and "dyn" are the tags for profiling and 
    415 dynamic respectively.  In previous GHC build systems, the "normal" way 
    416 didn't have a name, it was just always built.  Now we explicitly call 
    417 it the "vanilla" way and use the tag "v" to refer to it.   
    418  
    419 This means that the `GhcLibWays` variable, which lists the ways in 
    420 which the libraries are built, must include "v" if you want the 
    421 vanilla way to be built (this is included in the default setup, of 
    422 course). 
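
For instance, a build that wants both the vanilla and the profiling ways would list both tags. Where this assignment lives (for example a local build settings file) is an assumption here; only the variable name and the tags come from the text above.

```make
# Build the libraries the vanilla way ("v") and the profiling way ("p").
# Leaving out "v" would skip the vanilla build.
GhcLibWays = v p
```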
    423  
    424 === Idiom: whitespace === 
    425  
    426 '''make''' has a rather ad-hoc approach to whitespace. Most of the time it ignores it, e.g.
    427 {{{ 
    428 FOO = bar 
    429 }}} 
    430 sets `FOO` to `"bar"`, not `" bar"`. However, sometimes whitespace is significant, 
    431 and calling macros is one example. For example, we used to have a call 
    432 {{{ 
    433 $(call all-target, $$($1_$2_INPLACE)) 
    434 }}} 
    435 and this passed `" $$($1_$2_INPLACE)"` as the argument to `all-target`. This in turn generated 
    436 {{{ 
    437 .PHONY: all_ inplace/bin/ghc-asm 
    438 }}} 
    439 which caused an infinite loop, as '''make''' continually thought that `ghc-asm` was out-of-date, rebuilt it,
    440 reinvoked '''make''', and then thought it was out of date again.
    441  
    442 The moral of the story is: avoid whitespace unless you're sure it'll be OK!
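
The effect is easy to reproduce in isolation. In this sketch `show` is a made-up macro that brackets its argument so that any leading space becomes visible.

```make
# whitespace-demo.mk: run with `make -f whitespace-demo.mk`.
show = [$1]

# No space after the comma: the argument is "x".
$(info $(call show,x))

# Space after the comma: the argument is " x", leading space included.
$(info $(call show, x))

# An empty default target so make has a goal.
all : ;
```

The first line printed is `[x]` and the second is `[ x]`, which is exactly how a stray space ended up inside the generated `.PHONY` line above.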
    443  
    444 === Idiom: platform names === 
    445  
    446 There are three platforms of interest when building GHC: 
    447  
    448  * `$(BUILDPLATFORM)`: The ''build'' platform.[[br]] 
    449    The platform on which we are doing this build. 
    450  
    451  * `$(HOSTPLATFORM)`: The ''host'' platform.[[br]] 
    452    The platform on which these binaries will run. 
    453  
    454  * `$(TARGETPLATFORM)`: The ''target'' platform.[[br]] 
    455    The platform for which this compiler will generate code. 
    456        
    457 These platforms are set when running the 
    458 {{{configure}}} script, using the 
    459 {{{--build}}}, {{{--host}}}, and 
    460 {{{--target}}} options.  The {{{mk/project.mk}}} 
    461 file, which is generated by `configure` from [http://darcs.haskell.org/mk/project.mk.in project.mk.in], defines several symbols related to the platform settings. 
    462  
    463 We don't currently support build and host being different, because 
    464 the build process creates binaries that are both run during the build, 
    465 and also installed. 
    466  
    467 If host and target are different, then we are building a 
    468 cross-compiler.  For GHC, this means a compiler 
    469 which will generate intermediate .hc files to port to the target 
    470 architecture for bootstrapping.  The libraries and stage 2 compiler 
    471 will be built as HC files for the target system (see [wiki:Building/Porting Porting GHC] for details). 
    472  
    473 More details on when to use BUILD, HOST or TARGET can be found in 
    474 the comments in [http://darcs.haskell.org/mk/project.mk.in project.mk.in]. 
     74 * [wiki:Building/Architecture/Idiom/NonRecursiveMake Non-recursive make] 
     75 * [wiki:Building/Architecture/Idiom/StubMakefiles Stub makefiles] 
     76 * [wiki:Building/Architecture/Idiom/StandardTargets Standard targets (all, clean etc.)] 
     77 * [wiki:Building/Architecture/Idiom/Stages Stages] 
     78 * [wiki:Building/Architecture/Idiom/Distdir Distdir] 
     79 * [wiki:Building/Architecture/Idiom/Cabal Interaction with Cabal] 
     80 * [wiki:Building/Architecture/Idiom/VariableNames Variable names] 
     81 * [wiki:Building/Architecture/Idiom/Macros Macros] 
     82 * [wiki:Building/Architecture/Idiom/PhaseOrdering Phase ordering] 
     83 * [wiki:Building/Architecture/Idiom/DoubleColon No double-colon rules] 
     84 * [wiki:Building/Architecture/Idiom/VanillaWay The vanilla way] 
     85 * [wiki:Building/Architecture/Idiom/Whitespace Whitespace] 
     86 * [wiki:Building/Architecture/Idiom/PlatformNames Platform names (build, host, target)]