Changes between Version 9 and Version 10 of Building/Architecture

Mar 31, 2009 10:04:46 AM (6 years ago)



  • Building/Architecture

    v9 v10  
    72 72 first, and then get on to the specifics of how we build GHC. 
    74 === Idiom: non-recursive make === 
    76 Build systems for large projects often use the technique commonly 
    77 known as "recursive make", where there is a separate `Makefile` in 
    78 each directory that is capable of building that part of the system. 
    79 The `Makefile`s may share some common infrastructure and configuration 
    80 by using GNU '''make''''s `include` directive; this is exactly what the 
    81 previous GHC build system did.  However, this design has a number of 
    82 flaws, as described in Peter Miller's 
    83 [ Recursive Make Considered Harmful].   
    85 The GHC build system adopts the non-recursive '''make''' idiom.  That is, we 
    86 never invoke '''make''' from inside a `Makefile`, and the whole build system 
    87 is effectively a single giant `Makefile`. 
    89 This gives us the following advantages: 
    91  * Specifying dependencies between different parts of the tree is 
    92    easy.  In this way, we can accurately specify many dependencies 
    93    that we could not in the old recursive-make system.  This makes it much more likely that when you say "make" 
    94    after modifying parts of the tree or pulling new patches, 
    95    the build system will bring everything up-to-date in the correct order, and leave you with a working 
    96    system. 
    98  * More parallelism: dependencies are more fine-grained, and there 
    99    is no need to build separate parts of the system in sequence, so 
    100    the overall effect is that we have more parallelism in the build. 
    102 Doesn't this sacrifice modularity?  No - we can still split the build 
    103 system into separate files, using GNU '''make''''s `include`. 
    105 Specific notes related to this idiom: 
    107  * Individual directories usually have a `` file which 
    108    contains the build instructions for that directory. 
    110  * Other parts of the build system are in `mk/*.mk` and `rules/*.mk`. 
    112  * The top-level `` file includes all the other `*.mk` files in 
    113    the tree.  The top-level `Makefile` invokes '''make''' on `` 
    114    (this is the only recursive invocation of '''make'''; see the "phase 
    115    ordering" idiom below). 
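
As a minimal sketch of this layout, a single top-level makefile might pull in one fragment per directory. The directory names `liba` and `libb` and the fragment name `build.mk` are hypothetical, chosen only for illustration; they are not taken from the GHC tree.

{{{
# --- top-level makefile (sketch; all names are hypothetical) ---
.PHONY: all
all:

# Each directory contributes one fragment.  After inclusion, make sees
# a single dependency graph covering the whole tree.
include liba/build.mk
include libb/build.mk

# --- liba/build.mk would contain, for example ---
# all: liba/out.txt
# liba/out.txt: ; mkdir -p liba && echo built > $@

# --- libb/build.mk would contain, for example ---
# all: libb/out.txt
# libb/out.txt: liba/out.txt ; mkdir -p libb && cp $< $@
}}}

Because both fragments are read by the same '''make''' process, the rule in `libb/build.mk` can name a file from `liba` as an ordinary prerequisite, which is the cross-directory dependency that recursive '''make''' makes hard to express.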
    117 === Idiom: stub makefiles === 
    119 It's all very well having a single giant `Makefile` that knows how to 
    120 build everything in the right order, but sometimes you want to build 
    121 just part of the system.  When working on GHC itself, we might want to 
    122 build just the compiler, for example.  In the recursive '''make''' system we 
    123 would do `cd ghc` and then `make`.  In the non-recursive system we can 
    124 still achieve this by specifying the target with something like `make 
    125 ghc/stage1/build/ghc`, but that's not so convenient. 
    127 Our second idiom therefore supports the `cd ghc; make` idiom, just as 
    128 with recursive '''make'''.  To achieve this we put a tiny stub `Makefile` in each 
    129 directory whose job it is to invoke the main `Makefile` specifying the 
    130 appropriate target(s) for that directory.  These stub `Makefile`s 
    131 follow a simple pattern: 
    133 {{{ 
    134 dir = libraries/base 
    135 TOP = ../.. 
    136 include $(TOP)/mk/ 
    137 }}} 
    139 where `mk/` knows how to recursively invoke the giant top-level '''make'''. 
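
The contents of the included file are not shown here. As a rough sketch only, a helper of this kind could forward the default target to the top-level '''make''' along the following lines. The target name `default` and the use of `-C` are assumptions made for illustration, not a description of the real file; the per-directory target `all_`''dir'' is the one described under "standard targets" below.

{{{
# Hypothetical stub helper.  Expects the stub Makefile to have set
# $(dir) and $(TOP) before including this file.
.PHONY: default
default: ; $(MAKE) -C $(TOP) all_$(dir)
}}}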
    141 === Idiom: standard targets (all, clean, etc.) === 
    143 We want an `all` target that builds everything, but we also want a way to build individual components (say, everything in `rts/`).  This is achieved by having a separate "all" target for each directory, named `all_`''directory''.  For example in `rts/` we might have this: 
    145 {{{ 
    146 all : all_rts 
    147 .PHONY: all_rts 
    148 all_rts : ...dependencies... 
    149 }}} 
    150 When the top level '''make''' includes all these `` files, it will see that target `all` depends on `all_rts, all_ghc, ...etc...`; so `make all` will make all of these.  But the individual targets are still available.  In particular, you can say 
    151   * `make all_rts` (anywhere) to build everything in the RTS directory 
    152   * `make all` (anywhere) to build everything 
    153   * `make`, with no explicit target, makes the default target in the current directory's stub `Makefile`, which in turn makes the target `all_`''dir'', where ''dir'' is the current directory. 
    155 Other standard targets such as `clean`, `install`, and so on use the same technique.  There are pre-canned macros to define your "all" and "clean" targets; take a look in `rules/` and `rules/`. 
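
To illustrate the technique, here is a small self-contained sketch of a macro that defines the per-directory `all_` and `clean_` targets and hooks them into the global ones. The macro name `std-targets` and the file `rts/demo.txt` are hypothetical; the real pre-canned macros may well look different.

{{{
# Sketch only: "std-targets" is a hypothetical macro name.
.PHONY: all clean
all:
clean:

# $1 = directory, $2 = files that "all" should build in that directory
define std-targets
.PHONY: all_$1 clean_$1
all: all_$1
all_$1: $2
clean: clean_$1
clean_$1: ; rm -f $2
endef

$(eval $(call std-targets,rts,rts/demo.txt))

# Hypothetical rule standing in for the real build steps.
rts/demo.txt: ; mkdir -p rts && touch $@
}}}

With this in place, `make all_rts` builds only `rts/demo.txt`, while `make all` and `make clean` reach it through the global targets.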
    157 === Idiom: stages === 
    159 What do we use to compile GHC?  GHC itself, of course.  In a complete build we actually build GHC twice: once using the GHC version that is installed, and then again using the GHC we just built.  To be clear about which GHC we are talking about, we number them: 
    161  * '''Stage 0''' is the GHC you have installed.  The "GHC you have installed" is also called "the bootstrap compiler". 
    162  * '''Stage 1''' is the first GHC we build, using stage 0.  Stage 1 is then used to build the packages. 
    163  * '''Stage 2''' is the second GHC we build, using stage 1.  This is the one we normally install when you say `make install`. 
    164  * '''Stage 3''' is optional, but is sometimes built to test stage 2. 
    166 Stage 1 does not support interactive execution (GHCi) or Template Haskell.  The reason is that when running byte code we must dynamically link the packages, and only in stage 2 and later can we guarantee that the packages we dynamically link are compatible with those that GHC was built against (because they are the very same packages). 
    169 === Idiom: distdir === 
    171 Often we want to build a component multiple times in different ways.  For example: 
    173  * certain libraries (e.g. Cabal) are required by GHC, so we build them once with the 
    174    bootstrapping compiler, and again with stage 1 once that is built. 
    176  * GHC itself is built multiple times (stage 1, stage 2, maybe stage 3) 
    178  * some tools (e.g. ghc-pkg) are also built once with the bootstrapping compiler, 
    179    and then again using stage 1 later. 
    181 In order to support multiple builds in a directory, we place all generated files in a subdirectory, called the "distdir".  The distdir can be anything at all; for example in `compiler/` we name our distdirs after the stage (`stage1`, `stage2` etc.).  When there is only a single build in a directory, by convention we usually call the distdir simply "dist". 
    183 There is a related concept called ''ways'', which includes profiling and dynamic-linking.  Multiple ways are currently part of the same "build" and use the same distdir, but in the future we might unify these concepts and give each way its own distdir. 
    185 === Idiom: interaction with Cabal === 
    187 Many of the components of the GHC build system are also Cabal 
    188 packages, with package metadata defined in a `foo.cabal` file. For the 
    189 GHC build system we need to extract that metadata and use it to build 
    190 the package. This is done by the program `ghc-cabal` (in `utils/ghc-cabal` 
    191 in the GHC source tree). This program reads `foo.cabal` and produces 
    192 `` containing the package metadata in the form of 
    193 makefile bindings that we can use directly. 
    195 We adhere to the following rule: '''`ghc-cabal` generates only 
    196 makefile variable bindings''', such as 
    197 {{{ 
    198   HS_SRCS = Foo.hs Bar.hs 
    199 }}} 
    200 `ghc-cabal` never generates makefile rules, macros, macro invocations etc.  
    201 All the makefile code is therefore contained in fixed, editable  
    202 `.mk` files. 
    204 === Idiom: variable names === 
    206 Now that our build system is one giant `Makefile`, all our variables 
    207 share the same namespace.  Where previously we might have had a 
    208 variable that contained a list of the Haskell source files called 
    209 `HS_SRCS`, now we have one of these for each directory (and indeed each build, or distdir) in the source tree, 
    210 so we have to give them all different names. 
    212 The idiom that we use for distinguishing variable names is to prepend 
    213 the directory name and the distdir to the variable.  So for example the list of 
    214 Haskell sources in the directory `utils/hsc2hs` would be in the 
    215 variable `utils/hsc2hs_dist_HS_SRCS` ('''make''' doesn't mind slashes in variable 
    216 names).  The pattern is: ''directory''_''distdir''_''variable''. 
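
A small sketch of the naming pattern follows. The source file names and the helper variables `dir` and `distdir` are made up for illustration.

{{{
# Sketch: the source file names are hypothetical.
utils/hsc2hs_dist_HS_SRCS   = Main.hs
libraries/base_dist_HS_SRCS = Prelude.hs

# Build the variable name from its parts, then look it up.
dir     = utils/hsc2hs
distdir = dist
$(info $($(dir)_$(distdir)_HS_SRCS))

all: ;@:
}}}

Running '''make''' on this file prints `Main.hs`: the two `HS_SRCS` lists live side by side in the one global namespace without clashing.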
    218 === Idiom: macros === 
    219 The build system makes extensive use of GNU '''make''' '''macros'''.  A macro is defined in 
    220 GNU '''make''' using `define`, e.g. 
    222 {{{ 
    223 define build-package 
    224 # args: $1 = directory, $2 = distdir 
    225 ... makefile code to build a package ... 
    226 endef 
    227 }}} 
    229 (for example, see `rules/build-package`), and is invoked like this: 
    232 {{{ 
    233 $(eval $(call build-package,libraries/base,dist)) 
    234 }}} 
    236 (this invocation would be in `libraries/base/`). 
    238 Note that `eval` works like this: its argument is expanded as normal, 
    239 and then the result is interpreted by '''make''' as makefile code.  This 
    240 means the body of the `define` gets expanded ''twice''.  Typically 
    241 this means we need to use `$$` instead of `$` everywhere in the body of 
    242 `define`. 
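
The effect of the double expansion can be seen in a small self-contained sketch. The macro name `demo-srcs` and the file names are hypothetical.

{{{
# Sketch: "demo-srcs" is a hypothetical macro, not part of the GHC build.
define demo-srcs
# $1 = directory, $2 = distdir
$1_$2_SRCS = A.hs B.hs
# "$$" survives the first expansion (by call) as "$", so the reference
# is only resolved after eval has defined $1_$2_SRCS.
$1_$2_OBJS = $$(patsubst %.hs,%.o,$$($1_$2_SRCS))
endef

$(eval $(call demo-srcs,demo,dist))
$(info $(demo_dist_OBJS))

all: ;@:
}}}

This prints `A.o B.o`. Had the body used a single `$` in front of `($1_$2_SRCS)`, the reference would have been expanded by `call`, before `demo_dist_SRCS` existed, and `demo_dist_OBJS` would have come out empty.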
    244 Now, the `build-package` macro may need to define '''local variables'''. 
    245 There is no support for local variables in macros, but we can define 
    246 variables which are guaranteed to not clash with other variables by 
    247 preceding their names with a string that is unique to this macro call. 
    248 A convenient unique string to use is ''directory''_''distdir''_; this is unique as long as we only call each macro with a given directory/build pair once.  Most macros in 
    249 the GHC build system take the directory and build as the first two 
    250 arguments for exactly this reason.  For example, here's an excerpt 
    251 from the `build-prog` macro: 
    253 {{{ 
    254 define build-prog 
    255 # $1 = dir 
    256 # $2 = distdir 
    257 # $3 = GHC stage to use (0 == bootstrapping compiler) 
    259 $1_$2_INPLACE = $$(INPLACE_BIN)/$$($1_$2_PROG) 
    260 ... 
    261 }}} 
    263 So if `build-prog` is called with `utils/hsc2hs` and `dist` for the 
    264 first two arguments, after expansion '''make''' would see this: 
    266 {{{ 
    267 utils/hsc2hs_dist_INPLACE = $(INPLACE_BIN)/$(utils/hsc2hs_dist_PROG) 
    268 }}} 
    270 The idiom of `$$($1_$2_VAR)` is very common throughout the build 
    271 system - get used to reading it!  Note that the only time we use a 
    272 single `$` in the body of `define` is to refer to the parameters `$1`, 
    273 `$2`, and so on. 
    275 === Idiom: phase ordering === 
    277 NB. you need to understand this section if either (a) you are modifying parts of the build system that include automatically-generated `Makefile` code, or (b) you need to understand why we have a top-level `Makefile` that recursively invokes '''make'''. 
    279 The main hitch with non-recursive '''make''' arises when parts of the build 
    280 system are automatically-generated.  The automatically-generated parts 
    281 of our build system fall into two main categories: 
    283  * Dependencies: we use `ghc -M` to generate make-dependencies for  
    284    Haskell source files, and similarly `gcc -M` to do the same for 
    285    C files.  The dependencies are normally generated into a file 
    286    `.depend`, which is included as normal. 
    288  * Makefile bindings generated from `.cabal` package descriptions.  See 
    289    "Idiom: interaction with Cabal". 
    291 Now, we also want to be able to use `make` to build these files, since 
    292 they have complex dependencies themselves.  For example, in order to build 
    293 `` we need to first build `ghc-cabal` etc.; similarly, 
    294 a `.depend` file needs to be re-generated if any of the source files have changed. 
    296 GNU '''make''' has a clever strategy for handling this kind of scenario.  It 
    297 first reads all the included Makefiles, and then tries to build each 
    298 one if it is out-of-date, using the rules in the Makefiles themselves. 
    299 When it has brought all the included Makefiles up-to-date, it restarts itself 
    300 to read the newly-generated Makefiles. 
    302 This works fine, unless there are dependencies ''between'' the 
    303 Makefiles.  For example in the GHC build, the `.depend` file for a 
    304 package cannot be generated until `` has been generated 
    305 and '''make''' has been restarted to read in its contents, because it is the 
    306 `` file that tells us which modules are in the package. 
    307 But '''make''' always makes '''all''' the included `Makefiles` before restarting - it 
    308 doesn't know how to restart itself earlier when there is a dependency 
    309 between included `Makefiles`. 
    311 Consider the following Makefile: 
    313 {{{ 
    314 all : 
    316 include 
    318 : Makefile 
    319         echo "X = C" >$@ 
    321 include 
    323 : 
    324         echo "Y = $(X)" >$@ 
    325 }}} 
    327 Now try it: 
    329 {{{ 
    330 $ make -f 
    331 No such file or directory 
    332 No such file or directory 
    333 echo "X = C" > 
    334 echo "Y = " > 
    335 make: Nothing to be done for `all'. 
    336 }}} 
    338 '''make''' built both `` and `` without restarting itself 
    339 between the two (even though we added a dependency on `` from 
    340 ``). 
    342 The solution we adopt in the GHC build system is as follows.  We have 
    343 two Makefiles, the first a wrapper around the second. 
    345 {{{ 
    346 # top-level Makefile 
    347 % : 
    348         $(MAKE) -f PHASE=0 just-makefiles 
    349         $(MAKE) -f $< 
    350 }}} 
    352 {{{ 
    353 # 
    355 include 
    357 ifeq "$(PHASE)" "0" 
    359 : 
    360         echo "X = C" >$@ 
    362 else 
    364 include 
    366 : 
    367         echo "Y = $(X)" >$@ 
    369 endif 
    371 just-makefiles: 
    372         @: # do nothing 
    374 clean : 
    375         rm -f 
    376 }}} 
    377 Each time '''make''' is invoked, we recursively invoke '''make''' in several 
    378 ''phases'': 
    379  * '''Phase 0''': invoke `` with `PHASE=0`.  This brings ``  
    380    up-to-date (and ''only'' ``).   
    382  * '''Final phase''': invoke `` again (with `PHASE` unset).  Now we can be sure  
    383    that `` is up-to-date and proceed to generate ``.   
    384    If this changes ``, then '''make''' automatically re-invokes itself, 
    385    repeating the final phase. 
    386 We could instead have abandoned '''make''''s automatic re-invocation mechanism altogether, 
    387 and used three explicit phases (0, 1, and final), but in practice it's very convenient to use the automatic 
    388 re-invocation when there are no problematic dependencies. 
    390 Note that the `` rule is ''only'' enabled in phase 0, so that if we accidentally call `` without first performing phase 0, we will either get a failure (if `` doesn't exist), or otherwise '''make''' will not update `` if it is out-of-date. 
    392 In the case of the GHC build system we need 4 such phases; see the 
    393 comments in the top-level `` for details. 
    395 This approach is not at all pretty, and 
    396 re-invoking '''make''' every time is slow, but we don't know of a better 
    397 workaround for this problem. 
    402 === Idiom: no double-colon rules === 
    404 '''Make''' has a special type of rule of the form `target :: prerequisites`, 
    405 with the behaviour that all double-colon rules for a given target are 
    406 executed if the target needs to be rebuilt.  This style was popular 
    407 for things like "all" and "clean" targets in the past, but it's not 
    408 really necessary - see the "all" idiom above - so we avoid it, and this means there's one fewer makeism you need to know about. 
    410 === Idiom: the vanilla way === 
    412 Libraries can be built in several different "ways", for example 
    413 "profiling" and "dynamic" are two ways.  Each way has a short tag 
    414 associated with it; "p" and "dyn" are the tags for profiling and 
    415 dynamic respectively.  In previous GHC build systems, the "normal" way 
    416 didn't have a name, it was just always built.  Now we explicitly call 
    417 it the "vanilla" way and use the tag "v" to refer to it.   
    419 This means that the `GhcLibWays` variable, which lists the ways in 
    420 which the libraries are built, must include "v" if you want the 
    421 vanilla way to be built (this is included in the default setup, of 
    422 course). 
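
For instance, a build configuration might set the variable as sketched below. The particular selection of ways and the sanity check are illustrative assumptions, not part of the GHC build system; only the variable name and the tags come from the text above.

{{{
# Sketch: build the vanilla and profiling ways.
GhcLibWays = v p

# Illustrative check: warn if the vanilla way has been left out.
ifeq "$(filter v,$(GhcLibWays))" ""
$(warning GhcLibWays does not include "v"; the vanilla way will not be built)
endif

all: ;@:
}}}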
    424 === Idiom: whitespace === 
    426 '''make''' has a rather ad-hoc approach to whitespace. Most of the time it ignores it, e.g. 
    427 {{{ 
    428 FOO = bar 
    429 }}} 
    430 sets `FOO` to `"bar"`, not `" bar"`. However, sometimes whitespace is significant, 
    431 and calling macros is one example. For example, we used to have a call 
    432 {{{ 
    433 $(call all-target, $$($1_$2_INPLACE)) 
    434 }}} 
    435 and this passed `" $$($1_$2_INPLACE)"` as the argument to `all-target`. This in turn generated 
    436 {{{ 
    437 .PHONY: all_ inplace/bin/ghc-asm 
    438 }}} 
    439 which caused an infinite loop, as '''make''' continually thought that `ghc-asm` was out-of-date, rebuilt it, 
    440 reinvoked '''make''', and then thought it was out-of-date again. 
    442 The moral of the story is, avoid whitespace unless you're sure it'll be OK! 
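
The difference is easy to reproduce in isolation. In the sketch below, `show` and `show-strip` are hypothetical macros that print their argument in brackets; using `$(strip ...)` inside a macro is one possible defence, offered here as a suggestion rather than as what the GHC build system does.

{{{
# Sketch: "show" and "show-strip" are hypothetical macros.
show       = $(info [$1])
show-strip = $(info [$(strip $1)])

$(call show,x)
$(call show, x)
$(call show-strip, x)

all: ;@:
}}}

This prints `[x]`, then `[ x]`, then `[x]`: the space after the comma becomes part of the argument unless the macro strips it.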
    444 === Idiom: platform names === 
    446 There are three platforms of interest when building GHC: 
    448  * `$(BUILDPLATFORM)`: The ''build'' platform.[[br]] 
    449    The platform on which we are doing this build. 
    451  * `$(HOSTPLATFORM)`: The ''host'' platform.[[br]] 
    452    The platform on which these binaries will run. 
    454  * `$(TARGETPLATFORM)`: The ''target'' platform.[[br]] 
    455    The platform for which this compiler will generate code. 
    457 These platforms are set when running the 
    458 {{{configure}}} script, using the 
    459 {{{--build}}}, {{{--host}}}, and 
    460 {{{--target}}} options.  The {{{mk/}}} 
    461 file, which is generated by `configure` from [], defines several symbols related to the platform settings. 
    463 We don't currently support build and host being different, because 
    464 the build process creates binaries that are both run during the build, 
    465 and also installed. 
    467 If host and target are different, then we are building a 
    468 cross-compiler.  For GHC, this means a compiler 
    469 which will generate intermediate .hc files to port to the target 
    470 architecture for bootstrapping.  The libraries and stage 2 compiler 
    471 will be built as HC files for the target system (see [wiki:Building/Porting Porting GHC] for details). 
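
The relationships between the three platforms can be expressed as makefile conditionals. The following is a sketch of such checks using the variables named above; the messages and the checks themselves are illustrative and are not taken from the GHC build system.

{{{
# Sketch: assumes BUILDPLATFORM, HOSTPLATFORM and TARGETPLATFORM have
# already been set, e.g. by the configure-generated makefile.
ifneq "$(BUILDPLATFORM)" "$(HOSTPLATFORM)"
$(error build and host platforms differ, which is not currently supported)
endif

ifneq "$(HOSTPLATFORM)" "$(TARGETPLATFORM)"
$(info host and target differ: building a cross-compiler)
endif

all: ;@:
}}}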
    473 More details on when to use BUILD, HOST or TARGET can be found in 
    474 the comments in []. 
     74 * [wiki:Building/Architecture/Idiom/NonRecursiveMake Non-recursive make] 
     75 * [wiki:Building/Architecture/Idiom/StubMakefiles Stub makefiles] 
     76 * [wiki:Building/Architecture/Idiom/StandardTargets Standard targets (all, clean etc.)] 
     77 * [wiki:Building/Architecture/Idiom/Stages Stages] 
     78 * [wiki:Building/Architecture/Idiom/Distdir Distdir] 
     79 * [wiki:Building/Architecture/Idiom/Cabal Interaction with Cabal] 
     80 * [wiki:Building/Architecture/Idiom/VariableNames Variable names] 
     81 * [wiki:Building/Architecture/Idiom/Macros Macros] 
     82 * [wiki:Building/Architecture/Idiom/PhaseOrdering Phase ordering] 
     83 * [wiki:Building/Architecture/Idiom/DoubleColon No double-colon rules] 
     84 * [wiki:Building/Architecture/Idiom/VanillaWay The vanilla way] 
     85 * [wiki:Building/Architecture/Idiom/Whitespace Whitespace] 
     86 * [wiki:Building/Architecture/Idiom/PlatformNames Platform names (build, host, target)]