Changes between Version 33 and Version 34 of Building/RunningTests


Timestamp:
Jul 8, 2011 10:23:58 PM (3 years ago)
Author:
dterei
Comment:

Rewrite of testsuite page

  • Building/RunningTests

    v33 v34  
    1 [[PageOutline]] 
    21= GHC Test framework = 
    32 
    4 NOTE: you need GNU make and Python (any version >= 1.5 will probably do) in order 
    5 to use the testsuite. If you want to run the testsuite in parallel then you need Python 2.5.2 or later. 
    6 (Avoid Python 2.6.1 as the testsuite tickles a bug in one of the included libraries) 
     3GHC includes a comprehensive testsuite for catching any regressions. 
    74 
    8 If you have not checked out the test suite, first run: 
     5The testsuite relies primarily on '''GNU Make''' and '''Python'''. Any Python version >= 2.5.2 will do, but avoid Python 2.6.1, as the testsuite tickles a bug in one of its included libraries. 
     6 
     7If you have not checked out the testsuite, first run: 
    98{{{ 
    10 ./sync-all --testsuite get 
     9$ ./sync-all --testsuite get 
    1110}}} 
    1211 
    13 If you just want to run the whole test suite, then in the root of the tree running 
     12If you just want to run the whole testsuite, then in the root of the GHC tree, typing: 
    1413{{{ 
    15 make test 
     14$ make test 
    1615}}} 
    1716will do a run in "fast" mode (which gives an idea whether there are major problems), or 
    1817{{{ 
    19 make fulltest 
     18$ make fulltest 
    2019}}} 
    21 will do a full testsuite run (more thorough, but takes a lot longer). 
     20will do a full testsuite run (more thorough, but takes a lot longer). You should expect no test case failures in "fast" mode, as that is the quality level all GHC developers are expected to maintain when they check in code. The full testsuite run will usually have some test case failures, though. 
    2221 
    23 Below we will explain how to get finer control of the test suite. 
     22== Using the Testsuite == 
    2423 
    25 == Detail == 
     24 * [wiki:Building/RunningTests/Running Running the testsuite] 
     25 * [wiki:Building/RunningTests/Settings Testsuite Settings and WAYS] 
     26 * [wiki:Building/RunningTests/Updating Updating test case results] 
     27 * [wiki:Building/RunningTests/Adding Adding new test cases] 
    2628 
    27 To run the test suite against a GHC build in the same source tree: 
    28 {{{ 
    29         cd testsuite/tests/ghc-regress 
    30         make 
    31 }}} 
    32 (from now on, we'll assume that you're in the tests/ghc-regress 
    33 directory). 
     29== Problems running the testsuite == 
    3430 
    35 To run a fast version of the testsuite, which should complete in under 
    36 5 minutes on a fast machine with an optimised GHC build: 
    37 {{{ 
    38         make fast 
    39 }}} 
    40 By default the testsuite uses the stage2 compiler. If you want to use another stage 
    41 (e.g. because your stage2 compiler doesn't work) then: 
    42 {{{ 
    43         make stage=1 
    44 }}} 
    45 To run the test suite against a different GHC, say ghc-5.04: 
    46 {{{ 
    47         make TEST_HC=ghc-5.04 
    48 }}} 
    49 To run an individual test or tests (eg. tc054): 
    50 {{{ 
    51         make TEST=tc054 
    52 }}} 
    53 (you can also go straight to the directory containing the test and say 
    54 'make TEST=tc054' from there, which will save some time). 
     31 1. If the testsuite fails mysteriously, make sure that the {{{timeout}}} utility is working properly. This Haskell utility is compiled with the stage 1 compiler and invoked by the Python driver, which does not print a useful error report if the utility fails. This can happen if, for example, the compiler produces bogus binaries. A workaround is to compile {{{timeout}}} with a stable {{{ghc}}}. 
    5532 
    56 To run several tests, you just space separate them: 
    57 {{{ 
    58         make TEST="tc054 tc053" 
    59 }}} 
    60  
    61 To run the tests one particular way only (eg. GHCi): 
    62 {{{ 
    63         make WAY=ghci 
    64 }}} 
    65 To add specific options to the compiler: 
    66 {{{ 
    67         make EXTRA_HC_OPTS='+RTS -K32M -RTS'  
    68 }}} 
    69  
    70 To save disk space you can have temporary files deleted after each test: 
    71 {{{ 
    72         make CLEANUP=1 
    73 }}} 
    74  
    75 If you have python 2.5.2 or later then you can run the testsuite in parallel: 
    76 {{{ 
    77         make THREADS=2 
    78 }}} 
    79  
    80 For more details, see below. 
    81  
    82 = Running the testsuite with a compiler other than GHC = 
    83  
    84 This doesn't work at the moment, but if it did then it would probably involve something like: 
    85 {{{ 
    86         cd testsuite 
    87         make TEST_HC=nhc98 COMPILER=nhc98 
    88 }}} 
    89  
    90 = Running individual tests or subdirectories of the testsuite = 
    91  
    92 Most of the subdirectories in the testsuite have a Makefile.  In these 
    93 subdirectories you can use 'make' to run the test driver in two 
    94 ways: 
    95 {{{ 
    96         make            -- run all the tests in the current directory 
    97         make accept     -- run the tests, accepting the current output 
    98 }}} 
    99 The following variables may be set on the make command line: 
    100 {{{ 
    101         TESTS                   -- specific tests to run 
    102         TEST_HC                 -- compiler to use 
    103         EXTRA_HC_OPTS           -- extra flags to send to the Haskell compiler 
    104         EXTRA_RUNTEST_OPTS      -- extra flags to give the test driver 
    105         CONFIG                  -- use a different configuration file 
    106         COMPILER                -- stem of a different configuration file 
    107                                 -- from the config directory [default: ghc] 
    108         WAY                     -- just this way 
    109 }}} 
    110 The following ways are defined (for GHC, see the file config/ghc for the complete list): 
    111 {{{ 
    112         normal                  -- no special options 
    113         llvm                    -- -fllvm 
    114         optc                    -- -O -fvia-C 
    115         optasm                  -- -O -fasm 
    116         optllvm                 -- -O -fllvm 
    117         profc                   -- -O -prof -auto-all -fvia-C 
    118         profasm                 -- -O -prof -auto-all -fasm 
    119         ghci                    -- (run only, not compile) run test under GHCi 
    120         extcore                 -- -fext-core 
    121         optextcore              -- -O -fext-core 
    122         threaded1               -- -threaded -debug 
    123         threaded2               -- -threaded -O, and +RTS -N2 at run-time 
    124         hpc                     -- -fhpc 
    125         dyn                     -- -O -dynamic 
    126 }}} 
    127 Certain ways are enabled automatically if the GHC build in the local 
    128 tree supports them: optasm, profc, 
    129 profasm, threaded1, threaded2, and ghci. 
    130  
    131 = Updating tests when the output changes = 
    132  
    133 If the output of a test has changed, but the new output is still 
    134 correct, you can automatically update the sample output to match the 
    135 new output like so: 
    136 {{{ 
    137         make accept TEST=<test-name> 
    138 }}} 
    139 where <test-name> is the name of the test.  In a directory which 
    140 contains a single test, or if you want to update *all* the tests in 
    141 the current directory, just omit the 'TEST=<test-name>' part. 
    142  
    143 = Adding a new test = 
    144  
    145 For a test which can be encapsulated in a single source file, follow 
    146 these steps: 
    147  
    148  1. Find the appropriate place for the test.  The GHC regression suite 
    149     is generally organised in a "white-box" manner: a regression which 
    150     originally illustrated a bug in a particular part of the compiler 
    151     is placed in the directory for that part.  For example, typechecker 
    152     regression tests go in the typechecker/ directory, parser tests 
    153     go in parser/, and so on.   
    154  
    155  It's not always possible to find a single best place for a test; 
    156  in those cases just pick one which seems reasonable. 
    157  
    158  Under each main directory may be up to three subdirectories: 
    159        '''should_compile''':     
    160            tests which need to compile only 
    161        '''should_fail''':     
    162            tests which should fail to compile and generate a particular error message 
    163        '''should_run''': 
    164            tests which should compile, run with some specific input, and generate a particular output. 
    165      
    166  We don't always divide the tests up like this, and it's not 
    167  essential to do so (the directory names have no meaning as 
    168  far as the test driver is concerned).         
    169  
    170  
    171  2. Having found a suitable place for the test, give the test a name. 
    172     For regression tests, we often just name the test after the bug number (e.g. T2047). 
    173     Alternatively, follow the convention for the directory in which you place the 
    174     test: for example, in typecheck/should_compile, tests are named 
    175     tc001, tc002, and so on.  Suppose you name your test T, then 
    176     you'll have the following files: 
    177  
    178       T.hs 
    179         The source file containing the test 
    180  
    181       T.stdin   (for tests that run, and optional) 
    182         A file to feed the test as standard input when it 
    183         runs. 
    184  
    185       T.stdout  (for tests that run, and optional) 
    186         For tests that run, this file is compared against 
    187         the standard output generated by the program.  If  
    188         T.stdout does not exist, then the program must not 
    189         generate anything on stdout. 
    190  
    191       T.stderr  (optional) 
    192         For tests that run, this file is compared 
    193         against the standard error generated by the program. 
    194  
    195         For tests that compile only, this file is compared 
    196         against the standard error output of the compiler, 
    197         which is normalised to eliminate bogus differences 
    198         (eg. absolute pathnames are removed, whitespace 
    199         differences are ignored, etc.) 
    200  
    201  
    202  3. Edit all.T in the relevant directory and add a line for the test.  The line is always of the form 
    203 {{{ 
    204       test(<name>, <setup>, <test-fn>, <args>) 
    205 }}} 
    206  The format of these fields is described in the [wiki:Building/RunningTests#Formatofthetestentries next section]. 
    207  
    208   
    209  
    210 A multi-module test is straightforward.  It usually goes in a 
    211 directory of its own (although this isn't essential), and the source 
    212 files can be named anything you like.  The test must have a name, in 
    213 the same way as a single-module test; and the stdin/stdout/stderr 
    214 files follow the name of the test as before.  In the same directory, 
    215 place a file 'test.T' containing a line like 
    216 {{{ 
    217    test('multimod001', normal, multimod_compile_and_run, \ 
    218                  [ 'Main', '-fglasgow-exts', '', 0 ]) 
    219 }}} 
    220 as described above. 
    221  
    222 For some examples, take a look in tests/ghc-regress/programs. 
    223  
    224 = Format of the test entries = 
    225  
    226 Each test in a `test.T` file is specified by a line of the form 
    227 {{{ 
    228       test(<name>, <setup>, <test-fn>, <args>) 
    229 }}} 
    230  
    231 == The <name> field == 
    232  
    233 ''<name>'' is the name of the test, in quotes (' or "). 
    234  
    235 == The <setup> field == 
    236  
    237 ''<setup>''  is a function (i.e. any callable object in Python) 
    238 which allows the options for this test to be changed. 
    239 There are many pre-defined functions which can be 
    240 used in this field: 
    241  
    242  * '''normal'''                don't change any options from the defaults 
    243  * '''skip'''                  skip this test 
    244  * '''skip_if_no_ghci'''       skip unless GHCi is available 
    245  
    246  * '''skip_if_fast'''          skip if "fast" is enabled 
    247  
    248  * '''omit_ways(ways)'''       skip this test for certain ways 
    249  
    250  * '''only_ways(ways)'''       do this test certain ways only 
    251  
    252  * '''extra_ways(ways)'''      add some ways which would normally be disabled 
    253  
    254  * '''omit_compiler_types(compilers)'''                           skip this test for certain compilers 
    255  
    256  * '''only_compiler_types(compilers)'''       do this test for certain compilers only 
    257  
    258  * '''expect_broken(bug)''' this test is expected not to work because of the indicated Trac bug 
    259  
    260  * '''expect_broken_for(bug, ways)''' as expect_broken, but only for the indicated ways 
    261  
    262  * '''if_compiler_type(compiler_type, f)''' Do `f`, but only for the given compiler type 
    263  
    264  * '''if_platform(plat, f)'''  Do `f`, but only if we are on the specific platform given 
    265  
    266  * '''if_tag(tag, f)'''        do `f` if the compiler has a given tag 
    267  
    268  * '''unless_tag(tag, f)'''    do `f` unless the compiler has a given tag 
    269  
    270  * '''set_stdin(file)'''       use a different file for stdin 
    271  
    272  * '''no_stdin'''              use no stdin at all (otherwise use `/dev/null`) 
    273  
    274  * '''exit_code(n)'''          expect an exit code of 'n' from the prog 
    275  
    276  * '''extra_run_opts(opts)'''  pass some extra opts to the prog 
    277  
    278  * '''no_clean'''              don't clean up after this test 
    279  
    280  * '''extra_clean(files)'''    extra files to clean after the test has completed 
    281  
    282  * '''reqlib(P)'''             requires package P 
    283  
    284  * '''req_profiling'''         requires profiling 
    285  
    286  * '''ignore_output'''         don't try to compare output 
    287  
    288  * '''alone'''                 don't run this test in parallel with anything else 
    289  
    290  * '''literate'''              look for a `.lhs` file instead of a `.hs` file 
    291  
    292  * '''c_src'''                 look for a `.c` file 
    293  
    294  * '''cmd_prefix(string)'''    prefix this string to the command when run 
    295  
    296  * '''normalise_slashes'''     convert backslashes to forward slashes before comparing the output 
    297  
    298 The following should normally not be used; instead, use the `expect_broken*` 
    299 functions above so that the problem doesn't get forgotten about, and when we 
    300 come back to look at the test later we know whether current behaviour is why 
    301 we marked it as expected to fail: 
    302  
    303  * '''expect_fail'''           this test is an expected failure, i.e. there is a known bug in the compiler, but we don't want to fix it. 
    304  
    305  * '''expect_fail_for(ways)''' expect failure for certain ways  
    306  
    307 To use more than one modifier on a test, just put them in a list. 
    308 For example, to expect an exit code of 3 and omit way 'opt', we could use 
    309 {{{ 
    310       [ omit_ways(['opt']), exit_code(3) ] 
    311 }}} 
    312 as the `<setup>` argument. 
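
A minimal sketch of how a list of modifiers like the one above could be applied. It assumes a simplified stand-in for the options class; `SketchOptions`, its fields, and `apply_setup` are hypothetical names for illustration and are not taken from testlib.py.

```python
class SketchOptions:
    """Hypothetical, simplified stand-in for the driver's per-test options."""
    def __init__(self):
        self.omit_ways = []
        self.exit_code = 0

def omit_ways(ways):
    # Parameterised modifier: returns a function that records the ways to skip.
    def modifier(opts):
        opts.omit_ways = list(ways)
    return modifier

def exit_code(n):
    # Parameterised modifier: returns a function that sets the expected exit code.
    def modifier(opts):
        opts.exit_code = n
    return modifier

def apply_setup(setup, opts):
    # Accept either a single callable or a (possibly nested) list of callables.
    if isinstance(setup, list):
        for item in setup:
            apply_setup(item, opts)
    else:
        setup(opts)

opts = SketchOptions()
apply_setup([omit_ways(['opt']), exit_code(3)], opts)
print(opts.omit_ways, opts.exit_code)
```

Because each modifier is just a callable that mutates the options object, a single function and a list of functions can be handled by the same code path.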
    313  
    314 == The <test-fn> field == 
    315  
    316 ''<test-fn>'' 
    317 is a function which describes how the test should be 
    318 run, and determines the form of <args>.  The possible 
    319 values are: 
    320  
    321  * '''compile'''  Just compile the program, the compilation should succeed. 
    322  
    323  * '''compile_fail''' 
    324    Just compile the program, the 
    325    compilation should fail (error 
    326    messages will be in T.stderr). 
    327    This kind of failure is mandated by the language definition - it does '''not''' indicate any bug in the compiler. 
    328  
    329  * '''compile_and_run''' 
    330    Compile the program and run it, 
    331    comparing the output against the  
    332    relevant files. 
    333  
    334  * '''multimod_compile''' 
    335    Compile a multi-module program 
    336    (more about multi-module programs 
    337    below). 
    338  
    339  * '''multimod_compile_fail''' 
    340    Compile a multi-module program, 
    341    and expect the compilation to fail 
    342    with error messages in T.stderr.  This kind of failure does '''not''' indicate a bug in the compiler. 
    343  
    344  * '''multimod_compile_and_run''' 
    345    Compile and run a multi-module 
    346    program. 
    347  
    348  * '''compile_and_run_with_prefix''' 
    349    Same as compile_and_run, but with a command prefix used when running the resulting binary. 
    350  
    351  * '''multimod_compile_and_run_with_prefix''' 
    352    Same as multimod_compile_and_run, but with a command prefix used when running the resulting binary. 
    353  
    354  * '''run_command''' 
    355    Just run an arbitrary command.  The output is checked 
    356    against `T.stdout` and `T.stderr` (unless `ignore_output` 
    357    is used), and the stdin and expected exit code can be 
    358    changed in the same way as for compile_and_run.  NB: run_command only works  
    359    in the '''normal''' way, so don't use '''only_ways''' with it. 
    360  
    361  * '''ghci_script''' 
    362    Runs the current compiler, passing 
    363    --interactive and using the specified 
    364    script as standard input. 
    365  
    366 == The <args> field == 
    367  
    368 ''<args>'' is a list of arguments to be passed to <test-fn>. 
    369  
    370 For compile, compile_fail and compile_and_run, <args> 
    371 is a list with a single string which contains extra 
    372 compiler options with which to run the test.  eg. 
    373 {{{                 
    374     test('tc001', normal, compile, ['-fglasgow-exts']) 
    375 }}} 
    376 would pass the flag -fglasgow-exts to the compiler 
    377 when compiling tc001. 
    378  
    379 The multimod_ versions of compile and compile_and_run 
    380 expect an extra argument on the front of the list: the 
    381 name of the top module in the program to be compiled 
    382 (usually this will be 'Main'). 
    383  
    384  
    385  
    386 = Sample output files = 
    387  
    388 Normally, the sample `stdout` and `stderr` for a test T go in the 
    389 files `T.stdout` and `T.stderr` respectively.  However, sometimes a 
    390 test may generate different output depending on the platform, 
    391 compiler, compiler version, or word-size.  For this reason the test 
    392 driver looks for sample output files using this pattern: 
    393  
    394 {{{ 
    395  T.stdout[-<compiler>][-<version>][-ws-<wordsize>][-<platform>] 
    396 }}} 
    397  
    398 Any combination of the optional extensions may be given, but they must 
    399 be in the order specified.  The most specific output file that matches 
    400 the current configuration will be selected; for example if the 
    401 platform is `i386-unknown-mingw32` then `T.stderr-i386-unknown-mingw32` 
    402 will be picked in preference to `T.stderr`. 
    403  
    404 Another common example is to give different sample output for an older 
    405 compiler version.  For example, the sample `stderr` for GHC 6.8.x would go in the file 
    406 `T.stderr-ghc-6.8`. 
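
The lookup described above can be sketched roughly as follows. This is an illustration only, not the driver's actual code: `candidate_names` and `pick_sample` are hypothetical helpers, "most specific" is approximated as "most suffixes present", and ties between equally long names are broken arbitrarily.

```python
from itertools import combinations

def candidate_names(base, compiler, version, wordsize, platform):
    # Optional suffixes, listed in the order the pattern requires.
    parts = ['-' + compiler, '-' + version, '-ws-' + wordsize, '-' + platform]
    names = []
    # Try names with the most suffixes first, ending with the bare base name.
    for k in range(len(parts), -1, -1):
        for combo in combinations(parts, k):
            names.append(base + ''.join(combo))
    return names

def pick_sample(existing, base, compiler, version, wordsize, platform):
    # Return the first candidate that exists, or None if there is no sample file.
    for name in candidate_names(base, compiler, version, wordsize, platform):
        if name in existing:
            return name
    return None

files = {'T.stderr', 'T.stderr-i386-unknown-mingw32'}
print(pick_sample(files, 'T.stderr', 'ghc', '6.8', '32', 'i386-unknown-mingw32'))
```

`itertools.combinations` keeps the input order, so every generated name has its suffixes in the order the pattern demands.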
    407  
    408 = The details = 
    409  
    410 The test suite driver is just a set of Python scripts, as are all of 
    411 the .T files in the test suite.  The driver (driver/runtests.py) first 
    412 searches for all the .T files it can find, and then proceeds to 
    413 execute each one, keeping track of the number of tests run, and 
    414 which ones succeeded and failed. 
    415  
    416 The script runtests.py takes several options: 
    417  
    418   --config <file> 
    419    
    420        <file> is just a file containing Python code which is  
    421        executed.   The purpose of this option is so that a file 
    422        containing settings for the configuration options can 
    423        be specified on the command line.  Multiple --config options 
    424        may be given. 
    425  
    426   --rootdir <dir> 
    427  
    428        <dir> is the directory below which to search for .T files 
    429        to run. 
    430  
    431   --output-summary <file> 
    432  
    433        In addition to dumping the test summary to stdout, also 
    434        put it in <file>.  (stdout also gets a lot of other output 
    435        when running a series of tests, so redirecting it isn't   
    436        always the right thing). 
    437  
    438   --only <test> 
    439  
    440        Only run tests named <test> (multiple --only options can 
    441        be given).  Useful for running a single test from a .T file 
    442        containing multiple tests. 
    443  
    444   -e <stmt> 
    445  
    446        executes the Python statement <stmt> before running any tests. 
    447        The main purpose of this option is to allow certain 
    448        configuration options to be tweaked from the command line; for 
    449        example, the build system adds '-e config.accept=1' to the 
    450        command line when 'make accept' is invoked. 
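
To illustrate the idea behind `-e`, here is a minimal sketch of executing statements against a configuration object. The `SketchConfig` class and `run_statements` helper are hypothetical; the real driver's option handling may differ.

```python
class SketchConfig:
    """Hypothetical stand-in for the driver's global configuration object."""
    accept = 0

config = SketchConfig()

def run_statements(stmts):
    # Execute each statement with 'config' in scope, as '-e <stmt>' is described to do.
    for stmt in stmts:
        exec(stmt, {'config': config})

# Mirrors what the text says 'make accept' adds to the command line.
run_statements(['config.accept=1'])
print(config.accept)
```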
    451  
    452 Most of the code for running tests is located in driver/testlib.py. 
    453 Take a look. 
    454  
    455 There is a single Python class (TestConfig) containing the global 
    456 configuration for the test suite.  It contains information such as the 
    457 kind of compiler being used, which flags to give it, which platform 
    458 we're running on, and so on.  The idea is that each platform and 
    459 compiler would have its own file containing assignments for elements 
    460 of the configuration, which are sourced by passing the appropriate 
    461 --config options to the test driver.  For example, the GHC 
    462 configuration is contained in the file config/ghc. 
    463  
    464 A .T file can obviously contain arbitrary Python code, but the general 
    465 idea is that it contains a sequence of calls to the function test(), 
    466 which resides in testlib.py.  As described above, test() takes four 
    467 arguments: 
    468  
    469       test(<name>, <opt-fn>, <test-fn>, <args>) 
    470  
    471 The function <opt-fn> is allowed to be any Python callable object, 
    472 which takes a single argument of type TestOptions.  TestOptions is a 
    473 class containing options which affect the way that the current test is 
    474 run: whether to skip it, whether to expect failure, extra options to 
    475 pass to the compiler, etc. (see testlib.py for the definition of the 
    476 TestOptions class).  The idea is that the <opt-fn> function modifies 
    477 the TestOptions object that it is passed.  For example, to expect 
    478 failure for a test, we might do this in the .T file: 
    479 {{{ 
    480    def fn(opts): 
    481       opts.expect = 'fail' 
    482  
    483    test('test001', fn, compile, ['']) 
    484 }}} 
    485 so when fn is called, it sets the instance variable "expect" in the 
    486 instance of TestOptions passed as an argument, to the value 'fail'. 
    487 This indicates to the test driver that the current test is expected to 
    488 fail. 
    489  
    490 Some of these functions, such as the one above, are common, so rather 
    491 than forcing every .T file to redefine them, we provide canned 
    492 versions.  For example, the provided function expect_fail does the 
    493 same as fn in the example above.  See testlib.py for all the canned 
    494 functions we provide for <opt-fn>. 
    495  
    496 The argument <test-fn> is a function which performs the test.  It 
    497 takes three or more arguments: 
    498  
    499       <test-fn>( <name>, <way>, ... ) 
    500  
    501 where <name> is the name of the test, <way> is the way in which it is 
    502 to be run (eg. opt, optasm, prof, etc.), and the rest of the arguments 
    503 are constructed from the list <args> in the original call to test(). 
    504 The following <test-fn>s are provided at the moment: 
    505  
    506            compile 
    507            compile_fail 
    508            compile_and_run 
    509            multimod_compile 
    510            multimod_compile_fail 
    511            multimod_compile_and_run 
    512            run_command 
    513            run_command_ignore_output 
    514            ghci_script 
    515  
    516 and obviously others can be defined.  The function should return 
    517 either 'pass' or 'fail' indicating that the test passed or failed 
    518 respectively. 
    519  
    520 = Problems running the testsuite = 
    521  
    522  1. If the test suite fails mysteriously, make sure that the {{{timeout}}} utility is working properly. This Haskell utility is compiled with the stage 1 compiler and invoked by the python driver, which does not print a nice error report if the utility fails. This can happen if, for example, the compiler produces bogus binaries. A workaround is to compile {{{timeout}}} with a stable {{{ghc}}}. 
    523  
    524 = The testsuite and branches = 
     33== The testsuite and version control branches == 
    52534 
    52635It is not clear what to do with the testsuite when branching a compiler; should the testsuite also be branched? 
     
    54756test(tc5, namebase_if_compiler_lt('ghc','6.9', 'tc5-6.8'), ...) 
    54857}}} 
     58