Changes between Version 9 and Version 10 of Commentary/Compiler/Backends/NCG/RegisterAllocator


Ignore:
Timestamp:
Sep 18, 2007 2:58:46 PM (7 years ago)
Author:
guest
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Commentary/Compiler/Backends/NCG/RegisterAllocator

    v9 v10  
    6565  }}} 
    6666 
    67  * '''Register pressure in Haskell code'''[[BR]] 
    68    Most GHC compiled code has very little register pressure and only a few spill/reload instructions need to be inserted, many modules need none at all. This is a mixed blessing - on one hand the conflict graphs are small so we don't have too many performance problems related to how the graph is represented, on the other hand it can be hard to find code to test against. 
     67 * Graphviz 
    6968 
    70   Only a few nofib benchmarks create spills with {{{-O2}}}, two are {{{spectral/hartel/genfft}}} and {{{spectral/sorting}}}. 
     69 * checkSpills 
    7170 
    72   Register pressure increases significantly when profiling is turned on. 
    73    
     71== Register pressure in Haskell code == 
    7472 
     73Present GHC compiled code places very little pressure on the register set. Even on x86 with only 3 allocable registers, most modules do not need spill/reloads. This is a mixed blessing - on one hand the conflict graphs are small so we can avoid performance problems related to how the graph is represented, on the other hand it can be hard to find code to test against. Register pressure is expected to increase as the Stg->Cmm transform improves. 
     74 
     75In the meantime, here are some good sources for test code: 
     76 
     77 * '''Nofib'''[[BR]] 
     78   Only a few nofib benchmarks create spills with {{{-O2}}}, two are {{{spectral/hartel/genfft}}} and {{{spectral/sorting}}}. 
     79 
     80 * '''Turn on profiling.'''[[BR]] 
     81   Register pressure increases significantly when the module is compiled with profiling. [attachment:checkSpills.report] gives tuples of {{{(spills, reloads, reg-reg-moves)}}} present in output code generated by the three algorithms when compiled with {{{-O2 -prof}}}. Left to right are the stats for the linear, graph coloring and iterative coalescing algorithms. Note that most modules compile with no spill/reloads inserted, but a few (notably {{{real/compress2/Encode}}}) need several hundred. 
     82 
     83  I've found it useful to maintain three darcs repos when working on the allocator. {{{ghc-HEAD-work}}} compiled with {{{-Onot}}} for fast compilation during hacking, {{{ghc-HEAD-prof}}} for testing with profiling turned on, and {{{ghc-HEAD-validate}}} for running the validate script. Patches are created in {{{work}}}, pushed into {{{prof}}} where {{{checkSpills}}} is used to compile the nofib benchmarks with the most register pressure. Once we're happy that the performance is ok, the patch is then pushed into {{{validate}}} for validation before pushing to the main repo on {{{darcs.haskell.org}}} 
     84 
     85 * '''SHA from darcs'''[[BR]] 
     86   The {{{SHA1.lhs}}} module from the darcs source, compiled with {{{-O2}}} creates the most register pressure out of any Haskell code that I'm aware of. When compiling SHA1, GHC inlines several worker functions and the native code block that computes the hash ends up being around 1700 instructions long. vregs that live in the middle of the block have in the order of 30 conflict neighbors. (evidently, the conflict graph is too large for most of the graphviz layout algorithms to cope with) 
     87 
     88  For these reasons, {{{SHA1.lhs}}} can be treated as a good worst-case input to the allocator. In fact, the current linear allocator cannot compile it with {{{-O2 -prof}}} on x86 as it runs out of stack slots, which are allocated from a static pool. Make sure to test any changes to the allocator against this module. 
    7589 
    7690== Possible Improvements == 
    7791 
    78  * work lists 
     92These are some ideas for improving the current allocator, most usful first. 
    7993 
    80  * spill code 
     94 * Work lists for iterative coalescing. 
    8195 
    82  * spill candidates 
     96 * Improve spill code cleaner. 
    8397 
    84  * aliasing register sets 
     98 * Revisit choosing of spill candidates. 
     99 
     100 * Revisit trivColorable / aliasing of register sets. 
    85101 
    86102