Changes between Initial Version and Version 1 of Performance

Oct 31, 2006 10:22:13 PM (9 years ago)

From HaWiki


  • Performance

    v1 v1  
     1This page collects information about the performance of GHC and GHC-compiled code. We're interested in programs that GHC compiles slowly, programs that run slowly (especially small ones), and ideas for improving performance of either GHC itself or GHC-compiled code.
     3= Performance of GHC =
     5The Todo list for GHC performance:
     7 * Emit the interface immediately after typechecking in non-optimised compilation
     8 * Turn on the NCG by default for -O (possibly improve x87 float code first?)
     9 * Stringbuffer could use mmap()
     10 * Improve performance of generating output (SimonMarlow: done).
     11 * Profile & do a space-leak sweep (SimonMarlow: done, fixed one space leak, space usage not great but could be worse).
     12 * Improve binary interface representation (SimonMarlow: some improvement here).
     14Half-baked ideas:
     16 * glom the interfaces for a package into a single uber-interface with one string table.
     17 * Classic loop optimizations that are not invariant hoisting. (Jan-Willem Maessen)
     19== Modules that GHC compiles slowly ==
     21Simon PJ found a case of non-linearity in the simplifier; we are currently testing a patch that hopefully fixes all of the following:
     23 * Language.Haskell.Syntax
     24 * large constant structures taking non-linear time to compile
     25 * large Happy-generated grammars
     26 * WashNGo-2.4.6/WASH/HTMLMonad98.hs from WASH/CGI (SimonMarlow: investigating)
     27 * dsrun011 from the test suite (SimonMarlow: fixed)
     28 * record with lots of fields (20+), deriving Show, with -O (SimonMarlow: fixed)
     30= Performance of compiled code =
     32== Programs that run too slowly ==
     34 * Various of the shootout benchmarks (Todo: expand).
     36== Performance Todo list ==
     38Here we list some problems that we know about, and/or ideas for improving things.
     40 * x86_64: use more registers (the NCG needs to save/restore volatile regs over C calls, apart from that we're there).
     41 * via-C produces code with too many jumps, especially with later versions of GCC. GCC commons up identical basic blocks containing a single jump instruction.
     42 * GC has been reported to behave non-linearly (SimonMarlow: turned out to be caused by swapping)
     43 * Space leak: a function that is statically known to not use some or all of its arguments should have a function descriptor that reflects the unused pointers as non-pointers, so that a PAP applied to unused args doesn't retain them.
     44 * Push heap check into a case branch if that means we can avoid a heap check in a tail-recursive loop.
     45 * Make ticky-ticky profiling work again.
     46 * Make the NCG handle loops, and perform some simple loop optimisations.
     47 * Better optimisation pass sequences? (c.f. Laszlo Nemeth's research
     48 * Can we do anything about Double alignment?
     49 * better GC for chains of THUNK_SELECTORs (see comments in GC.c)
     50 * Int64 reported to be slow (Lemmih: Fixed. It's now only three times slower to use Int64 over Int in that example. (patch not commited yet)).
     51 * System.Random could be optimized more.
     53A few slightly larger projects:
     55 * parallel GC for multi-processors
     56 * Allow strict enumerations to be "unboxed"
     58Some more open-ended ideas:
     60 * Strict or unlifted types, or strictness annotations on functions
     61 * How about an anti-strictness annotation? Value is always re-evaluated when used?
     62 * Can GC performance be improved at all? Perhaps: a combination of depth-first and breadth-first traversal instead of just breadth-first. (SimonMarlow: attempted, no joy. But managed to find some other opportunities for improvement in the GC while I was there.)
     63 * improve compacting GC performance (ask SimonMarlow for ideas)
     64 * Whole program optimisation infrastructure. Dump intermediate representation into a special section of the .o files (or use another file) and upon linking the program/library run another optimisation pass over the entire lot before finally generating c/cmm. Only enabled with -O2/-O3 or something.
     66= Size of compiled code =
     68List modules or programs that produce code that is too large:
     70The Todo list for code size:
     72 * Implement dynamic libraries on other platforms: x86/Linux, x86_64/Linux, x86/*BSD, x86_64/*BSD.
     73 * Find common code fragments and share them
     74 * Top level CSE of lambdas module alpha equivalence. This come up from time to time when inlining occurs. It doesn't look like it gets done, but I could be missing the pass. (Jan-Willem Maessen)