Changes between Version 5 and Version 6 of Status/Apr10

May 4, 2010 8:58:33 AM (4 years ago)

my status update


  • Status/Apr10

    v5 v6  
    1919 * Inliner changes (incl CONLIKE) '''Simon PJ''' 
    2020 * Data parallel Haskell: '''Manuel''' 
    21  * Runtime system, esp message passing changes.  '''Simon M'''. 
    22  * Parallel GC; removing "steps".  '''Simon M'''. 
    23  * Threadscope? '''Simon M''' 
     21 * The [ Threadscope] tool for visualising parallel execution was released.  The tool is ripe for improvement in many ways, if you're interested in helping let us know! 
     23=== Runtime system work (SimonM) === 
     25There has been a lot of restructuring in the RTS over the past few months, particularly in the area of parallel execution.  The biggest change is to the way "blackholes" work: these arise when one thread is evaluating a lazy computation (a "thunk"), and another thread or threads demands the value of the same thunk.  Previously, all threads waiting for the result of a thunk were kept in a single global queue, which was traversed regularly.  This lead to two performance problems.  Firstly, traversing the queue is O(n) in the number of blocked threads, and we recently encountered some benchmarks in which this was the bottleneck.  Secondly, there could be a delay between completing a computation and waking up the threads that were blocked waiting for it.  Fortunately, we found a design that solves both of these problems, while adding very little overhead. 
     27We also fixed another pathalogical performance case: when a large numbers of threads are blocked on an MVar and become unreachable at the same time, reaping all these threads was an O(n^2^) operation.  A new representation for the queue of threads blocked on an MVar solved this problem. 
     29At the same time, we rearchitected large parts of the RTS to move from algorithms involving shared data structures and locking to a message-passing style.  As things get more complex in the parallel RTS, using message-passing let us simplify some of the invariants and move towards having less shared state between the CPUs, which will improve scaling in the long run. 
     31The GC has seen some work too: the goal here is to enable each processor ("capability" in the internal terminology) to collect its private heap independently of the other processors.  It turns out that this is quite tricky to achieve in the context of the current architecture, but we have made some progress in this direction by privatising more of the global state and simplifying the GC data structures by removing the concept of "steps", while keeping a simple aging policy which is what steps gave us previously. 
    2533=== Nightly builds ===