Changes between Version 123 and Version 124 of LightweightConcurrency


Timestamp: May 31, 2012 9:31:41 PM
Author: kc

  • LightweightConcurrency

For the LWC implementation, can we utilize the scheduler actions to yield control to another thread from the user-level scheduler, similar to the solutions above? The simple answer is no. Since the scheduler actions themselves are implemented in Haskell code, they can also encounter blackholes. Hence, we might encounter situations where the user-level scheduler becomes blocked on a thread that it is scheduling, resulting in a deadlock.

Since thunks (usually) represent pure computation, can we not duplicate thunk evaluation when we detect a deadlocked scheduler? Unfortunately, this is not so straightforward. The closure that represents a thunk is lost when the thunk is blackholed. Moreover, the thread evaluating the blackholed thunk (the blackhole owner) might be running on the same or a different capability as the thread entering the blackhole. Correspondingly, the blackhole owner thread might be suspended (not schedulable) or currently running. This complicates the problem of potentially forcing a blackholed thunk's evaluation on a thread other than the blackhole owner. It is for these reasons that we handle blackholes transparently from the programmer's perspective in the LWC implementation.

When a thread enters a blackhole, there are essentially three parameters that we need to consider:

 * '''PTM''' - whether the thread entering the blackhole is executing a PTM action,
 * '''UPT''' - whether the thread entering the blackhole is an upcall thread, and
 * '''CCAP''' - whether the blackhole owner thread resides on the current capability.

Since each of these conditions can either be true or false, we have 8 cases to consider.

 * '''(1, 2) PTM(F)   UPT(F)   CCAP(T/F)''' - This is the typical case when a thread blocks on a blackhole. Here, we enqueue the thread on the blackhole's blocked-thread queue and perform the yieldControlAction to switch to another thread. When the thunk finishes evaluation, we examine the blocked-thread queue. If a blocked thread is not an upcall thread, we know it has a scheduleSContAction, which is executed to resume the blocked thread.
 * '''(3, 4) PTM(F)   UPT(T)   CCAP(T/F)''' - This case cannot happen. Upcall threads only execute PTM actions.
 * '''(5, 6) PTM(T)   UPT(T/F) CCAP(T)'''   - We are under PTM and potentially manipulating the scheduler. The blackhole is owned by a thread on the current capability, and that thread is suspended. Hence, the only option is to force evaluation of the thunk ourselves. This is achieved by creating a closure (AP_STACK) that contains all of the frames from the blackhole owner thread up to the update frame that corresponds to the blackholed thunk. The blackhole owner's stack is modified such that when it resumes, it evaluates the newly created closure instead of resuming the original thunk evaluation. The current thread evaluates the newly created closure, forcing evaluation of the thunk. Here, the current thread is said to have `inherited` the thunk.
 * '''(7)    PTM(T)   UPT(F)   CCAP(F)'''   - A user-level thread under PTM has blocked on a blackhole owned by a thread on a different capability. We cannot inherit the computation. The solution is similar to (1).
 * '''(8)    PTM(T)   UPT(T)   CCAP(F)'''   - This is a tricky case. An upcall thread blocks on a blackhole owned by a thread on a different capability. We need to put the capability to sleep and wake it up when the blackholed thunk finishes evaluation. Here, we enqueue the upcall thread on the blackhole's blocked-thread queue. Now, the current capability does not have any runnable threads, so it goes to sleep. When the thunk finishes evaluation, we examine the blocked-thread queue. If a blocked thread is an upcall thread, we push it onto its owning capability. This implicitly wakes up the capability, which resumes execution.
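The eight cases above collapse into four distinct behaviours. As a rough illustration only (the real logic lives in the RTS's C code; the condition names and constructor names below are invented for exposition), the dispatch can be summarised as a pure table:

```haskell
-- Illustrative sketch: the blackhole-entry decision as a pure function.
data BlackholeAction
  = BlockAndYield    -- enqueue on the blackhole's blocked-thread queue, yield
  | Impossible       -- upcall threads only execute PTM actions
  | InheritThunk     -- build an AP_STACK closure and force it on this thread
  | SleepCapability  -- enqueue the upcall thread, put the capability to sleep
  deriving (Show, Eq)

-- ptm: under a PTM action?  upt: an upcall thread?
-- ccap: is the blackhole owner on the current capability?
onBlackhole :: Bool -> Bool -> Bool -> BlackholeAction
onBlackhole ptm upt ccap = case (ptm, upt, ccap) of
  (False, False, _)     -> BlockAndYield    -- cases 1, 2
  (False, True,  _)     -> Impossible       -- cases 3, 4
  (True,  _,     True)  -> InheritThunk     -- cases 5, 6
  (True,  False, False) -> BlockAndYield    -- case 7, similar to (1)
  (True,  True,  False) -> SleepCapability  -- case 8
```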

==== RTS Messaging Layer ====

Since thunk evaluation and blackholing are critical to good performance, we would like the common case - a thunk finishing evaluation without being blackholed - to be fast. Hence, we retain the RTS messaging layer between capabilities for blocking on a blackhole. When a thread enters a blackhole whose owner thread resides on another capability, a block request message is sent to the corresponding capability. Notice that the [#SContAffinity association] between SConts (threads) and capabilities is essential for identifying which capability to send the block request message to. During every iteration of the RTS schedule loop, a capability checks its inbox for pending messages and processes any that are present. Hence, no synchronization is necessary for replacing a thunk with a value.
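As a minimal sketch of this message flow (the real messaging layer is implemented in C inside the RTS; the `Message` type and function names here are invented), a capability's inbox can be modelled as a queue of block requests drained once per schedule-loop iteration:

```haskell
import Data.IORef

-- A block request names the SCont that entered the remote blackhole.
newtype Message = MsgBlockOnBlackhole String deriving (Show, Eq)

type Inbox = IORef [Message]

-- Sending appends to the owner capability's inbox; only the owning
-- capability ever acts on its own blackholes.
send :: Inbox -> Message -> IO ()
send inbox m = modifyIORef inbox (++ [m])

-- Called once per iteration of the schedule loop: drain pending messages.
processInbox :: Inbox -> IO [Message]
processInbox inbox = do
  msgs <- readIORef inbox
  writeIORef inbox []
  return msgs
```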

=== Exceptions escaping SConts ===

Every SCont has a top-level exception handler, which catches all exceptions and executes the SCont's yieldControlAction. If an exception escapes the computation spawned as an SCont, we mark the SCont's status as `SContKilled` and switch to the next available SCont from the scheduler. This ensures that schedulers are not lost when an SCont is killed.

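A minimal, runnable approximation of this handler (the SCont status and the scheduler hook are simulated here with an IORef and a plain IO action; the substrate's real API differs):

```haskell
import Control.Exception (SomeException, catch, throwIO)
import Data.IORef

data SContStatus = SContRunning | SContKilled deriving (Show, Eq)

-- Wrap an SCont's body so that an escaping exception marks the SCont as
-- killed and hands control back to the scheduler via its yieldControlAction.
wrapSCont :: IORef SContStatus -> IO () -> IO () -> IO ()
wrapSCont status yieldControlAction body = catch body handler
  where
    handler :: SomeException -> IO ()
    handler _ = do
      writeIORef status SContKilled   -- record that the SCont was killed
      yieldControlAction              -- switch to the next available SCont
```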
=== Asynchronous Exceptions ===

The substrate exposes the

{{{
throwTo :: Exception e => SCont -> e -> IO ()
}}}

primitive, which raises an arbitrary exception on the given SCont. The masking semantics are exactly the same as for [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html throwTo under Control.Concurrent]. Under the hood, the RTS messaging layer is used to raise exceptions on SConts belonging to other capabilities. This is necessary since raising an asynchronous exception involves modifying the stack, and hence is safe only if performed on the capability to which the target SCont belongs. If the calling SCont blocks on throwTo, we utilize the scheduler actions to resume other SConts that might be available on the scheduler.

If an exception is raised on an SCont that is blocked on a blackhole, STM, or a user-level concurrent data structure, we remove the SCont from any blocking queues, raise the exception, and utilize the SCont's scheduleSContAction to enqueue it back on the scheduler. If an exception is raised on an SCont suspended on a scheduler, we simply raise the exception.

For blocking actions in the RTS, such as STM and blackholes, the RTS knows how to remove the SCont from the corresponding queue. However, if an SCont happens to be blocked on a user-level data structure such as an MVar, how do we asynchronously remove the thread from the MVar data structure? One could envision a model where an SCont blocking on a concurrent data structure provides an `unblockSCont :: PTM ()` action that removes the blocked SCont from the user-level blocking queue. The RTS blocking queues are implemented as doubly-linked lists, so that removing an element from the middle of the list is fast. However, implementing an efficient unblockSCont action for every user-level data structure can be cumbersome and complicated, and defeats the purpose of lifting the concurrency library to Haskell.

Alternatively, instead of eagerly removing the SCont from the user-level blocking queue, we can defer the removal until the SCont is about to be unblocked. We raise the asynchronous exception on the SCont, eagerly append it to the scheduler, and mark the blocked action as invalid. This is achieved through resume tokens.

{{{
data ResumeToken
newResumeToken     :: PTM ResumeToken
isResumeTokenValid :: ResumeToken -> PTM Bool

data SContSwitchReason = BlockedInHaskell ResumeToken | ...
}}}

The primitive `newResumeToken` allocates a new, valid resume token. The validity of a resume token can be queried using the primitive `isResumeTokenValid`. Whenever an SCont blocks on a user-level data structure, it is expected to be given a new, valid resume token. If an asynchronous exception is raised on this blocked SCont, the resume token is invalidated. Eventually, when the SCont is about to be unblocked from the concurrent data structure, the resume token is queried for validity. If the resume token is invalid, then the blocked SCont has already been resumed, and hence it should not be resumed again. The following snippet shows an implementation of the `takeMVar` primitive that can tolerate asynchronous exceptions. The only change is the `wakeup` function.

     443{{{ 
     444takeMVar :: MVar a -> IO a  
     445takeMVar (MVar ref) = do 
     446  hole <- atomically $ newPVar undefined  
     447  atomically $ do 
     448    st <- readPVar ref  
     449    case st of 
     450      Empty ts -> do  
     451        s <- getCurrentSCont  
     452        ssa <- getSSA 
     453        token <- newResumeToken 
     454        let wakeup = do { 
     455          v <- isResumeTokenValid token; 
     456          if v then 
     457            ssa s 
     458          else 
     459            return () 
     460        } 
     461        writePVar ref $ v 
     462          where v = Empty $ ts++[(hole, wakeup)]  
     463        switchToNext <- getYCA 
     464        setSContSwitchReason s $ BlockedInHaskell ... 
     465        switchToNext 
     466      Full x ((x', wakeup):ts) -> do  
     467        writePVar hole x  
     468        writePVar ref $ Full x' ts  
     469        wakeup 
     470      otherwise -> ...  
     471  atomically $ readPVar hole 
     472}}} 
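The resume-token discipline can also be demonstrated in isolation. In this runnable approximation, IORefs stand in for PVars and PTM, and `invalidateResumeToken` (an invented name) models the effect of an asynchronous exception on a blocked SCont:

```haskell
import Data.IORef

newtype ResumeToken = ResumeToken (IORef Bool)

newResumeToken :: IO ResumeToken
newResumeToken = fmap ResumeToken (newIORef True)

isResumeTokenValid :: ResumeToken -> IO Bool
isResumeTokenValid (ResumeToken r) = readIORef r

-- Raising an asynchronous exception on a blocked SCont invalidates its token.
invalidateResumeToken :: ResumeToken -> IO ()
invalidateResumeToken (ResumeToken r) = writeIORef r False

-- The wakeup action stored in the blocking queue: resume the SCont only if
-- its token is still valid, i.e. it has not already been resumed elsewhere.
mkWakeup :: ResumeToken -> IO () -> IO ()
mkWakeup token resume = do
  v <- isResumeTokenValid token
  if v then resume else return ()
```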

== Related Work ==