Opened 3 months ago

Last modified 3 months ago

#8680 new bug

In STM: Variables only in left branch of orElse can invalidate the right branch transaction

Reported by: jberryman Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.6.3
Keywords: Cc: fryguybob@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

I'm sorry if this is expected behavior; I'm still trying to wrap my head around this. I was surprised to learn that the right branch of an orElse can be invalidated by changes to the world only visible by the left branch. This might lead to starvation or performance issues for the use-case I was envisioning it for (e.g. several threads trying to take from one of a large set of TMVars in a randomized round-robin fashion).

Here is a really bad example demonstrating current behavior:

import Control.Concurrent
import Control.Concurrent.STM
import Control.Concurrent.STM.TSem
import Control.Monad

import System.IO
import Debug.Trace

main = do
    hSetBuffering stdout NoBuffering
    noMansLand <- replicateM 998 $ newTVarIO 0
    t0 <- newTVarIO (1::Int)
    t999 <- newTVarIO (-1)
    let ts = t0:noMansLand++[t999]

    done <- atomically $ newTSem 0
    forkIO $ atomically $ nestedOrElseMap done ts

    -- need enough time here for nestedOrElseMap thread above to move past t0

    -- in this version, the modifications to t0 force nestedOrElseMap to be restarted
    forkIO (trace "starting vexing!" $ forever $ atomically $ (modifyTVar' t0 (+1) >> trace "vex" (return ())))
    -- in this version nestedOrElseMap causes this transaction to be restarted and never makes progress:
    --forkIO (atomically (trace "starting vexing!" $ forever $ (modifyTVar' t0 (+1) >> trace "vex" (return ()))))

    atomically $ waitTSem done
    putStrLn "No livelock! Did the t0 counter get incremented?: "
    atomically (readTVar t0) >>= print

-- another thread begins modifying head ts after we've moved to the right branch
nestedOrElseMap :: TSem -> [TVar Int] -> STM ()
nestedOrElseMap done ts = trace "nestedOrElseMap starting" $ foldl1 orElse $ map transaction $ zip [(1::Int)..] ts
    where transaction (cnt,v) = do
            n <- traceShow cnt $ readTVar v
            if n < 0 
                then trace "@" (modifyTVar' v (subtract 1)) >> signalTSem done
                else  retry

I can see it possibly being useful that both branches of an orElse see the same view of the variables they share, but current behavior seems overzealous in restarting the whole transaction.

Maybe this is an artifact of changes related to #7493? Or maybe it's obvious why this behavior has to be the way it is and its just not clicking for me.

Change History (4)

comment:1 Changed 3 months ago by carter

any chance you could cook up an even smaller example that exhibits the problem? :)

comment:2 follow-up: Changed 3 months ago by ezyang

I haven't checked your program, but what you have textually described is the behavior I would expect for orElse (and is consistent with the semantics): the 'else' branch is only valid if the 'if' branch retries; so if the first branch is invalidated, it may not be retrying anymore.

This suggests there might be a useful nondeterministic version of orElse, which relaxes this restriction.

comment:3 in reply to: ↑ 2 ; follow-up: Changed 3 months ago by jberryman

Replying to ezyang:

I haven't checked your program, but what you have textually described is the behavior I would expect for orElse (and is consistent with the semantics): the 'else' branch is only valid if the 'if' branch retries; so if the first branch is invalidated, it may not be retrying anymore.

So I think I got caught up in the details and convinced myself that there wasn't really a way to do any useful reasoning with this behavior. But I suppose it lets you say e.g. "if we're in the right branch then our view of the world is one where predicate P from the left branch is False", even if the two branches share no variables and the first is rolled back.

This suggests there might be a useful nondeterministic version of orElse, which relaxes this restriction.

Now I'm not even sure that the current orElse behavior won't work for me. Would it be better to update this ticket when I have a clearer idea, or just consider opening a new feature request ticket in the future?

Thanks for the feedback.

comment:4 in reply to: ↑ 3 Changed 3 months ago by fryguybob

  • Cc fryguybob@… added

As ezyang said, this is the expected behavior. We only visit the right branch if the left branch reaches retry. While the effects of the left branch are discarded, the outcome is not. When the whole transaction commits, it will check that the outcome of the left branch is still the same by checking that the reads from the left branch have not changed.

This becomes important if the right branch reaches retry (and this is a top-level orElse). When the transaction blocks, it needs to be added to the watch list for all the TVars read, including the branches that we discarded as it may be the case that an earlier branch will allow the transaction to make progress. Before we can block we must validate that we are not blocking due to reading inconsistent data. Otherwise we could block and miss the chance to wake up.

While I think a nondeterministic orElse should be possible, it would be tricky to get it right and with the fairness that we would clearly want. I'm not sure what are the right interactions with nesting and blocking. A particular branch could reach retry due to inconsistent reads. Do we still propagate the retry up, potentially taking another branch at a higher level? Do we block on this faulty data, or validate the discarded reads first? If found invalid do we start all over, or do we try inconsistent branches again? If so in what order? Perhaps there is some wisdom to be gleaned from this work: http://www.cs.rit.edu/~mtf/research/tx-events/IFL11/techrpt.pdf

If you only care about a top-level chain of orElses as in the example, you can sort of accomplish what you want now with something like this:

atomicallyOneOf :: [STM a] -> IO a
atomicallyOneOf ts = go (cycle (map run ts))
  where
    go (t:ts) = do
        v <- t
        case v of
           Nothing -> go ts
           Just a  -> return a
    run t = atomically ((Just <$> t) `orElse` return Nothing)

But this will still get "stuck" when it runs across a t that is contending with a repeated faster transaction. If we had a tryAtomically that didn't bother to start again you could do a little better, but I'm not sure this is a good API to give users in general as they might reach for it at the wrong time. The problem might be better addressed in the other direction by avoiding the repeated fast committing transaction.

Note: See TracTickets for help on using tickets.