STM fails to validate read.
This issue was brought up by napping in #haskell
with this paste:
http://hpaste.org/85134 (the first paste in particular)
The code is:
import Control.Concurrent.STM
import Control.Concurrent
import Control.Monad
main = do
dog <- newTVarIO False
cat <- newTVarIO False
let unset = do
d <- readTVar dog
c <- readTVar cat
if (d || c) then retry else return ()
setDog = unset >> writeTVar dog True
setCat = unset >> writeTVar cat True
oops = do
d <- readTVar dog
c <- readTVar cat
guard (d && c)
reset = do
writeTVar dog False
writeTVar cat False
reset' = do
d <- readTVar dog
c <- readTVar cat
guard (d || c)
reset
forkIO (atomically oops >> putStrLn "Oh Noes!")
forever (do
forkIO (atomically setDog)
forkIO (atomically setCat)
atomically reset'
atomically reset')
When run it produces:
$ ghc --make test.hs -threaded -rtsopts
$ ./test +RTS -N2
Oh Noes!
test: thread blocked indefinitely in an STM transaction
The second message is just a consequence of entering an unexpected state. The first message indicates that both the transactions cat
and dog
committed at the same time.
It does this for HEAD and 7.6.
I've sketched out an interleaving that leads to this. TRec entries are in the first and third column and TVar's are in the second column. Each entry has a TVar name and the expected value followed by the new value and then a number of updates if it has been read. TVars list their value and their number of updates.
A TRec TVar B TRec
-- Transactions start
cat F F cat F 0 cat F F -- Initial reads.
dog F F dog F 0 dog F F
cat F T dog F T -- Local writes in TRec's
-- Validation:
cat F F 0 -- B reads num_updates from cat (with
^ -- consistency check with value)
cat F T cat A 0 | -- A acquires lock for cat (atomic cas)
dog F F 0 ^ | -- A reads num_updates from dog (with
^ | | -- consistency check with value)
| dog B 0 dog F T | -- B acquires lock for dog (atomic cas)
| | ^ |
| | | |
Success 0 | 0 | -- read check for A
Success 0 0 -- read check for B
cat A 1 -- Increment version
cat T 1 -- Unlock with new value
dog B 1 -- Increment version
dog B T -- Unlock with new value
What is clear here is that the version number is not enough to check
in check_read_only
because there is a gap between locking and
incrementing the version. We need to know atomically that the TVar is not locked and it's version number is the same.
I need to read through the right parts of Keir Fraser's thesis carefully, but it seems like the read phase here is only helpful in preventing a commit that writes back the exact value we have already seen while we are in the middle of committing.
The code for check_read_only
is here:
static StgBool check_read_only(StgTRecHeader *trec STG_UNUSED) {
StgBool result = TRUE;
ASSERT (config_use_read_phase);
IF_STM_FG_LOCKS({
FOR_EACH_ENTRY(trec, e, {
StgTVar *s;
s = e -> tvar;
if (entry_is_read_only(e)) {
TRACE("%p : check_read_only for TVar %p, saw %ld", trec, s, e -> num_updates);
if (s -> num_updates != e -> num_updates) {
// ||s -> current_value != e -> expected_value) {
TRACE("%p : mismatch", trec);
result = FALSE;
BREAK_FOR_EACH;
}
}
});
});
return result;
}
If I restore the commented out line (which appears commented out in the first commit of this code that I can find) I can't reproduce the issue, but I think there is still a problem due to the ordering of those checks: we could observe the version as the same, while it is locked, have the TVar unlock, then observe the value the same.
Switching the order we can only observe the TVar unlocked if the update has been incremented (as long as we are on an architecture that ensures this such as x86).
Does this seem right? One issue is that given the interleaving that above with this added check both transactions could fail to commit. I think the algorithms from Fraser avoid this, but I think it always involves invalidating another transaction (i.e. killing off any other transactions observed to be in the read check phase with an overlapping TVar in a way that results in only one winner (see page 21 section 4.4 in Concurrent Programming Without Locks)).
As a side note, the issue can be avoided by ensuring that the reads become writes and avoiding the read only check. But you can only become a write if the values do not have matching pointers, switching to Ints instead of Bools gets you there.
Trac metadata
Trac field | Value |
---|---|
Version | 7.7 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Runtime System |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |