# Memory effects in doomed STM transactions

## Problem
GHC's STM implementation uses "lazy" validation, meaning that speculative execution can continue even after a transaction has observed an inconsistent view of memory. Transactions in this state are called "doomed transactions". While the runtime tries to avoid some effects in doomed transactions, such as entering an infinite loop (see the loop variation sketched after the transcript below), it does not successfully control all memory effects. In particular, many allocations (I do not have an exhaustive list) eventually end up in the `allocate` function, which is nicely documented as follows:
> `allocate()` never fails; if it returns, the returned value is a valid address. If the nursery is already full, then another block is allocated from the global block pool. If we need to get memory from the OS and that operation fails, then the whole process will be killed.
https://github.com/ghc/ghc/blob/master/rts/sm/Storage.c#L819
A doomed transaction that demands a large allocation that the OS cannot fulfill will kill the process. Here is a program that reliably fails for me:
```haskell
-- oom.hs
{-# LANGUAGE BangPatterns #-}
import GHC.Conc
import Control.Concurrent
import qualified Data.ByteString as B

forever act = act >> forever act

check True  = return ()
check False = retry

main = do
  tx <- newTVarIO 0
  ty <- newTVarIO 1
  tz <- newTVarIO 0
  done <- newTVarIO False

  _ <- forkIO $ forever $ atomically $ do
    -- Only read tx and ty.
    x <- readTVar tx
    y <- readTVar ty
    if x > y -- This should always be False.
      then do
        -- We only get here in a doomed transaction.  Comment out the next
        -- two lines (keeping the write to done) and the program runs as
        -- expected, because the doomed transaction is detected at commit time.
        let !big = B.length $ B.replicate maxBound 0 -- The big allocation!
        writeTVar tz big
        writeTVar done True
      else return ()

  let mut = forever $ atomically $ do
        y <- readTVar ty
        x <- readTVar tx
        if x > 1000
          then do
            -- When we get big enough, start over.
            writeTVar tx 0
            writeTVar ty 1 -- Semantically it always holds that tx < ty.
          else do
            -- Increment both x and y.
            writeTVar ty (succ y)
            writeTVar tx (succ x) -- tx < ty

  -- Give lots of opportunities to witness inconsistent memory.
  _ <- forkIO mut
  _ <- forkIO mut
  _ <- forkIO mut

  atomically $ readTVar done >>= check
  putStrLn "Done"
```
Running it leads to out of memory:

```
> ghc oom.hs -fno-omit-yields -threaded
[1 of 1] Compiling Main             ( oom.hs, oom.o )
Linking oom.exe ...
> ./oom.exe +RTS -N
oom.exe: out of memory
```
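For contrast, the runtime's existing safeguard against the infinite-loop case mentioned above can be seen with a small variation. This is a hedged sketch, not from the original program, assuming the same `-fno-omit-yields` build so that the non-allocating loop keeps its yield points:

```haskell
-- Hypothetical variation of the doomed branch: replace the two lines that
-- allocate and write tz with a non-allocating loop:
--
--     spin () `seq` writeTVar done True
--
-- where spin is a plain tail-recursive loop:
spin :: () -> ()
spin _ = spin ()
-- With yield points intact, the scheduler can preempt the loop, revalidate
-- the running transaction, notice the inconsistent reads, and abort it, so
-- the program still prints "Done" instead of spinning forever.
```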
## Fixing
When a potentially bad memory effect is about to happen in some thread, we need to ensure that we validate any running transaction and, if validation fails, have some way to back out of the operation and abort the transaction. This might be tricky on two fronts: 1) finding all the critical allocations, and 2) finding places where we can both detect (1) and abort the transaction. The places I'm most concerned about are array allocations, such as the `ByteString` in the example, and perhaps `Integer` allocation. Another fix is a different STM implementation altogether that (at a performance cost or trade-off) doesn't allow doomed transactions (to appear, Haskell Symposium 2016 :D). I think we can at least address this particular example without going so far.
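Until something along those lines lands in the RTS, there is a user-level mitigation for this particular example: hoist the dangerous allocation out of the transaction. A minimal sketch, assuming the same TVars as the program above (`bigWriteOutsideSTM` is a hypothetical helper, not from the ticket):

```haskell
{-# LANGUAGE BangPatterns #-}
import GHC.Conc
import qualified Data.ByteString as B

-- Snapshot the inputs in one transaction, allocate in plain IO, and publish
-- in a second transaction.  Once the first 'atomically' returns, its reads
-- have been validated at commit time, so the x > y branch can no longer be
-- reached from an inconsistent view of memory.
bigWriteOutsideSTM :: TVar Int -> TVar Int -> TVar Int -> IO ()
bigWriteOutsideSTM tx ty tz = do
  (x, y) <- atomically $ do
    x <- readTVar tx
    y <- readTVar ty
    return (x, y)
  if x > y
    then do
      -- The big allocation now happens outside any transaction.
      let !big = B.length $ B.replicate maxBound 0
      atomically (writeTVar tz big)
    else return ()
```

The trade-off is that the snapshot and the publish are no longer a single atomic step, so this only applies where the surrounding invariants tolerate the split; it sidesteps the doomed allocation rather than fixing the RTS.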
A related issue is more general memory leaks from doomed transactions. Consider a program with a memoized Fibonacci function. The semantics of the transactions written by the programmer may ensure that no value higher than 10 is ever demanded of that function, yet in a doomed transaction the invariant breaks and the program demands 100. The program will detect the doomed transaction eventually in this case, but not before it has allocated a large live blob that will never be touched again.
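A sketch of that scenario, assuming the memo table is a pure, lazily built structure (the usual trick for memoized Fibonacci; the names here are illustrative, not from the ticket):

```haskell
import GHC.Conc

-- Program-lifetime memo table: an entry, once forced, stays live for as
-- long as 'fibs' is reachable.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)

-- The programmer's invariants guarantee the index TVar never exceeds 10,
-- so no committed transaction forces more than the first 11 entries.  A
-- doomed transaction that reads an inconsistent index of 100 forces (and
-- thereby retains) the first 101 entries, which no consistent execution
-- will ever look at again.
demandFib :: TVar Int -> TVar Integer -> STM ()
demandFib tn tout = do
  n <- readTVar tn
  writeTVar tout $! fibs !! n
```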
## Trac metadata

| Trac field | Value |
| --- | --- |
| Version | 8.0.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Runtime System |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | simonmar |
| Operating system | |
| Architecture | |