Two (performance) problems in one:

```haskell
{-# OPTIONS -fffi #-}
module Main (main) where

import Control.Monad
import Random

foreign import ccall unsafe "random" _crandom :: IO Int

randomInt :: (Int, Int) -> IO Int
randomInt (min, max) = do
    n <- _crandom
    return $ min + n `rem` range
  where
    range = max - min + 1

main = replicateM_ (5*10^6) $ do
    x <- randomRIO (0::Int, 1000) :: IO Int
    x `seq` return ()
    return ()
```

First, without the "seq" at the end, hardly anything is evaluated and we're building huge amounts of thunks. Three ideas about this one:

- Blame the user :)
- Use strict fields in StdGen: `data StdGen = StdGen !Int !Int` (sketched below). Doesn't actually help (at least in this example).
- Force evaluation of the StdGen in getStdRandom. Does help in this example, but also changes the behaviour of the library: `x <- randomRIO undefined` currently dies only when x (or the result of a later randomRIO) is evaluated; this change causes it to die immediately.

Second, even _with_ the "seq", replacing "randomRIO" by "randomInt" speeds the thing up by a factor of about 30 (2 to 3.6, in a "real world" university practicum exercise of 900 lines of code).

Even given the fact that they're not really doing the same thing, this seems rather much :(
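For what it's worth, the strict-fields idea from the second bullet would look roughly like this. It is only a sketch against a simplified, hypothetical stand-in for the generator type, not the actual System.Random source:

```haskell
module StrictStdGenSketch where

-- Sketch only: a simplified stand-in for System.Random's StdGen.
-- Strict (and unpackable) fields guarantee that both seeds are evaluated
-- whenever a StdGen value itself is evaluated...
data StdGen = StdGen {-# UNPACK #-} !Int {-# UNPACK #-} !Int

-- ...but if nothing ever demands the StdGen sitting in the library's global
-- IORef, the strictness annotations never fire, which matches the
-- observation above that this change alone does not stop the thunk build-up.
```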
If someone can narrow down why things speed up by a factor of 30, and can give us a better characterised case where GHC is giving bad performance, we'd be happy to investigate.
It looks like just adding a bit of strictness to the StdGen constructor and to getStdRandom would improve things without much fuss. Probably needs 10 min to investigate the performance difference.
Adding `{-# UNPACK #-} !Int32` didn't help much. I think the problem may still be all the conversions via Integer.
Here, for example, compared against mersenne-random:
```
Generating 26452582 Ints with System.Random:
Computation time: 47.850 sec

Generating 26452582 Ints with System.Random.Mersenne:
Computation time: 0.267 sec

Generating 26452582 Doubles with System.Random:
Computation time: 20.632 sec

And with System.Random.Mersenne:
Computation time: 0.563 sec
```
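The System.Random half of a measurement like this can be reproduced with a small timing harness along the following lines. This is only a sketch, not the benchmark that produced the numbers above; the mersenne-random half would be analogous, using that package's own generator API:

```haskell
import Control.Monad (replicateM_)
import System.CPUTime (getCPUTime)
import System.Random (randomIO)
import Text.Printf (printf)

-- Time how long it takes to generate (and force) n random Ints using the
-- global System.Random generator.
main :: IO ()
main = do
    let n = 26452582 :: Int
    start <- getCPUTime
    replicateM_ n $ do
        x <- randomIO :: IO Int
        x `seq` return ()
    end <- getCPUTime
    -- getCPUTime reports picoseconds.
    printf "Generating %d Ints with System.Random:\nComputation time: %.3f sec\n"
           n (fromIntegral (end - start) / 1e12 :: Double)
```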
The slowness is most probably due to all the Integer conversions, or perhaps also to the algorithm used. (It was years ago, and I'm definitely not an expert on PRNGs.)
However, IIRC making getStdRandom more strict does solve the thunk-bomb problem.
I don't think that patch is enough to fix the "thunk-bomb" problem (though I think it is also necessary). It also needs getStdRandom to seq the gen state when updating the IORef. Mind you, I've not actually tested it.
I haven't done any benchmarking, but it doesn't fix the thunk-bomb:
All getStdRandom does is call atomicModifyIORef, and when its result is not forced, that'll just atomically store a thunk... Perhaps it's time for atomicModifyIORef'? (That is, strict but still both atomic and fast.)
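A strict variant could plausibly be a thin wrapper over the existing primitive, along the lines of the sketch below (just an illustration of the idea; the name atomicModifyIORef' is speculative here, not an existing export):

```haskell
import Data.IORef (IORef, atomicModifyIORef)

-- Sketch: like atomicModifyIORef, but forces the new value before the
-- result is handed back, so neither the stored state nor the returned
-- value can accumulate as a chain of thunks.
atomicModifyIORef' :: IORef a -> (a -> (a, b)) -> IO b
atomicModifyIORef' ref f = do
    b <- atomicModifyIORef ref $ \a ->
             case f a of
                 v@(a', _) -> a' `seq` v
    b `seq` return b
```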
What does work for the thunk-bomb problem is simply:
```diff
- getStdRandom f = atomicModifyIORef theStdGen (swap . f)
+ getStdRandom f = atomicModifyIORef theStdGen (swap . f) >>= (return $!)
```
That won't make it faster though.
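Spelled out, the patched function would read as below. This is a sketch as it would appear inside System.Random itself, where theStdGen is the library's internal global IORef StdGen and swap is Data.Tuple.swap:

```haskell
getStdRandom :: (StdGen -> (a, StdGen)) -> IO a
getStdRandom f = atomicModifyIORef theStdGen (swap . f) >>= (return $!)
-- Forcing the returned value evaluates `f g` for the current state g, and
-- because a generator step demands its argument, that also forces the
-- previously stored state.  Unevaluated generator states can therefore no
-- longer pile up in the IORef; what remains there after each call is only
-- a cheap selector thunk on an already-evaluated pair.
```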
By the way, is anyone else interested in getting the current Random redesigned?
This ticket seems to describe two more or less separate issues: the performance of the random package and the "thunk-bomb" problem in getStdRandom. While I admit I haven't looked into the performance aspect, I am now well acquainted with the "thunk-bomb" aspect.
Turns out a metrics library we were using at my company made a randomRIO call that in some cases (all of them in our configuration) wouldn't actually use the generated random value. This led to me spending a few days trying to figure out why our services were consuming and retaining gigabytes of memory. (For those interested in case studies, the discussion with the metrics library folks is here: https://github.com/brendanhay/network-metrics/issues/4 .)
So I would like to petition that the change Remi suggested in ticket:427#comment:32736 to make getStdRandom strict (or something along those lines) be made. To my mind (heavily biased by the experience of the last few days), I wouldn't really consider the "thunk-bomb" aspect a performance problem so much as a stability one.
Would someone like to look at this, please? It sounds as if most of the thinking is done; it just needs someone to take an hour to read the thread, think it through, make the change, test it carefully, send a patch.