Opened 9 years ago

Last modified 13 months ago

#427 new bug (None)

Random.StdGen slowness

Reported by: remit Owned by: rrnewton
Priority: normal Milestone:
Component: libraries/random Version:
Keywords: Cc: dons@…, rturk@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description (last modified by simonmar)

Two (performance) problems in one:

{-# OPTIONS -fffi #-}
module Main (main) where

import Control.Monad
import Random

foreign import ccall unsafe "random" _crandom :: IO Int

randomInt :: (Int, Int) -> IO Int
randomInt (min,max) = do
    n <- _crandom
    return $ min + n `rem` range
    where
        range = max - min + 1

main = replicateM_ (5*10^6) $ do
    x <- randomRIO (0::Int,1000) :: IO Int
    x `seq` return ()
    return ()

First, without the "seq" at the end, hardly anything is
evaluated and we're building huge amounts of thunks.
Three ideas about this one:
- Blame the user :)
- data StdGen = StdGen !Int !Int
  Use strict fields in StdGen. Doesn't actually help
(at least in this example).
- Force evaluation of the StdGen in getStdRandom.
  Does help in this example, but also changes behaviour
of the library:
  x <- randomRIO undefined
  currently dies only when x (or the result of a later
randomRIO) is evaluated. This change causes it to die
immediately.

Second, even _with_ the "seq", replacing "randomRIO" by
"randomInt" speeds the thing up with a factor of about
30. (2 to 3.6, in a "real world" university practicum
exercise of 900 lines of code)
Even given the fact that they're not really doing the
same thing, this seems rather much :(

Change History (21)

comment:1 Changed 9 years ago by simonpj

Logged In: YES 
user_id=50165

If someone can narrow down why things speed up by a factor 
of 30, and can give us a better characterised case where 
GHC is giving bad performance, we'd be happy to investigate.

comment:2 Changed 8 years ago by simonmar

  • Architecture set to Unknown
  • Description modified (diff)
  • Difficulty set to Unknown
  • Operating System set to Unknown
  • Version changed from None to 6.4.1

comment:3 Changed 8 years ago by igloo

  • Milestone set to 6.6.1

comment:4 Changed 7 years ago by igloo

  • Milestone changed from 6.6.1 to _|_

It would be great if someone was interested in taking a look at the algorithms used by the Random library and making any improvements necessary.

comment:5 Changed 6 years ago by simonmar

  • Component changed from libraries/base to libraries/random
  • Owner nobody deleted
  • Status changed from assigned to new

comment:6 Changed 6 years ago by dons

  • Cc dons@… added

The mersenne twister C lib is also some 60x faster. There's a binding to it here:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/mersenne-random-0.1

comment:7 Changed 6 years ago by igloo

See also #2280

comment:8 Changed 6 years ago by simonmar

  • Architecture changed from Unknown to Unknown/Multiple

comment:9 Changed 6 years ago by simonmar

  • Operating System changed from Unknown to Unknown/Multiple

comment:10 Changed 5 years ago by Remi

  • Cc rturk@… added

comment:11 follow-up: Changed 5 years ago by duncan

See also http://haskell.org/pipermail/libraries/2009-March/011390.html

It looks like just adding a bit of strictness to the StdGen constructor and to getStdRandom would improve things without much fuss. Probably needs 10 min to investigate the performance difference.

comment:12 in reply to: ↑ 11 Changed 5 years ago by dons

Replying to duncan:

See also http://haskell.org/pipermail/libraries/2009-March/011390.html

It looks like just adding a bit of strictness to the StdGen constructor and to getStdRandom would improve things without much fuss. Probably needs 10 min to investigate the performance difference.

Adding {-#UNPACK #-}!Int32 didn't help much. I think the problem may still be all the conversions via Integer.

Here, for example, compared against mersenne-random:

Generating 26452582 Ints with System.Random:
Computation time: 47.850 sec

Generating 26452582 Ints with System.Random.Mersenne:
Computation time: 0.267 sec

Generaating 26452582 Double's with System.Random:
Computation time: 20.632 sec

And with System.Random.Mersenne:
Computation time: 0.563 sec

comment:13 Changed 5 years ago by Remi

The slowness is most probably due to all the Integer conversions or perhaps also the algorithm used. (It's been years ago and I'm definitely not an expert on PRNGs.)
However, IIRC making getStdRandom more strict does solve the thunk-bomb problem.

comment:14 Changed 5 years ago by dons

  • Version 6.4.1 deleted

Patch attached. Someone should validate it.

http://galois.com/~dons/tmp/strict-state.dpatch

comment:15 follow-up: Changed 5 years ago by duncan

I don't think that patch is enough to fix the "thunk-bomb" problem (though I think it is also necessary). It also needs getStdRandom to seq the gen state when updating the IORef. Mind you, I've not actually tested it.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 5 years ago by dons

Replying to duncan:

I don't think that patch is enough to fix the "thunk-bomb" problem (though I think it is also necessary). It also needs getStdRandom to seq the gen state when updating the IORef. Mind you, I've not actually tested it.

http://galois.com/~dons/tmp/seq-strict-random.dpatch

Needs someone to test it now :-)

comment:17 in reply to: ↑ 16 Changed 5 years ago by Remi

Replying to dons:

http://galois.com/~dons/tmp/seq-strict-random.dpatch

Needs someone to test it now :-)

I haven't done any benchmarking, but it doesn't fix the thunk-bomb:
All getStdRandom does is call atomicModifyIORef, and when its result is not forced, that'll just atomically store a thunk... Perhaps it's time for atomicModifyIORef'? (That is, strict but still both atomic and fast.)
What does work for the thunk-bomb problem is simply

- getStdRandom f = atomicModifyIORef theStdGen (swap . f)
+ getStdRandom f = atomicModifyIORef theStdGen (swap . f) >>= (return $!)

That won't make it faster though.

By the way, is anyone else interested in getting the current Random redesigned?

comment:18 Changed 4 years ago by simonmar

  • Priority changed from low to normal
  • Type of failure set to Runtime performance bug

See also bos's statistics package which includes a fast random number generator.

comment:19 Changed 3 years ago by rrnewton

  • Owner set to rrnewton

comment:20 Changed 13 months ago by joell

This ticket seems to be almost describing two separate issues: the performance of the random package and the "thunk-bomb" problem in getStdRandom. While I admit I haven't looked into the performance aspect, I am now well acquainted with the "thunk-bomb" aspect.

Turns out a metrics library we were using at my company made a randomRIO call that in some cases (all of them in our configuration) wouldn't actually use the generated random value. This led to me spending a few days trying to figure out why our services were consuming and retaining gigabytes of memory. (For those interested in case studies, the discussion with the metrics library folks is here: https://github.com/brendanhay/network-metrics/issues/4 .)

So I would like to petition that the change Remi suggested in comment:17 to make getStdRandom strict (or something along those lines) be made. To my mind (heavily biased by the experience of the last few days), I wouldn't really consider the "thunk-bomb" aspect a performance problem so much as a stability one.

comment:21 Changed 13 months ago by simonpj

Would someone like to look at this, please? It sounds as if most of the thinking is done; it just needs someone to take an hour to read the thread, think it through, make the change, test it carefully, send a patch.

Thanks

Simon

Note: See TracTickets for help on using tickets.