Opened 10 years ago

Closed 4 weeks ago

#427 closed bug (duplicate)

Random.StdGen slowness

Reported by: remit Owned by: ekmett
Priority: normal Milestone:
Component: Core Libraries Version:
Keywords: Cc: dons@…, rturk@…, rrnewton
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description (last modified by simonmar)

Two (performance) problems in one:

{-# OPTIONS -fffi #-}
module Main (main) where

import Control.Monad
import Random

foreign import ccall unsafe "random" _crandom :: IO Int

randomInt :: (Int, Int) -> IO Int
randomInt (min,max) = do
    n <- _crandom
    return $ min + n `rem` range
    where
        range = max - min + 1

main = replicateM_ (5*10^6) $ do
    x <- randomRIO (0::Int,1000) :: IO Int
    x `seq` return ()
    return ()

First, without the "seq" at the end, hardly anything is
evaluated and we're building huge amounts of thunks.
Three ideas about this one:
- Blame the user :)
- data StdGen = StdGen !Int !Int
  Use strict fields in StdGen. Doesn't actually help
(at least in this example).
- Force evaluation of the StdGen in getStdRandom.
  Does help in this example, but also changes behaviour
of the library:
  x <- randomRIO undefined
  currently dies only when x (or the result of a later
randomRIO) is evaluated. This change causes it to die
immediately.

Second, even _with_ the "seq", replacing "randomRIO" by
"randomInt" speeds the thing up with a factor of about
30. (2 to 3.6, in a "real world" university practicum
exercise of 900 lines of code)
Even given the fact that they're not really doing the
same thing, this seems rather much :(

Change History (25)

comment:1 Changed 10 years ago by simonpj

Logged In: YES 
user_id=50165

If someone can narrow down why things speed up by a factor 
of 30, and can give us a better characterised case where 
GHC is giving bad performance, we'd be happy to investigate.

comment:2 Changed 10 years ago by simonmar

  • Architecture set to Unknown
  • Description modified (diff)
  • difficulty set to Unknown
  • Operating System set to Unknown
  • Version changed from None to 6.4.1

comment:3 Changed 9 years ago by igloo

  • Milestone set to 6.6.1

comment:4 Changed 8 years ago by igloo

  • Milestone changed from 6.6.1 to _|_

It would be great if someone was interested in taking a look at the algorithms used by the Random library and making any improvements necessary.

comment:5 Changed 8 years ago by simonmar

  • Component changed from libraries/base to libraries/random
  • Owner nobody deleted
  • Status changed from assigned to new

comment:6 Changed 8 years ago by dons

  • Cc dons@… added

The mersenne twister C lib is also some 60x faster. There's a binding to it here:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/mersenne-random-0.1

comment:7 Changed 7 years ago by igloo

See also #2280

comment:8 Changed 7 years ago by simonmar

  • Architecture changed from Unknown to Unknown/Multiple

comment:9 Changed 7 years ago by simonmar

  • Operating System changed from Unknown to Unknown/Multiple

comment:10 Changed 7 years ago by Remi

  • Cc rturk@… added

comment:11 follow-up: Changed 7 years ago by duncan

See also http://haskell.org/pipermail/libraries/2009-March/011390.html

It looks like just adding a bit of strictness to the StdGen constructor and to getStdRandom would improve things without much fuss. Probably needs 10 min to investigate the performance difference.

comment:12 in reply to: ↑ 11 Changed 7 years ago by dons

Replying to duncan:

See also http://haskell.org/pipermail/libraries/2009-March/011390.html

It looks like just adding a bit of strictness to the StdGen constructor and to getStdRandom would improve things without much fuss. Probably needs 10 min to investigate the performance difference.

Adding {-#UNPACK #-}!Int32 didn't help much. I think the problem may still be all the conversions via Integer.

Here, for example, compared against mersenne-random:

Generating 26452582 Ints with System.Random:
Computation time: 47.850 sec

Generating 26452582 Ints with System.Random.Mersenne:
Computation time: 0.267 sec

Generaating 26452582 Double's with System.Random:
Computation time: 20.632 sec

And with System.Random.Mersenne:
Computation time: 0.563 sec

comment:13 Changed 7 years ago by Remi

The slowness is most probably due to all the Integer conversions or perhaps also the algorithm used. (It's been years ago and I'm definitely not an expert on PRNGs.) However, IIRC making getStdRandom more strict does solve the thunk-bomb problem.

comment:14 Changed 7 years ago by dons

  • Version 6.4.1 deleted

Patch attached. Someone should validate it.

http://galois.com/~dons/tmp/strict-state.dpatch

comment:15 follow-up: Changed 7 years ago by duncan

I don't think that patch is enough to fix the "thunk-bomb" problem (though I think it is also necessary). It also needs getStdRandom to seq the gen state when updating the IORef. Mind you, I've not actually tested it.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 7 years ago by dons

Replying to duncan:

I don't think that patch is enough to fix the "thunk-bomb" problem (though I think it is also necessary). It also needs getStdRandom to seq the gen state when updating the IORef. Mind you, I've not actually tested it.

http://galois.com/~dons/tmp/seq-strict-random.dpatch

Needs someone to test it now :-)

comment:17 in reply to: ↑ 16 Changed 6 years ago by Remi

Replying to dons:

http://galois.com/~dons/tmp/seq-strict-random.dpatch

Needs someone to test it now :-)

I haven't done any benchmarking, but it doesn't fix the thunk-bomb: All getStdRandom does is call atomicModifyIORef, and when its result is not forced, that'll just atomically store a thunk... Perhaps it's time for atomicModifyIORef'? (That is, strict but still both atomic and fast.) What does work for the thunk-bomb problem is simply

- getStdRandom f = atomicModifyIORef theStdGen (swap . f)
+ getStdRandom f = atomicModifyIORef theStdGen (swap . f) >>= (return $!)

That won't make it faster though.

By the way, is anyone else interested in getting the current Random redesigned?

comment:18 Changed 6 years ago by simonmar

  • Priority changed from low to normal
  • Type of failure set to Runtime performance bug

See also bos's statistics package which includes a fast random number generator.

comment:19 Changed 4 years ago by rrnewton

  • Owner set to rrnewton

comment:20 Changed 2 years ago by joell

This ticket seems to be almost describing two separate issues: the performance of the random package and the "thunk-bomb" problem in getStdRandom. While I admit I haven't looked into the performance aspect, I am now well acquainted with the "thunk-bomb" aspect.

Turns out a metrics library we were using at my company made a randomRIO call that in some cases (all of them in our configuration) wouldn't actually use the generated random value. This led to me spending a few days trying to figure out why our services were consuming and retaining gigabytes of memory. (For those interested in case studies, the discussion with the metrics library folks is here: https://github.com/brendanhay/network-metrics/issues/4 .)

So I would like to petition that the change Remi suggested in comment:17 to make getStdRandom strict (or something along those lines) be made. To my mind (heavily biased by the experience of the last few days), I wouldn't really consider the "thunk-bomb" aspect a performance problem so much as a stability one.

comment:21 Changed 2 years ago by simonpj

Would someone like to look at this, please? It sounds as if most of the thinking is done; it just needs someone to take an hour to read the thread, think it through, make the change, test it carefully, send a patch.

Thanks

Simon

comment:22 Changed 12 months ago by ekmett

  • Cc rrnewton added

The "thunk-bomb" problem should be fixed in random 1.1.

The redesign question is still, however, open.

comment:23 Changed 12 months ago by ekmett

  • Owner changed from rrnewton to ekmett

comment:24 Changed 11 months ago by thoughtpolice

  • Component changed from libraries/random to Core Libraries

Moving over to new owning component 'Core Libraries'.

comment:25 Changed 4 weeks ago by thomie

  • Resolution changed from None to duplicate
  • Status changed from new to closed

This bug is now being tracked in the random bug tracker on Github: Random.StdGen slowness.

Note: See TracTickets for help on using tickets.