GHC: Ticket #3575: mkStdGen and split conspire to make some programs predictable
http://ghc.haskell.org/trac/ghc/ticket/3575
<p>
The following program <tt>random.hs</tt> is intended to produce a line containing 30 random 0s or 1s. It is not an example of the best way to use System.Random, but it looks innocuous enough.
</p>
<pre class="wiki">import Control.Monad
import System.Random
print0or1 = do
g <- newStdGen
putStr . show . fst $ randomR (0, 1 :: Int) g
main = replicateM_ 30 print0or1 >> putStrLn ""
</pre><p>
Let's try running it a thousand times:
</p>
<pre class="wiki">rwbarton@functor:/tmp$ ghc-6.10.1 -O2 --make random
[1 of 1] Compiling Main ( random.hs, random.o )
Linking random ...
rwbarton@functor:/tmp$ for i in `seq 1 1000` ; do ./random >> random.out ; done
rwbarton@functor:/tmp$ sort random.out | uniq | wc -l
60
</pre><p>
That's odd... there are 2<sup>30</sup> possible output lines, but when I tried to generate 1000 random ones, I only got 60 distinct outputs. Why did that happen?
</p>
<p>
One might think this is due to poor initial seeding of the random number generator (due to the time not changing very much during the test), but this is not the case. Attached is a fancier version of the program which reads an initial seed from <tt>/dev/urandom</tt>; it exhibits the same behavior.
</p>
<p>
This phenomenon is not too hard to explain. It is ultimately due to a poor interaction between <tt>mkStdGen</tt> and <tt>split</tt>. First, we need to know a bit about the design of System.Random (some statements simplified slightly for this discussion).
</p>
<ul><li>The state of the RNG consists of two <tt>Int32</tt>s, <tt>s1</tt> and <tt>s2</tt>.
</li></ul><ul><li>The initial state produced by mkStdGen almost always has <tt>s2</tt> equal to 1. (Extremely rarely, it might have <tt>s2</tt> equal to 2. We'll ignore this as it doesn't affect the argument.)
</li></ul><ul><li>To generate a random 0 or 1, we first generate a new state using some simple functions <tt>s1' = next1(s1)</tt>, <tt>s2' = next2(s2)</tt>. (Note that <tt>s1</tt> and <tt>s2</tt> "evolve" independently.) The random value returned is the lowest bit of <tt>s1'</tt> minus <tt>s2'</tt>.
</li></ul><ul><li>Splitting the generator <tt>(s1, s2)</tt> yields the two generators <tt>(s1+1, next2(s2))</tt> and <tt>(next1(s1), s2-1)</tt>.
</li></ul><p>
Our program functions as follows.
</p>
<ul><li>Initialize the generator stored in <tt>theStdGen</tt> (<tt>s1</tt> is some varying value <tt>a</tt>, <tt>s2</tt> is 1).
</li></ul><ul><li>Repeatedly split the generator, replacing it with the first output, and use the second output to generate a 0 or 1.
</li></ul><p>
If we watch <tt>theStdGen</tt> while our program runs, we will see that <tt>s1</tt> is incremented by 1 at each step, while <tt>s2</tt> follows the fixed sequence <tt>1</tt>, <tt>next2(1)</tt>, <tt>next2(next2(1))</tt>, etc. The 0 or 1 we output at the <tt>k</tt>th step is thus the lowest bit of <tt>next1(next1(a+k-1))</tt> minus <tt>b</tt><sub>k</sub>, where <tt>b</tt><sub>k</sub> is some fixed sequence. And as <tt>k</tt> varies, <tt>next1(next1(a+k-1))</tt> turns out to be just an arithmetic sequence with fixed difference modulo a fixed prime so its lowest bits are extremely predictable even without knowing <tt>a</tt>.
</p>
<p>
This issue can be fixed to some extent, without breaking backwards compatibility, by adding another method (besides <tt>mkStdGen</tt>) to create a generator, which does not have predictable <tt>s2</tt>, and using it to initialize the system RNG.
</p>
en-usGHChttp://ghc.haskell.org/trac/ghc/chrome/site/ghc_logo.png
http://ghc.haskell.org/trac/ghc/ticket/3575
Trac 1.0.9rwbartonMon, 12 Oct 2009 04:52:16 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3575
http://ghc.haskell.org/trac/ghc/ticket/3575
<ul>
<li><strong>attachment</strong>
set to <em>random-seed.hs</em>
</li>
</ul>
<p>
Alternative example program which reads a seed from /dev/urandom
</p>
TicketrwbartonMon, 12 Oct 2009 04:55:54 GMT
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:1
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:1
<p>
Thanks to Michael Rice for suggesting that there may be a problem: <a class="ext-link" href="http://www.haskell.org/pipermail/haskell-cafe/2009-October/067655.html"><span class="icon"></span>http://www.haskell.org/pipermail/haskell-cafe/2009-October/067655.html</a>
</p>
TicketguestMon, 12 Oct 2009 05:05:13 GMT
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:2
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:2
<p>
Thread in one long page <a class="ext-link" href="http://www.nabble.com/Simple-program.-Simple-problem--td25848131.html"><span class="icon"></span>here</a>
</p>
TicketsimonpjWed, 14 Oct 2009 08:33:08 GMTdifficulty set
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:3
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:3
<ul>
<li><strong>difficulty</strong>
set to <em>Unknown</em>
</li>
</ul>
<p>
Another thread here
<a class="ext-link" href="http://article.gmane.org/gmane.comp.lang.haskell.cafe/64741/"><span class="icon"></span>http://article.gmane.org/gmane.comp.lang.haskell.cafe/64741/</a>
</p>
<p>
It concerns <tt>Control.Monad.Random</tt> rather than <tt>System.Random</tt>, but <tt>Control.Monad.Random</tt> in turn may depend on the same underlying infrastructure; anyway it looked related.
</p>
TicketsimonpjWed, 14 Oct 2009 08:35:45 GMT
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:4
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:4
<p>
Fantastic bug report from Reid Barton! Thanks.
</p>
<p>
His report, and perhaps the other thread above, identifies serious shortcomings in <tt>System.Random</tt>. Does anyone feel able to help fix them? I think it's unlikely that GHC HQ will; we're snowed under, and we have zero expertise in random number generators.
</p>
<p>
Thanks
</p>
<p>
Simon
</p>
TicketsimonpjWed, 28 Oct 2009 15:58:14 GMT
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:5
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:5
<p>
See also <a class="closed ticket" href="http://ghc.haskell.org/trac/ghc/ticket/3620" title="bug: The seeds generated by split are not independent (closed: duplicate)">#3620</a>
</p>
TicketiglooWed, 28 Oct 2009 21:28:39 GMTmilestone set
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:6
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:6
<ul>
<li><strong>milestone</strong>
set to <em>_|_</em>
</li>
</ul>
TicketrrnewtonMon, 27 Jun 2011 01:28:10 GMTowner, failure set
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:7
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:7
<ul>
<li><strong>owner</strong>
set to <em>rrnewton</em>
</li>
<li><strong>failure</strong>
set to <em>None/Unknown</em>
</li>
</ul>
TicketnfrisbyWed, 06 Nov 2013 19:59:35 GMT
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:8
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:8
<p>
Are there any plans to adopt the work out of Chalmers from the tf-random package on Hackage into System.Random?
</p>
TicketthoughtpoliceTue, 07 Oct 2014 05:50:46 GMTowner, component changed
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:9
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:9
<ul>
<li><strong>owner</strong>
changed from <em>rrnewton</em> to <em>ekmett</em>
</li>
<li><strong>component</strong>
changed from <em>libraries/random</em> to <em>Core Libraries</em>
</li>
</ul>
<p>
Moving over to new owning component 'Core Libraries'.
</p>
TicketthomieWed, 05 Aug 2015 16:02:15 GMTstatus changed; resolution set
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:10
http://ghc.haskell.org/trac/ghc/ticket/3575#comment:10
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>duplicate</em>
</li>
</ul>
<p>
This is now being tracked in the <tt>random</tt> bug tracker on Github: <a class="ext-link" href="https://github.com/haskell/random/issues/25"><span class="icon"></span>The seeds generated by split are not independent</a>.
</p>
Ticket