GHC: Ticket #5663: integer-gmp passes arguments to __gmpn_gcd_1() in contravention of the docs
http://ghc.haskell.org/trac/ghc/ticket/5663
<p>
I have encountered this issue on GHC HEAD, but it looks like it's been the case for a long time: The implementation of gcdInt#, which is used by gcd for Int#, some other integer types, and Integer when both parameters use the S# "small" constructor, is a wrapper around <tt>__gmpn_gcd_1()</tt>. The GMP documentation for <tt>mpn_gcd()</tt> states some input preconditions for its inputs, and the documentation for <tt>mpn_gcd_1()</tt> comes right after it and is very terse, implying that the same preconditions are assumed. We do in fact guarantee most of those preconditions, in particular that the first input is the larger one, but it specifies that the second input must be odd, and we don't check that. (I'm not quite clear on what we're supposed to do if it isn't, but presumably there's some simple algorithm ...)
</p>
<p>
This is not just a theoretical problem. GMP has multiple implementations of its algorithms, for multiple architectures, and no doubt each one has slightly different characteristics; I have tested on an x86-64 system, which uses the implementation of mpn_gcd_1() in the file libraries/integer-gmp/gmp/gmpbuild/mpn/x86_64/gcd_1.asm. I don't really want to make sense of the assembly language and figure out why it misbehaves, but we're violating the documentation and the entire mpn_* suite of functions is considered "internal and difficult to use correctly", so we are the ones who need to change, not them.
</p>
<p>
The problem can be seen as follows:
</p>
<pre class="wiki">GHCi, version 7.3.20111124: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> import GHC.Integer.GMP.Internals
Prelude GHC.Integer.GMP.Internals> import GHC.Types
Prelude GHC.Integer.GMP.Internals GHC.Types GHC.Int> import GHC.Integer.GMP.Prim
Prelude GHC.Integer.GMP.Internals GHC.Types GHC.Int GHC.Integer.GMP.Prim> I# (gcdInt# 0x2B992DDFA23249D6# 0xDE0B6B3A7640000#)
0
</pre><p>
This should reproduce on any x86-64 system, although I don't have access to another one to test that. In my setup, I'm using the in-tree GMP, which is GMP 5.0.2 as of this writing.
</p>
<p>
To elicit this misbehavior without using an internal function is a little harder. It comes up on my system when attempting to build the GHC haddocks, trying to compute a Rational approximation of pi (that process is where I discovered the above magic constants). But it will not reproduce easily for other people. I am still investigating why not, but I suspect that cross-module dictionary-function unfolding, which is implemented with built-in rewrite rules, one for each class, is broken and has not worked on any platform for at least a couple months. This in turn prevents other rewrite rules from firing; constant folding of arithmetic operations can't happen because those rules are written against primops, not against the methods of Num. On investigation, I have found that some compilation settings (profiling, dynamic linking) correctly include the definition, and not just the type, of dictionary functions in interface files, but vanilla compilation does not. I note that a comment in compiler/main/TidyPgm.lhs claims that DFun unfoldings are communicated to the code in compiler/iface/MkIface.lhs by placing them in the binding; but the bindings are part of the CgGuts, which are not passed to mkIface at all (only the ModGuts are). So there needs to be a new mechanism to pass this information, and I'm not sure what it should be, or why it does get included with some settings. I'll file a separate ticket on this when I have more facts, and not just speculation; I mention it here only because it interacts with the gcd bug, both masking it and evoking it in different situations.
</p>
<p>
The upshot: Anyone seeking to reproduce the behavior will find it does not arise if they invoke "gcd 3141592653589793238 1000000000000000000 :: Integer", because this gets folded at compile-time by a rule that DOES still work. Invoking "gcdInteger 3141592653589793238 1000000000000000000" directly will trigger the bug from the interpreter, but not when compiled, because there is a non-builtin rewrite rule on gcdInteger (not, you will observe, a dictionary function, and hence not affected by the other problem) which folds it. However, because of the aforementioned brokenness of constant folding on arithmetic, I expect that the behavior can be reproduced on all x86-64 platforms, both interpreted and compiled, by invoking "gcdInteger (3141592653589793238 * 1) (1 * 1000000000000000000)".
</p>
<p>
The code that needs changes for the gcd issue can be found in libraries/integer-gmp/cbits/gmp-wrappers.cmm.
</p>
en-usGHChttp://ghc.haskell.org/trac/ghc/chrome/site/ghc_logo.png
http://ghc.haskell.org/trac/ghc/ticket/5663
Trac 1.0.7iglooSun, 25 Dec 2011 13:11:40 GMTstatus, description changed; difficulty, resolution set
http://ghc.haskell.org/trac/ghc/ticket/5663#comment:1
http://ghc.haskell.org/trac/ghc/ticket/5663#comment:1
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>closed</em>
</li>
<li><strong>difficulty</strong>
set to <em>Unknown</em>
</li>
<li><strong>resolution</strong>
set to <em>worksforme</em>
</li>
<li><strong>description</strong>
modified (<a href="/trac/ghc/ticket/5663?action=diff&version=1">diff</a>)
</li>
</ul>
<p>
I can't reproduce this either on amd64/Linux using Debian's GMP 2:4.3.2+dfsg-1:
</p>
<pre class="wiki">GHCi, version 7.5.20111219: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :m + GHC.Integer.GMP.Internals GHC.Types GHC.Int GHC.Integer.GMP.Prim
Prelude GHC.Integer.GMP.Internals GHC.Types GHC.Int GHC.Integer.GMP.Prim> :set -XMagicHash
Prelude GHC.Integer.GMP.Internals GHC.Types GHC.Int GHC.Integer.GMP.Prim> I# (gcdInt# 0x2B992DDFA23249D6# 0xDE0B6B3A7640000#)
2
</pre><p>
nor using the in-tree GMP on amd64 OS X.
</p>
<p>
<a class="ext-link" href="http://gmplib.org/manual/Low_002dlevel-Functions.html"><span class="icon"></span>http://gmplib.org/manual/Low_002dlevel-Functions.html</a> says
</p>
<pre class="wiki">— Function: mp_size_t mpn_gcd (mp_limb_t *rp, mp_limb_t *xp, mp_size_t xn, mp_limb_t *yp, mp_size_t yn)
Set {rp, retval} to the greatest common divisor of {xp, xn} and {yp, yn}. The result can be up to yn limbs, the return value is the actual number produced. Both source operands are destroyed.
{xp, xn} must have at least as many bits as {yp, yn}. {yp, yn} must be odd. Both operands must have non-zero most significant limbs. No overlap is permitted between {xp, xn} and {yp, yn}.
— Function: mp_limb_t mpn_gcd_1 (const mp_limb_t *xp, mp_size_t xn, mp_limb_t ylimb)
Return the greatest common divisor of {xp, xn} and ylimb. Both operands must be non-zero.
</pre><p>
My reading of that is that the only pre-condition for <tt>mpn_gcd_1</tt> is that both operands must be non-zero.
</p>
<p>
Like you say, as far as I know no-one else has run into this, so I think the problem lies elsewhere.
</p>
Ticket