Opened 10 months ago

Closed 5 months ago

Last modified 4 weeks ago

#9281 closed task (fixed)

Rewrite `integer-gmp` to use only non-allocating GMP functions

Reported by: hvr Owned by: hvr
Priority: normal Milestone: 7.10.1
Component: Core Libraries Version:
Keywords: integer-gmp Cc: simonmar, core-libraries-committee@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #8647 #9818 Differential Revisions: Phab:D82

Description (last modified by hvr)

After some promising results with a proof-of-concept implementation, I'm optimistic we can rewrite integer-gmp to use only non-allocating GMP lib functions without suffering from serious regressions.

If successful, this would

  • allow to avoid the custom GMP allocator hack, and thus
  • avoid issues when linking against other C libraries using GMP,
  • simplify code, as we would perform all heap allocations in Haskell code (and never inside Cmm/C code as its done now),
  • and finally maybe even remove a few more superfluous temporary heap allocations.

see also wiki:Design/IntegerGmp2

Change History (32)

comment:1 Changed 10 months ago by hvr

As a brain-dump, here's the new representation I'm working on (which happens to make it trivial to provide a natural number type Nat):

type GmpLimb = Word -- actually, 'CULong'
type GmpSize = Int  -- actually, 'CLong'

-- | Type representing /pure/ BigNats
--
-- @ByteArray#@ is interpreted as an unboxed array of 'GmpLimb's
--
-- Invariants:
--
--  - ByteArray#'s size is the exact multiple of non-zero limbs (but never empty)
--  - @0@ is represented as a 1-limb (available as 'bnZero')
data BigNat = BN# ByteArray#

-- | Invariant: 'NatJ#' is used iff value doesn't fit in 'NatS#'
data Nat = NatS# {-# UNPACK #-} !GmpLimb
         | NatJ# {-# UNPACK #-} !BigNat

-- | Invariant: 'Jn#'/'Jp#' are used iff value doesn't fit in 'S#'
--
-- NB: less than 4 constructors allows pointer-tagging to work
data Integer = S#  {-# UNPACK #-} !Int
             | Jn# {-# UNPACK #-} !BigNat -- negative
             | Jp# {-# UNPACK #-} !BigNat -- positive

comment:2 Changed 10 months ago by ekmett

Eliminating the need for the GMP custom allocator would be a _huge_ win for interoperability, and would obviate the need for code I've spent months writing.

If you need anything from me just ask.

comment:3 Changed 10 months ago by simonpj

Sounds fantastic to me. Do please document the design so that others do not later accidentally slip back into calling allocating functions!

Simon

comment:4 Changed 9 months ago by hvr

  • Differential Revisions set to Phab:D82
  • Status changed from new to patch

Here's a first draft version. The number of library calls going into GMP (via ltrace -c, which btw is a wonderful tool for that) is essentially the same (in some cases slightly lower, as the new code is a bit more clever in some ways). However, I'm seeing improvements as well as regressions in the Nofib results (see Phab:P6). I still need to figure out what the reason is for the rather visible allocation increase in primetest (and maybe kahan and bernouilli).

It could be this different approach undoes part of #8647, as we can't so easily optimistically allocate temporary single-limbs on the stack (on the bright side, we've got almost no Cmm code anymore...)

Btw, while implementing the code, I would have wanted to write

foreign import ccall unsafe "integer_gmp_mpn_tdiv_q"
  c_mpn_tdiv_q :: MutableByteArray# s -> ByteArray# -> GmpSize# -> ByteArray# -> GmpSize# -> State# s -> State# s

foreign import ccall unsafe "gmp.h __gmpn_divrem_1"
  c_mpn_divrem_1 :: MutableByteArray# s -> GmpSize# -> ByteArray# -> GmpSize# -> GmpLimb# -> State# s -> (# s, Word# #)

However, GHC insisted on having me to write ... -> IO () and ... -> IO Word respectively, thus forcing a boxed/lifted Word even though {-# LANGUAGE UnliftedFFITypes #-} was in effect. Any chance to have unlifted types allowed for IO-like FFI ccalls to reduce the overhead?

comment:5 follow-up: Changed 9 months ago by simonpj

Terrific. It would be good to articulate the goal more explicitly. Is it primarily to improve interop in some way? How does it improve interop? Or is it to do with performance -- unsafe foreign calls are vastly more efficient than safe ones?

Why do integer_gmp_mpn_tdiv_q etc need to be IO-ish at all? Can't they be pure functions?

Simon

comment:6 in reply to: ↑ 5 Changed 9 months ago by hvr

  • Cc simonmar added

Replying to simonpj:

Terrific. It would be good to articulate the goal more explicitly. Is it primarily to improve interop in some way? How does it improve interop? Or is it to do with performance -- unsafe foreign calls are vastly more efficient than safe ones?

Are you referring to allowing unlifted types as return values in non-pure calls?

Why do integer_gmp_mpn_tdiv_q etc need to be IO-ish at all? Can't they be pure functions?

because those operations write into MutableByteArray#

comment:7 Changed 9 months ago by simonpj

What happens if you try

foreign import ccall unsafe "integer_gmp_mpn_tdiv_q"
  c_mpn_tdiv_q :: MutableByteArray# s -> ByteArray# -> GmpSize# -> ByteArray# -> GmpSize# -> State# s -> State# s

with -XUnliftedFFITypes? Answer

    Unacceptable argument type in foreign declaration: State# s

But why is State# s unacceptable? I can think of no good reason.

So I think that it might be as simple as

  • Adding State# to the list of acceptable FFI types (with -XUnliftedFFITypes.
  • Making sure that, since it's a zero-width value, we don't actually pass it to the C function.

The latter step is in the code generator itself; these zero-width values do (and must) survive into Cmm.

If you'd like to tackle this I'm sure Simon and I would help with any bits you don't grok.

I think this would be MUCH simpler than trying to allow types like IO Int#, which has global implications that I don't know how to deal with.

Simon

comment:8 Changed 9 months ago by hvr

fwiw, if you use prim-FFI imports instead of ccall , ... -> State# s -> State# s as well as ... -> State# s -> (# State# s, Word# #) already works. Is this maybe just a matter of replicating what's already done for prim-style FFI/Cmm calls?

Last edited 9 months ago by hvr (previous) (diff)

comment:9 Changed 9 months ago by simonmar

foreign import prim has a different calling convention from foreign import ccall (obviously) and they look different at the Cmm level. The code generation for these two is quite different - the former is a StgPrimCallOp while the latter is a StgFCallOp.

But I think all the machinery to do this already exists, as Simon said. When we foreign-import something with an IO type there's a wrapper around the primitive foreign call to add the IO, but the primitive call itself will have a type of the form State# s -> (# State# s, a #), so we can already codegen these properly. All we need to do is make sure that there is no wrapper, which might already happen automatically if the typechecker is changed to allow State# arguments and State# or (# State# s, a #) results with UnliftedFFITypes.

comment:10 Changed 9 months ago by hvr

Fyi, I've created a separate ticket #9352 to further discuss the State# s/FFI story

comment:11 Changed 8 months ago by Herbert Valerio Riedel <hvr@…>

In 96d04186e00fe2202b9953688492bb370d818bf1/ghc:

Remove obsolete `digitsTyConKey :: Unique`

This became dead with 1e87c0a6e485e1dbef8e9ed19191e54f6cdc54e0
and was probably just missed.

I plan to re-use the freed up `mkPreludeTyConUnique 23` slot soon
for a new `bigNatTyConKey` (as part of the #9281 effort)

comment:12 Changed 7 months ago by Herbert Valerio Riedel <hvr@…>

In b62bd5ecf3be421778e4835010b6b334e95c5a56/ghc:

Implement `decodeDouble_Int64#` primop

The existing `decodeDouble_2Int#` primop is rather inconvenient to use
(and in fact is not even used by `integer-gmp`) as the mantissa is split
into 3 components which would actually fit in an `Int64#` value.

However, `decodeDouble_Int64#` is to be used by the new `integer-gmp2`
re-implementation (see #9281).

Moreover, `decodeDouble_2Int#` performs direct bit-wise operations on the
IEEE representation which can be replaced by a combination of the
portable standard C99 `scalbn(3)` and `frexp(3)` functions.

Differential Revision: https://phabricator.haskell.org/D160

comment:13 Changed 7 months ago by thoughtpolice

  • Component changed from libraries (other) to Core Libraries

Moving over to new owning component 'Core Libraries'.

comment:14 Changed 6 months ago by hvr

  • Cc core-libraries-committee@… added
  • Description modified (diff)

started a wiki-page over at wiki:Design/IntegerGmp2

comment:15 Changed 6 months ago by Herbert Valerio Riedel <hvr@…>

In 0e1f0f7d1682d77c5dbb1d2b36f57037113cf7b4/ghc:

Un-wire `Integer` type (re #9714)

Integer is currently a wired-in type for integer-gmp. This requires
replicating its inner structure in `TysWiredIn`, which makes it much
harder to change Integer to a more complex representation (as
e.g. needed for implementing #9281)

This commit stops `Integer` being a wired-in type, and makes it
known-key type instead, thereby simplifying code notably.

Reviewed By: austin

Differential Revision: https://phabricator.haskell.org/D351

comment:16 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In c774b28f76ee4c220f7c1c9fd81585e0e3af0e8a/ghc:

Implement new integer-gmp2 from scratch (re #9281)

This is done as a separate `integer-gmp2` backend library because it
turned out to become a complete rewrite from scratch.

Due to the different (over)allocation scheme and potentially different
accounting (via the new `{shrink,resize}MutableByteArray#` primitives),
some of the nofib benchmarks actually results in increased allocation
numbers (but not necessarily an increase in runtime!).  I believe the
allocation numbers could improve if `{resize,shrink}MutableByteArray#`
could be optimised to reallocate in-place more efficiently.

Here are the more apparent changes in the latest nofib comparision
between `integer-gmp` and `integer-gmp2`:

  ------------------------------------------------------------------
          Program     Size    Allocs   Runtime   Elapsed  TotalMem
  ------------------------------------------------------------------
              ...
       bernouilli    +1.6%    +15.3%     0.132     0.132      0.0%
              ...
     cryptarithm1    -2.2%      0.0%     -9.7%     -9.7%      0.0%
              ...
            fasta    -0.7%     -0.0%    +10.9%    +10.9%      0.0%
              ...
            kahan    +0.6%    +38.9%     0.169     0.169      0.0%
              ...
             lcss    -0.7%     -0.0%     -6.4%     -6.4%      0.0%
              ...
           mandel    +1.6%    +33.6%     0.049     0.049      0.0%
              ...
         pidigits    +0.8%     +8.5%     +3.9%     +3.9%      0.0%
            power    +1.4%    -23.8%    -18.6%    -18.6%    -16.7%
              ...
        primetest    +1.3%    +50.1%     0.085     0.085      0.0%
              ...
              rsa    +1.6%    +53.4%     0.026     0.026      0.0%
              ...
              scs    +1.2%     +6.6%     +6.5%     +6.6%    +14.3%
              ...
           symalg    +1.0%     +9.5%     0.010     0.010      0.0%
              ...
        transform    -0.6%     -0.0%     -5.9%     -5.9%      0.0%
              ...
  ------------------------------------------------------------------
              Min    -2.3%    -23.8%    -18.6%    -18.6%    -16.7%
              Max    +1.6%    +53.4%    +10.9%    +10.9%    +14.3%
   Geometric Mean    -0.3%     +1.9%     -0.8%     -0.8%     +0.0%

(see P35 / https://phabricator.haskell.org/P35 for full report)

By default, `INTEGER_LIBRARY=integer-gmp2` is active now, which results
in the package `integer-gmp-1.0.0.0` being registered in the package db.
The previous `integer-gmp-0.5.1.0` can be restored by setting
`INTEGER_LIBRARY=integer-gmp` (but will probably be removed altogether
for GHC 7.12). In-tree GMP support has been stolen from the old
`integer-gmp` (while unpatching the custom memory-allocators, as well as
forcing `-fPIC`)

A minor hack to `ghc-cabal` was necessary in order to support two different
`integer-gmp` packages (in different folders) with the same package key.

There will be a couple of follow-up commits re-implementing some features
that were dropped to keep D82 minimal, as well as further
clean-ups/improvements.

More information can be found via #9281 and
https://ghc.haskell.org/trac/ghc/wiki/Design/IntegerGmp2

Reviewed By: austin, rwbarton, simonmar

Differential Revision: https://phabricator.haskell.org/D82

comment:17 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In 63a9d9387f27e5842bed5946c5f1381f209d2918/ghc:

Fix `integer-gmp2` compilation with GMP 4.x (#9281)

GMP 4.x didn't provide the `mp_bitcnt_t` typedef yet, so we locally
define one if GMP 4.x is detected.

comment:18 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In 42244668af6d8c1dd6a2d64af90ed57d8ecd8d88/ghc:

Reimplement im/export primitives for integer-gmp2

The import/export operations were available in `integer-gmp-0.5.1`
already, but need to be reimplemented from scratch for the
`integer-gmp-1.0.0` rewrite.

This also adds a few more operations than were previously available for
use w/ the `BigNat` type (which will be useful for implementing
serialisation for the upcoming `Natural` type)

Specifically, the following operations are (re)added (albeit with
slightly different type-signatures):

 - `sizeInBaseBigNat`
 - `sizeInBaseInteger`
 - `sizeInBaseWord#`

 - `exportBigNatToAddr`
 - `exportIntegerToAddr`
 - `exportWordToAddr`
 - `exportBigNatToMutableByteArray`
 - `exportIntegerToMutableByteArray`
 - `exportWordToMutableByteArray`

 - `importBigNatFromAddr`
 - `importIntegerFromAddr`
 - `importBigNatFromByteArray`
 - `importIntegerFromByteArray`

NOTE: The `integerGmpInternals` test-case is updated but not yet
      re-enabled as it contains tests for other primitives which aren't
      yet reimplemented.

This addresses #9281

Reviewed By: austin, duncan

Differential Revision: https://phabricator.haskell.org/D480

comment:19 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In be7fb7e58c70cd9b0a933fb26cd5f2607d6dc4b2/ghc:

integer-gmp2: export `Word`-counterpart of gcdInt

It's trivial for `integer-gmp2` (#9281) to provide it, and it'll be
useful for a future 'Natural'-related commit, as well as providing a
`Word` optimised `gcd`-RULE.

comment:20 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In 2b71b355fb1eb559243444f2dc4584e591fddee1/ghc:

Remove reference to `MIN_VERSION_integer_gmp2`

This is slipped in by accident as part of
c774b28f76ee4c220f7c1c9fd81585e0e3af0e8a (re #9281)

comment:21 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In a9a0dd34dcdfb7309f57bda88435acca14ec54d5/ghc:

Install `ghc-gmp.h` C include header file (#9281)

This is mostly interesting when using the in-tree GMP, as there's
no way otherwise to access the in-tree `gmp.h` header file after installation.

In case `integer-gmp2` was build against a system-installed GMP library,
`ghc-gmp.h` simply contains `#include <gmp.h>` for convenience.

Reviewed By: ekmett

Differential Revision: https://phabricator.haskell.org/D522

comment:22 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In 6d1c8ec79adf566d57d2c35aac8ff6635412d108/ghc:

Persist build-time GMP ver to `HsIntegerGmp.h`

This creates the additional macro definitions in `HsIntegerGmp.h` which
are useful for 3rd party `integer-gmp`-addon libraries.

Here's an example for the definitions created for the in-tree GMP:

  #define GHC_GMP_INTREE     1
  #define GHC_GMP_VERSION_MJ 5
  #define GHC_GMP_VERSION_MI 0
  #define GHC_GMP_VERSION_PL 4
  #define GHC_GMP_VERSION   (5 * 10000 + 0 * 100 + 4)

And here's an example for a system-installed GMP:

  #define GHC_GMP_INTREE     0
  #define GHC_GMP_VERSION_MJ 6
  #define GHC_GMP_VERSION_MI 0
  #define GHC_GMP_VERSION_PL 0
  #define GHC_GMP_VERSION   (6 * 10000 + 0 * 100 + 0)

Part of #9281

Reviewed By: ekmett (via D522)

comment:23 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In ed56c023e3e2e3d2a9fe18a17e2131d9a55c69a5/ghc:

Use {bit,popCount}Integer for `Bits Integer`

The primops are implemented in the `integer-gmp2` (#9281) backend and
are already used for the `Bits Natural` instance but aren't used yet for
the `Bits Integer` instace.  This commit fixes that.

comment:24 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In 58dcd5c2e2a94643454296ea0bb109db96bd154f/ghc:

Re-implement `testPrimeInteger` predicate (#9281)

This also adds `testPrimeWord#` and `testPrimeBigNat` predicates.

`testPrimeInteger` has been available since `integer-gmp-0.5.1`
(added via f49735486533842cc84df70cafc8d565dffd75db).
The `nextPrimeInteger` function is still missing though.

comment:25 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In 8d7831108644584f710515b39b9fc97fbeca7a4c/ghc:

Re-implement `nextPrimeInteger` predicate (#9281)

This also adds `nextPrimeWord#` and `nextPrimeBigNat` predicates.

`nextPrimeInteger` has been available since `integer-gmp-0.5.1`
(added via f49735486533842cc84df70cafc8d565dffd75db).

comment:26 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In 2eecf348a62c47abd2f5de5f7eac5f7a3a779107/ghc:

Re-activate `integerGmpInternals` test (#9281)

The `integerGmpInternals` test was disabled in
c774b28f76ee4c220f7c1c9fd81585e0e3af0e8a as many of the primitives
tested in that test weren't available yet w/ `integer-gmp2`.

However, most operations have been reimplemented by now, with the
exception of

    recipModInteger  :: Integer -> Integer -> Integer
    gcdExtInteger    :: Integer -> Integer -> (Integer, Integer)
    powModSecInteger :: Integer -> Integer -> Integer -> Integer
    powModInteger    :: Integer -> Integer -> Integer -> Integer
    powInteger       :: Integer -> Word -> Integer

which are still missing, and will (time permitting) be reimplemented
over time.

comment:27 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In d0d4674281a80e4148a82f833948c2b4c3051eab/ghc:

Re-implement `powModInteger`  (#9281)

This also exposes the following type-specialised modular exponentiation
variants of `powModInteger` useful for implementing a `powModNatural`
operation.

  powModBigNat     :: BigNat -> BigNat -> BigNat -> BigNat
  powModBigNatWord :: BigNat -> BigNat -> Word#  -> Word#
  powModWord       :: Word#  -> Word#  -> Word#  -> Word#

`powModInteger` has been available since `integer-gmp-0.5.1`
(added via 4d516855241b70eb687d95e3c121428de885e83e)

comment:28 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In 83c48438c800986c537d3cae682d53ee8ed326ed/ghc:

Re-implement `recipModInteger` (#9281)

This also exposes the following two type-specialised modular
exponentiation variants of `recipModInteger` useful for implementing a
`recipModNatural` operation.

  recipModBigNat :: BigNat -> BigNat -> BigNat
  recipModWord   :: Word#  -> Word#  -> Word#

`recipModInteger` has been available since `integer-gmp-0.5.1`
(added via 4d516855241b70eb687d95e3c121428de885e83e)

comment:29 Changed 5 months ago by Herbert Valerio Riedel <hvr@…>

In c0e0ca4d9d5ed47a5e9c88eeab9b538bc76a4eb5/ghc:

Reimplement `gcdExtInteger` (#9281)

`gcdExtInteger` has been available since `integer-gmp-0.5.1`
(added via 71e29584603cff38e7b83d3eb28b248362569d61)

comment:30 Changed 5 months ago by hvr

  • Resolution set to fixed
  • Status changed from patch to closed

Let's consider this accomplished for now.

comment:31 Changed 5 months ago by hvr

comment:32 Changed 4 weeks ago by Herbert Valerio Riedel <hvr@…>

In 995e8c1c8692b60c907c7d2ccea179d52ca8e69e/ghc:

Drop old integer-gmp-0.5 from GHC source tree

This completes what c774b28f76ee4c220f7c1c9fd81585e0e3af0e8a (#9281)
started.  `integer-gmp-1.0` was added as an additional
`libraries/integer-gmp2` folder while retaining the ability to configure
GHC w/ the old `integer-gmp-0.5` to have a way back, and or the ability
to easily switch between old/new `integer-gmp` for benchmark/debugging
purposes.

This commit removes the old `libraries/integer-gmp` folder and moves
`libraries/integer-gmp2` into its place, while removing any mentions of
"gmp2" as well as the to support two different `integer-gmp` packages in
GHC's source-tree.

Reviewed By: austin

Differential Revision: https://phabricator.haskell.org/D769
Note: See TracTickets for help on using tickets.