wiki:Commentary/Libraries/Integer

Version 8 (modified by simonpj, 22 months ago) (diff)

--

GHC Commentary: Libraries/Integer

GHC is set up to allow different implementations of the Integer type to be chosen at build time.

Selecting an Integer implementation

You can select which implementation of Integer is used by defining INTEGER_LIBRARY in mk/build.mk. This tells the build system to build the library in libraries/$(INTEGER_LIBRARY), and the cIntegerLibrary and cIntegerLibraryType values in Config.hs are defined accordingly.

The default value is integer-gmp, which uses the GNU Multiple Precision Arithmetic Library (GMP) to define the Integer type and its operations.

The other implementation currently available is integer-simple, which uses a simple (but slow, for larger Integers) pure Haskell implementation.

The Integer interface

All Integer implementations should export the same set of types and functions from GHC.Integer (within whatever integer package you are using). These exports are used by the base package However, all of these types and functions must actually be defined in GHC.Integer.Type, so that GHC knows where to find them.

How Integer is handled inside GHC

  • Front end. Integers are represented using the HsInteger constructor of HsLit for the early phases of compilation (e.g. type checking)
  • Core. In Core representation, an integer literal is represented by the LitInteger constructor of the Literal type.
    data Literal = ... | LitInteger Integer Type
    
    While Integers aren't "machine literals" like the other Core Literal constructors, it is more convenient when writing constant folding RULES to pretend that they are literals rather than having to understand their concrete representation. (Especially as the concrete representation varies from package to package.) We also carry around a Type, representing the Integer type, in the constructor, as we need access to it in a few functions (e.g. literalType).
  • Constant folding. There are many constant-folding optimisations for Integer expressed as built-in rules in compiler/prelude/PrelRules.lhs; look at builtinIntegerRules. All of the types and functions in the Integer interface have built-in names, e.g. plusIntegerName, defined in compiler/prelude/PrelNames.lhs and included in basicKnownKeyNames. This allows us to match on all of the functions in builtinIntegerRules in compiler/prelude/PrelRules.lhs, so we can constant-fold Integer expressions.
  • Representing integers. We stick to the LitInteger representation (which hides the concrete representation) as late as possible in the compiler. In particular, it's important that this representation is used in unfoldings in interface files, so that constant folding can happen on expressions that get inlined. We finally convert LitInteger to a proper core representation of Integer in compiler/coreSyn/CorePrep.lhs, which looks up the Id for mkInteger and uses it to build an expression like mkInteger True [123, 456] (where the Bool represents the sign, and the list of Ints are 31 bit chunks of the absolute value from lowest to highest).

However, there is a special case for Integers that are within the range of Int when the integer-gmp implementation is being used; in that case, we use the S# constructor (via integerGmpSDataCon in compiler/prelude/TysWiredIn.lhs) to break the abstraction and directly create the datastructure.

  • Don't inline integer functions. Most of the functions in the Integer implementation in the integer package are marked NOINLINE. For example in integer-gmp we have
    plusInteger :: Integer -> Integer -> Integer
    plusInteger (S# i1) (S# i2) = ...
    plusInteger (S# i1) (J# j1 j2) = ...
    ...two more cases...
    
    Not only is this a big function to inline, but inlining it typically does no good because the representation of literals is abstact, so no pattern-matching cancellation happens. And even if you have (a+b+c), the conditionals mean that no cancellation happens, or you get an exponential code explosion!