|Version 9 (modified by simonpj, 3 years ago) (diff)|
GHC Commentary: Libraries/Integer
GHC is set up to allow different implementations of the Integer type to be chosen at build time.
Selecting an Integer implementation
You can select which implementation of Integer is used by defining INTEGER_LIBRARY in mk/build.mk. This tells the build system to build the library in libraries/$(INTEGER_LIBRARY), and the cIntegerLibrary and cIntegerLibraryType values in Config.hs are defined accordingly.
The default value is integer-gmp, which uses the GNU Multiple Precision Arithmetic Library (GMP) to define the Integer type and its operations.
The other implementation currently available is integer-simple, which uses a simple (but slow, for larger Integers) pure Haskell implementation.
The Integer interface
All Integer implementations should export the same set of types and functions from GHC.Integer (within whatever integer package you are using). These exports are used by the base package However, all of these types and functions must actually be defined in GHC.Integer.Type, so that GHC knows where to find them.
How Integer is handled inside GHC
- Front end. Integers are represented using the HsInteger constructor of HsLit for the early phases of compilation (e.g. type checking)
- Core. In Core representation, an integer literal is represented by the LitInteger constructor of the Literal type.
data Literal = ... | LitInteger Integer TypeWhile Integers aren't "machine literals" like the other Core Literal constructors, it is more convenient when writing constant folding RULES to pretend that they are literals rather than having to understand their concrete representation. (Especially as the concrete representation varies from package to package.) We also carry around a Type, representing the Integer type, in the constructor, as we need access to it in a few functions (e.g. literalType).
- Constant folding. There are many constant-folding optimisations for Integer expressed as built-in rules in compiler/prelude/PrelRules.lhs; look at builtinIntegerRules. All of the types and functions in the Integer interface have built-in names, e.g. plusIntegerName, defined in compiler/prelude/PrelNames.lhs and included in basicKnownKeyNames. This allows us to match on all of the functions in builtinIntegerRules in compiler/prelude/PrelRules.lhs, so we can constant-fold Integer expressions.
- Converting between Int and Integer. It's quite commonly the case that, after some inlining, we get something like integerToInt (intToInteger i), which converts an Int to an Integer and back. This must optimise away. We do this by requiring that the integer package exposes
smallInteger :: Int# -> IntNow we can define intToInteger (or, more precisely, the toInteger method of the Integral Int instance in GHC.Real ) thus
toInteger (I# i) = smallInteger iAnd we have a RULE for integerToInt (smallInteger i).
- Representing integers. We stick to the LitInteger representation (which hides the concrete representation) as late as possible in the compiler. In particular, it's important that the LitInteger representation is used in unfoldings in interface files, so that constant folding can happen on expressions that get inlined.
We finally convert LitInteger to a proper core representation of Integer in compiler/coreSyn/CorePrep.lhs, which looks up the Id for mkInteger and uses it to build an expression like mkInteger True [123, 456] (where the Bool represents the sign, and the list of Ints are 31 bit chunks of the absolute value from lowest to highest).
However, there is a special case for Integers that are within the range of Int when the integer-gmp implementation is being used; in that case, we use the S# constructor (via integerGmpSDataCon in compiler/prelude/TysWiredIn.lhs) to break the abstraction and directly create the datastructure.
- Don't inline integer functions. Most of the functions in the Integer implementation in the integer package are marked NOINLINE. For example in integer-gmp we have
plusInteger :: Integer -> Integer -> Integer plusInteger (S# i1) (S# i2) = ... plusInteger (S# i1) (J# j1 j2) = ... ...two more cases...Not only is this a big function to inline, but inlining it typically does no good because the representation of literals is abstact, so no pattern-matching cancellation happens. And even if you have (a+b+c), the conditionals mean that no cancellation happens, or you get an exponential code explosion!