|Version 20 (modified by ptanski, 9 years ago) (diff)|
Replacing GMP: Bignum libraries, Licensing and Implementation
GHC currently implements the Integer and Fractional types by using the The GNU MP Bignum Library (GMP) which supports arbitrary precision mathematical calculations. GMP is fast, memory efficient, and offers many high level signed integer functions (140 of them), as well as many rational and floating point arithmetic functions. The current GHC implementation only uses those functions necessary for the Prelude.
GMP memory is integrated with the RunTime System's (RTS's) Garbage Collector (GC). GMP memory is allocated from the GC heap, so values produced by GMP are under the control of the RTS and its GC. The current implementation is memory efficient wile allowing the RTS and its GC to maintain control of GMP evaluations.
If you want to help with replacing GMP or do it yourself, you will have to work with the GC and RTS system. The parts you will have to modify are written in C and C--, with configuration and assembly done through the Makefiles. You should have an understanding of:
- how the GC works and how memory from GMP is integrated with it;
- some C-- (this is fairly basic if you know C well, the only real documentation on C-- itself is in the C-- manual (PDF), from cminusminus.org; the implementation of C-- for GHC is performed by several Haskell modules in the directory compiler/cmm of the HEAD branch); and,
- makefiles and configuration scripts.
Other basic recommended reading is:
- The GHC Commentary: The Native Code Generator; and,
- The GHC Commentary: Style Guidelines for RTS C code.
Beware! The main interest here is replacing GMP--GHC is still belongs to the University of Glasgow and those in charge still retain the purview to accept or reject a proposed solution.
Reasons for Replacing GMP as the Bignum library
There are several problems with the current GMP implementation:
GMP is licensed under the GNU Lesser General Public License (LGPL), a kind of "copyleft" license. According to the terms of the LGPL, paragraph 5, you may distribute a program that is designed to be compiled and dynamically linked with the library under the terms of your choice (i.e., commercially) but if your program incorporates portions of the library, if it is linked statically, then your program is a "derivative"--a "work based on the library"--and according to paragraph 2, section c, you "must cause the whole of the work to be licensed" under the terms of the LGPL (including for free).
The LGPL licensing for GMP is a problem for the overall licensing of binary programs compiled with GHC because most distributions (and builds) of GHC use static libraries. (Dynamic libraries are currently distributed only for OS X.) The LGPL licensing situation may be worse: even though The Glasgow Haskell Compiler License is essentially a "free software" license (BSD3), according to paragraph 2 of the LGPL, GHC must be distributed under the terms of the LGPL!
- Memory Structure; Simultaneous Access to GMP by Foreign (C) code in the Same Binary
In the current GMP implementation, GMP is configured to use GHC's GC memory and GMP can only have one allocator for memory. Since any single binary containing Haskell code compiled with GHC contains the RTS and GMP, C code in the same binary cannot use GMP. This problem was noted in bug Ticket #311. The Simon Peyton-Jones suggested that a simple renaming of GHC-GMP functions would solve this problem and Bulat Ziganshin suggested simply using an automated tool to do this. See Replacement for GMP. Different function names would make GMP into a separate, custom GHC library leaving the C part of the program free to use GMP.
GHC does not have a custom-modified version of GMP (in fact, GHC uses the system build of GMP if that is available). The memory configuration of GMP uses GMP's Custom Allocation routines. Alternative libraries may not have this facility built in.
- Other Improvements to Integer
Most of the suggestions in this section come from discussions in the glasgow-haskell-users list thread returning to Cost of Integer. In particular, John Meacham's suggestion to use a ForeignPtr to data held by the normal GMP system library and store the value in an unboxed Int if the number of significant digits in Integer could fit into the size of an Int. For those who are curious, a guide to GHC primitives is available (in an unformatted version) in ghc/compiler/prelude/primops.txt.pp; here is a link to CVS version of primops.txt.pp. (See The GHC Commentary Primitives for a description of primops.txt.pp; and a highly recommended introduction directly related to GMP is AddingNewPrimitiveOperations.) In primops.txt.pp, you might want to search for the text "section "The word size story."", and especially the text "section "Integer#"". The Haskell definition of Integer is in /packages/base/GHC/Num.lhs.
The current GMP implementation of Integer is:data Integer = S# Int# -- small integers #ifndef ILX | J# Int# ByteArray# -- large integers #else | J# Void BigInteger -- .NET big ints
where the Int# counts the number of limbs (a GMP term referring to parts of a multi-precision number that fit into a 32 or 64 bit word, depending on the machine) and the ByteArr# is the actual array in RTS-GC memory holding the limbs. The sign of the Int# is used to indicate the sign of the number represented by the ByteArr#.
This current implementation of Integer means that there are two separate constructors for small and large Integers (S# Int# and J# Int# ByteArr#). The suggestion discussed by John Meacham, Lennart Augustsson, Simon Marlow and Bulat Ziganshin was to change the representation of Integer so the Int# does the work of S# and J#: the Int# could be either a pointer to the Bignum library array of limbs or, if the number of significant digits could fit into say, 31 bits, to use the extra bit as an indicator of that fact and hold the entire value in the Int#, thereby saving the memory from S# and J#.
Bulat Ziganshin and John Meacham noted a few problems with a 30bit Int:
- interoperability between Haskell and other languages, especially C, would be more difficult so you would have to define a new primitive, say #Int30 for the representation; and,
- representing a Haskell constructor (the Int#) inside a pointer--a bit-size constructor--would limit the number of constructors you would be able to have (depending on the size of a pointer object, say the C99 uintptr_t, on a particular machine).
Overview of the Current GMP Implementation
Esa Ilari Vuokko, who at one time attempted to replace GMP with LibTomMath, posted several messages with good notes on the current implementation. Much of what is on this page is derived from those notes. See, Replacement for GMP(3) and Replacement for GMP(4).