Changes between Version 27 and Version 28 of ReplacingGMPNotes


Timestamp:
Sep 20, 2006 6:55:08 PM (9 years ago)
Author:
guest
Comment:

--

  • ReplacingGMPNotes

    v27 v28  
    7575 * [http://darcs.haskell.org/ghc/includes/Cmm.h includes/Cmm.h] (''Modify'': cpp test for {{{#if SIZEOF_mp_limb_t != SIZEOF_VOID_P }}})
    7676 * [http://darcs.haskell.org/ghc/includes/MachRegs.h includes/MachRegs.h] (''Reference'': general; unrelated to GMP: may be starting point for vectorized Cmm (currently only -fvia-c allows auto-vectorization))
    77  * [http://darcs.haskell.org/ghc/includes/Regs.h includes/Regs.h] (''Modify'': references to MP_INT; Reference: Stg registers, etc.)
     77 * [http://darcs.haskell.org/ghc/includes/mkDerivedConstants.c includes/mkDerivedConstants.c] (''Modify'': references to GMP {{{__mpz_struct}}}: {{{struct_size(MP_INT)}}}, {{{struct_field(MP_INT,_mp_alloc)}}}, {{{struct_field(MP_INT,_mp_size)}}}, {{{struct_field(MP_INT,_mp_d)}}} and {{{ctype(mp_limb_t)}}}.  Note: mp_limb_t generally == unsigned long)
     78 * [http://darcs.haskell.org/ghc/includes/Regs.h includes/Regs.h] (''Modify'': references to MP_INT, {{{#include "gmp.h"}}}; Reference: Stg registers, etc.)
    7879 * [http://darcs.haskell.org/ghc/includes/Rts.h includes/Rts.h] (''Modify'': reference to {{{#include "gmp.h"}}}, {{{extern}}} declarations to {{{__decodeDouble}}} and {{{__decodeFloat}}}; References to various Stg types and macros)
    7980 * [http://darcs.haskell.org/ghc/includes/StgMiscClosures.h includes/StgMiscClosures.h] (''Modify'': references to {{{RTS_FUN(...Integer)}}} !PrimOps; ''Reference'': Weak Pointers, other Stg closures)
     
    8182 * [http://darcs.haskell.org/ghc/rts/Linker.c rts/Linker.c] (''Modify'': {{{SymX(__gmpn...)}}} and related GMP functions)
    8283 * [http://darcs.haskell.org/ghc/rts/Makefile rts/Makefile] (''Modify'': building GMP library)
    83  * [http://darcs.haskell.org/ghc/rts/PrimOps.cmm PrimOps.cmm] (''Modify'': remove GMP references; NOTE: optimisation of {{{/* ToDo: this is shockingly inefficient */}}}, see discussion below)
    84  * [http://darcs.haskell.org/ghc/rts/StgPrimFloat.c StgPrimFloat.c] (''Modify'': {{{__encodeDouble}}}, {{{__encodeFloat}}} and {{{decode}}} versions defined here refer to GMP; might optimise with bitwise conversion instead of union; conversion depends on whether replacement MP library uses floating point, etc.)
     84 * [http://darcs.haskell.org/ghc/rts/PrimOps.cmm rts/PrimOps.cmm] (''Modify'': remove GMP references; NOTE: optimisation of {{{/* ToDo: this is shockingly inefficient */}}}, see discussion below)
     85 * [http://darcs.haskell.org/ghc/rts/StgPrimFloat.c rts/StgPrimFloat.c] (''Modify'': {{{__encodeDouble}}}, {{{__encodeFloat}}} and {{{decode}}} versions defined here refer to GMP; might optimise with bitwise conversion instead of union; conversion depends on whether replacement MP library uses floating point, etc.)
    8586 * [http://darcs.haskell.org/ghc/rts/Storage.c rts/Storage.c] (''Modify'': {{{stgAllocForGMP}}}, {{{stgReallocForGMP}}} and {{{stgDeallocForGMP}}}; may use as reference for implementation if replacement MP library uses GHC-garbage collected memory)
    8687 * [http://darcs.haskell.org/ghc/rts/gmp/ rts/gmp (directory)] (''Modify'': recommended to remove entirely, i.e., do not add conditional compilation for users who want to keep on using GMP)
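For the rts/StgPrimFloat.c entry in the list above, the "bitwise conversion instead of union" idea might look roughly like the sketch below. It is not the code in StgPrimFloat.c: the function name {{{decode_double_bits}}} is hypothetical, it assumes an IEEE 754 binary64 {{{double}}}, it returns the mantissa in an {{{int64_t}}} rather than in the replacement library's integer type, and it does not handle infinities or NaNs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a finite IEEE 754 binary64 value so that
 * x == mant * 2^exp2.  The bits are copied out with memcpy instead of
 * being read through a union. */
void decode_double_bits(double x, int64_t *mant, int *exp2)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);              /* bitwise copy of the double */

    uint64_t frac = bits & ((((uint64_t)1) << 52) - 1); /* low 52 bits */
    int e   = (int)((bits >> 52) & 0x7FF);       /* biased exponent, 11 bits */
    int neg = (int)(bits >> 63);                 /* sign bit */

    if (e == 0) {
        e = 1;                                   /* zero or subnormal: no hidden bit */
    } else {
        frac |= ((uint64_t)1) << 52;             /* normal: restore the hidden bit */
    }

    *mant = neg ? -(int64_t)frac : (int64_t)frac;
    *exp2 = (frac == 0) ? 0 : e - 1075;          /* 1075 = bias 1023 + 52 fraction bits */
}
```

Whether this is actually faster than the union version would depend on the compiler and on how the replacement MP library represents its integers.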
     
    8889==== Optimisation Opportunities ====
    8990
     91 (1) Initialisation of GMP on every call. 
     92GMP is currently initialised every time it is called from the RTS.  For example, consider the macro in [http://darcs.haskell.org/ghc/rts/PrimOps.cmm rts/PrimOps.cmm]:
     93{{{
     94#define GMP_TAKE1_RET1(name,mp_fun)                                     \
     95name                                                                    \
     96{                                                                       \
     97  CInt s1;                                                              \
     98  W_ d1;                                                                \
     99  FETCH_MP_TEMP(mp_tmp1);                                               \
     100  FETCH_MP_TEMP(mp_result1)                                             \
     101                                                                        \
     102  /* call doYouWantToGC() */                                            \
     103  MAYBE_GC(R2_PTR, name);                                               \
     104                                                                        \
     105  d1 = R2;                                                              \
     106  s1 = W_TO_INT(R1);                                                    \
     107                                                                        \
     108  MP_INT__mp_alloc(mp_tmp1)     = W_TO_INT(StgArrWords_words(d1));      \
     109  MP_INT__mp_size(mp_tmp1)      = (s1);                                 \
     110  MP_INT__mp_d(mp_tmp1)         = BYTE_ARR_CTS(d1);                     \
     111                                                                        \
     112  foreign "C" __gmpz_init(mp_result1 "ptr") []; /* INITIALISATION HERE */ \
     113                                                                        \
     114  /* Perform the operation */                                           \
     115  foreign "C" mp_fun(mp_result1 "ptr",mp_tmp1 "ptr") [];                \
     116                                                                        \
     117  RET_NP(TO_W_(MP_INT__mp_size(mp_result1)),                            \
     118         MP_INT__mp_d(mp_result1) - SIZEOF_StgArrWords);                \
     119}
     120}}}
     121Possible solutions:
     122   (a) wrap the MP library functions (in C, not C--) and test for initialisation there; the wrapper should be thread-safe or reentrant
    90123
     124   (b) maintain initialisation in GHC's RTS
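A minimal sketch of solution (a), assuming the replacement MP library needs some one-time setup. Every name here ({{{mp_int_t}}}, {{{mp_library_setup}}}, {{{mp_neg_raw}}}, {{{mp_neg}}}) is hypothetical and stands in for whatever the replacement library provides. The initialisation test uses C11 atomics so that the wrapper stays safe when called from several threads; {{{pthread_once}}} would be a natural alternative.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the replacement library's integer struct. */
typedef struct {
    int alloc;           /* limbs allocated */
    int size;            /* limbs in use; negative for negative values */
    unsigned long *d;    /* limb data */
} mp_int_t;

/* 0 = not initialised, 1 = initialisation in progress, 2 = ready */
static atomic_int mp_state = 0;
static int mp_init_count = 0;   /* counts how often the setup really ran */

/* Hypothetical one-time library setup, e.g. installing allocators. */
static void mp_library_setup(void) { mp_init_count++; }

/* Run the setup exactly once, even with concurrent callers. */
static void mp_ensure_init(void)
{
    int expected = 0;
    if (atomic_load_explicit(&mp_state, memory_order_acquire) == 2)
        return;                                  /* fast path: already set up */
    if (atomic_compare_exchange_strong(&mp_state, &expected, 1)) {
        mp_library_setup();                      /* this caller won the race */
        atomic_store_explicit(&mp_state, 2, memory_order_release);
    } else {
        /* another caller is initialising: wait until it has finished */
        while (atomic_load_explicit(&mp_state, memory_order_acquire) != 2)
            ;
    }
}

/* Hypothetical raw library operation: negation by flipping the sign. */
static void mp_neg_raw(mp_int_t *r, const mp_int_t *a)
{
    r->alloc = a->alloc;
    r->size  = -a->size;
    r->d     = a->d;
}

/* The wrapper the RTS would call instead of the raw library function. */
void mp_neg(mp_int_t *r, const mp_int_t *a)
{
    mp_ensure_init();
    mp_neg_raw(r, a);
}
```

With a wrapper of this shape the Cmm macro would call {{{mp_neg}}} directly and the explicit initialisation call could be dropped from the macro body.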
     125
     126(2) The "shockingly inefficient" operation of this code:
     127{{{
     128/* ToDo: this is shockingly inefficient */
     129
     130#ifndef THREADED_RTS
     131section "bss" {
     132  mp_tmp1:
     133    bits8 [SIZEOF_MP_INT];
     134}
     135
     136section "bss" {
     137  mp_tmp2:
     138    bits8 [SIZEOF_MP_INT];
     139}
     140
     141section "bss" {
     142  mp_result1:
     143    bits8 [SIZEOF_MP_INT];
     144}
     145
     146section "bss" {
     147  mp_result2:
     148    bits8 [SIZEOF_MP_INT];
     149}
     150#endif
     151}}}
     152should be obvious.  There are at least two possible alternatives to this:
     153   (a) wrap the replacement MP-library array/structure for arbitrary precision integers in a closure so you do not have to rebuild the struct from scratch on each MP-library call; or
     154
     155   (b) use !ForeignPtr (in Cmm, Weak Pointers--difficult to implement) to foreign threads holding the struct/array
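Alternative (a) can be illustrated in plain C, under the assumption that the replacement library's header fields can be stored next to the limb data in one heap object. The type {{{mp_closure_t}}} and its functions are hypothetical, and {{{malloc}}} stands in for allocation on the GHC heap; a real closure would also carry an info pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap object: the header and the limbs live together, so an
 * MP-library call can take a pointer to it without rebuilding a struct. */
typedef struct {
    int alloc;              /* limbs allocated */
    int size;               /* limbs in use; negative for negative values */
    unsigned long limbs[];  /* flexible array member holding the magnitude */
} mp_closure_t;

/* Allocate an object with room for `alloc` limbs, all zeroed. */
mp_closure_t *mp_closure_new(int alloc)
{
    mp_closure_t *c = malloc(sizeof *c + (size_t)alloc * sizeof c->limbs[0]);
    if (c == NULL)
        return NULL;
    c->alloc = alloc;
    c->size  = 0;
    memset(c->limbs, 0, (size_t)alloc * sizeof c->limbs[0]);
    return c;
}

/* Store a single-limb non-negative value. */
void mp_closure_set_ui(mp_closure_t *c, unsigned long v)
{
    assert(c->alloc >= 1);
    c->limbs[0] = v;
    c->size = (v != 0) ? 1 : 0;
}

/* Negate in place: only the sign of `size` changes. */
void mp_closure_neg(mp_closure_t *c)
{
    c->size = -c->size;
}
```

Compared with the four static {{{mp_tmp}}}/{{{mp_result}}} buffers, nothing is copied into a temporary struct before each call, and no shared state is left behind for a threaded RTS to trip over.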
    91156
    92157=== Benchmarks for Multi-Precision Libraries ===