Changes between Initial Version and Version 1 of MultipleLinkerInstances


Timestamp:
Jul 19, 2009 5:15:46 PM (6 years ago)
Author:
jcpetruzza

= Allow for multiple instances of the GHCi linker =

This page discusses a plan to fix bug #3372.

== The problem ==

GHC includes its own linker, which GHCi uses to resolve symbols. It is currently implemented using global variables for the symbol tables and other internals. This means one cannot have two or more instances of GHC's interpreter running simultaneously on different threads, since their entries in the symbol tables would conflict. The basic idea for solving this is to move all the global variables into a suitable data structure and associate an instance of it with GHC's state.

Now, the linker is composed of two rather different parts: the bytecode linker and the object linker, each with its own symbol tables (and, of course, global variables). The latter is part of the RTS, written in C with plenty of {{{#ifdef}}}s to handle a variety of platforms, object file formats, etc.

Fixing the bytecode linker, as discussed next, seems relatively straightforward. The object linker is much more fragile. In particular, it is harder to test, since much of its code is platform-dependent and guarded by conditional compilation.

''Question:'' Can we just leave the object linker as it is right now? If I understand correctly, in that case we will run into trouble if, for example, two instances of GHC try to load different .o files with conflicting symbols. If this can happen when loading two incompatible versions of an installed package, then it might be a frequent scenario.

== Plan for the bytecode linker ==

The relevant code is in [[GhcFile(compiler/ghci/Linker.lhs)]]. The linker's state is kept in the global variable:

  {{{
v_PersistentLinkerState :: IORef PersistentLinkerState
  }}}

There is an additional global variable {{{v_InitLinkerDone :: IORef Bool}}} that is used to make the initialization routine idempotent. This routine is:

  {{{
initDynLinker :: DynFlags -> IO ()
  }}}

and is (lazily) called by the exported functions {{{linkExpr}}} and {{{unload}}}. It is also called explicitly from [[GhcFile(ghc/GhciMonad.hs)]].

The proposed plan would be to define something along the lines of:

  {{{
newtype DynLinker = DynLinker (IORef (Maybe PersistentLinkerState))

uninitializedLinker :: IO DynLinker
uninitializedLinker = DynLinker `fmap` newIORef Nothing

initDynLinker :: DynFlags -> DynLinker -> IO ()
initDynLinker dflags (DynLinker r)
    = do s <- readIORef r
         -- only initialise once, so the routine stays idempotent
         when (isNothing s) $
           reallyInitDynLinker dflags r


withLinkerState :: (MonadIO m, ExceptionMonad m) => DynLinker -> (IORef PersistentLinkerState -> m a) -> m a
withLinkerState (DynLinker r) action
    = do maybe_s <- liftIO $ readIORef r
         case maybe_s of
           Nothing -> panic "Dynamic linker not initialised"
           Just s  -> do r' <- liftIO $ newIORef s
                         result <- action r'
                         -- copy the (possibly updated) state back
                         liftIO $ writeIORef r . Just =<< readIORef r'
                         return result
  }}}

This way we keep the lazy initialization and minimize the modifications needed in the rest of the functions. For example, we would turn the following exported function:

  {{{
extendLinkEnv :: [(Name,HValue)] -> IO ()
-- Automatically discards shadowed bindings
extendLinkEnv new_bindings
  = do  pls <- readIORef v_PersistentLinkerState
        let new_closure_env = extendClosureEnv (closure_env pls) new_bindings
            new_pls = pls { closure_env = new_closure_env }
        writeIORef v_PersistentLinkerState new_pls
  }}}

into this version:

  {{{
extendLinkEnv :: DynLinker -> [(Name,HValue)] -> IO ()
-- Automatically discards shadowed bindings
extendLinkEnv dl new_bindings
  = withLinkerState dl $ \v_PersistentLinkerState ->
    do  pls <- readIORef v_PersistentLinkerState
        let new_closure_env = extendClosureEnv (closure_env pls) new_bindings
            new_pls = pls { closure_env = new_closure_env }
        writeIORef v_PersistentLinkerState new_pls
  }}}

''Question:'' Would it be better to use an {{{MVar}}} instead of an {{{IORef}}} in {{{DynLinker}}}?

Finally, to make the {{{DynLinker}}} available everywhere, we would have to add a field to {{{HscEnv}}} ([[GhcFile(compiler/main/HscTypes.lhs)]]):

  {{{
data HscEnv
  = HscEnv {
     ...
#ifdef GHCI
        hsc_dynLinker :: DynLinker,
#endif
     ...
    }
  }}}

== Plan for the object linker ==

The object linker ([[GhcFile(rts/Linker.c)]]) is responsible for loading and keeping track of symbols in object files and shared libraries. For object files it basically uses three global variables:

  {{{
/* Hash table mapping symbol names to Symbol */
static /*Str*/HashTable *symhash;

/* Hash table mapping symbol names to StgStablePtr */
static /*Str*/HashTable *stablehash;

/* List of currently loaded objects */
ObjectCode *objects = NULL;     /* initially empty */
  }}}

Each time an object file is loaded, a new {{{ObjectCode}}} node is added to the {{{objects}}} linked list and {{{symhash}}} is populated with a pointer for each symbol.
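
The bookkeeping described above can be sketched in a few lines of C. This is only a toy model, not the code in {{{rts/Linker.c}}}: the names {{{ToySymbol}}}, {{{ToyObjectCode}}}, {{{toy_load_object}}} and {{{toy_lookup_symbol}}} are hypothetical, and a plain linked list stands in for the RTS hash table. Because both tables are globals, a second object that defines an already-known name simply shadows the first definition in this model, which is the kind of clash two interpreter instances would run into when sharing this state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the RTS types. */
typedef struct ToySymbol {
    const char *name;          /* owned by the caller */
    void *addr;
    struct ToySymbol *next;
} ToySymbol;

typedef struct ToyObjectCode {
    const char *fileName;      /* owned by the caller */
    struct ToyObjectCode *next;
} ToyObjectCode;

/* Global state, mirroring the current design. */
static ToySymbol *toy_symtab = NULL;       /* plays the role of symhash */
static ToyObjectCode *toy_objects = NULL;  /* plays the role of objects */

/* Record one symbol at the head of the table; returns 0 if malloc fails. */
int toy_insert_symbol(const char *name, void *addr)
{
    ToySymbol *s = malloc(sizeof *s);
    if (s == NULL) return 0;
    s->name = name;
    s->addr = addr;
    s->next = toy_symtab;
    toy_symtab = s;
    return 1;
}

/* "Load" an object: prepend a node to the object list, then register
 * each of its n symbols. Returns 0 if any allocation fails. */
int toy_load_object(const char *fileName,
                    const char **names, void **addrs, size_t n)
{
    assert(fileName != NULL);
    ToyObjectCode *oc = malloc(sizeof *oc);
    if (oc == NULL) return 0;
    oc->fileName = fileName;
    oc->next = toy_objects;
    toy_objects = oc;
    for (size_t i = 0; i < n; i++) {
        if (!toy_insert_symbol(names[i], addrs[i])) return 0;
    }
    return 1;
}

/* Return the most recently registered address for name, or NULL. */
void *toy_lookup_symbol(const char *name)
{
    for (ToySymbol *s = toy_symtab; s != NULL; s = s->next) {
        if (strcmp(s->name, name) == 0) return s->addr;
    }
    return NULL;
}
```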

''Question:'' What is {{{stablehash}}} used for?

For shared libraries the code varies with each platform. On Windows, a linked list of handles to opened DLLs is stored in a global variable:

  {{{
typedef
   struct _OpenedDLL {
      char*              name;
      struct _OpenedDLL* next;
      HINSTANCE          instance;
   }
   OpenedDLL;

/* A list thereof. */
static OpenedDLL* opened_dlls = NULL;
  }}}

To look up a symbol, one has to iterate over {{{opened_dlls}}} and look the symbol up in each handle.
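
The lookup loop can be sketched as follows. To keep the sketch portable, the handle is an opaque pointer and the per-handle lookup is passed in as a function pointer; on Windows that role would be played by {{{GetProcAddress}}}, and a POSIX variant could use {{{dlsym}}}. The names {{{ToyOpenedLib}}}, {{{ToyFakeLib}}}, {{{toy_fake_lookup}}} and {{{toy_lookup_in_libs}}} are made up for illustration, and {{{ToyFakeLib}}} is just a stand-in for a loaded library that exports a single symbol. Passing the list head as an argument, rather than reading a global, is what would let each linker instance keep its own list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, portable variant of OpenedDLL with an opaque handle. */
typedef struct ToyOpenedLib {
    const char *name;
    struct ToyOpenedLib *next;
    void *handle;
} ToyOpenedLib;

/* Per-handle lookup: returns the symbol's address or NULL if absent. */
typedef void *(*ToyHandleLookup)(void *handle, const char *symbol);

/* Stand-in for a loaded library that exports exactly one symbol. */
typedef struct ToyFakeLib {
    const char *symbol;
    void *addr;
} ToyFakeLib;

/* Lookup function for ToyFakeLib handles, used only for the demo. */
void *toy_fake_lookup(void *handle, const char *symbol)
{
    ToyFakeLib *lib = handle;
    return strcmp(lib->symbol, symbol) == 0 ? lib->addr : NULL;
}

/* Walk the list and return the first hit, or NULL if no library
 * in the list provides the symbol. */
void *toy_lookup_in_libs(const ToyOpenedLib *libs,
                         ToyHandleLookup lookup, const char *symbol)
{
    assert(lookup != NULL);
    for (const ToyOpenedLib *l = libs; l != NULL; l = l->next) {
        void *addr = lookup(l->handle, symbol);
        if (addr != NULL) return addr;
    }
    return NULL;
}
```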

For the ELF and Mach-O case, libraries are dlopen'd using {{{RTLD_GLOBAL}}} and later accessed using the program's dl-handle. This is stored in:

  {{{
static void *dl_prog_handle;
  }}}

A possible solution would be to put all these variables in a data structure:

{{{
typedef struct _ObjLinkerState {
  /* Hash table mapping symbol names to Symbol */
  /*Str*/HashTable *symhash;

  /* Hash table mapping symbol names to StgStablePtr */
  /*Str*/HashTable *stablehash;

  /* List of currently loaded objects; must be set to NULL (empty) on creation */
  ObjectCode *objects;

#if defined(OBJFORMAT_PEi386)
  /* List of opened DLLs; must be set to NULL (empty) on creation */
  OpenedDLL* opened_dlls;
#endif

#if defined(OBJFORMAT_ELF) || defined(OBJFORMAT_MACHO)
  void *dl_prog_handle;
#endif
} ObjLinkerState;
}}}

and add to {{{PersistentLinkerState}}} a {{{ForeignPtr}}} to a malloc'd {{{ObjLinkerState}}}.
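
As a rough sketch of what the C side of this could look like, assuming a trimmed-down state with opaque fields: since C struct members cannot carry initializers, the empty initial values are set by an allocation function, and a matching free function has the {{{void f(T *)}}} shape that a {{{ForeignPtr}}} finalizer expects. The names {{{ToyObjLinkerState}}}, {{{toy_new_obj_linker_state}}}, {{{toy_state_is_empty}}} and {{{toy_free_obj_linker_state}}} are hypothetical; a real version would also have to release the hash tables and unload the objects before freeing the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down state; the real fields would be the
 * hash tables and object list shown in the struct above. */
typedef struct ToyObjLinkerState {
    void *symhash;
    void *stablehash;
    void *objects;
} ToyObjLinkerState;

/* Allocate a fresh, empty state. Returns NULL if malloc fails. */
ToyObjLinkerState *toy_new_obj_linker_state(void)
{
    ToyObjLinkerState *st = malloc(sizeof *st);
    if (st == NULL) return NULL;
    st->symhash = NULL;
    st->stablehash = NULL;
    st->objects = NULL;
    return st;
}

/* Returns 1 if nothing has been recorded in the state yet, else 0. */
int toy_state_is_empty(const ToyObjLinkerState *st)
{
    assert(st != NULL);
    return st->symhash == NULL
        && st->stablehash == NULL
        && st->objects == NULL;
}

/* Finalizer-shaped cleanup: takes only the pointer and returns nothing.
 * This toy version frees just the struct itself. */
void toy_free_obj_linker_state(ToyObjLinkerState *st)
{
    free(st);
}
```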

''Question:'' Will this work in the case of ELF shared libraries if two instances of GHC load two different (conflicting) versions of a .so? My impression is that it won't, and that the workaround would be to use a linked list of handles, as is done for DLLs.

''Question:'' There are other platform-specific global variables defined in [[GhcFile(rts/Linker.c)]] that I don't know how to handle:
  * This one seems to be a constant that may be overridden during initialization:
  {{{
static void *mmap_32bit_base = (void *)MMAP_32BIT_BASE_DEFAULT;
  }}}
 I guess it can continue being a global variable.
  * No idea about these ones:
  {{{
static Elf_Addr got[GOT_SIZE];
static unsigned int gotIndex;
static Elf_Addr gp_val = (Elf_Addr)got;
  }}}
  * No idea about these ones either:
  {{{
static FunctionDesc functionTable[FUNCTION_TABLE_SIZE];
static unsigned int functionTableIndex;
  }}}