Changes between Initial Version and Version 1 of MultipleLinkerInstances


Timestamp:
Jul 19, 2009 5:15:46 PM (6 years ago)
Author:
jcpetruzza

= Allow for multiple instances of the GHCi linker =

This page discusses a plan to fix bug #3372.

== The problem ==

GHC includes its own linker, which GHCi uses to resolve symbols. It is currently implemented using global variables for the symbol tables and other internals. This means one cannot have two or more instances of GHC's interpreter running simultaneously on different threads, since their entries in the symbol tables would conflict. The basic idea for solving this is to move all the global variables into a suitable data structure and to associate an instance of it with GHC's state.

Now, the linker is composed of two rather different parts: the bytecode linker and the object linker, each with its own symbol tables (and, of course, its own global variables). The latter is part of the RTS and is written in C, with plenty of #ifdefs to handle a variety of platforms, object file formats, etc.

Fixing the bytecode linker, as discussed next, seems relatively straightforward. The object linker is much more fragile. In particular, it is harder to test, since much of its code is platform-dependent and guarded by conditional compilation.

''Question:'' Can we just leave the object linker as it is right now? If I understand correctly, in that case we will run into trouble if, for example, two instances of GHC try to load different .o files with conflicting symbols. If this can happen when two instances try to load incompatible versions of an installed package, then it might be a frequent scenario.

== Plan for the bytecode linker ==

The relevant code is in [[GhcFile(compiler/ghci/Linker.lhs)]]. The linker's state is kept in the global variable:

  {{{
v_PersistentLinkerState :: IORef PersistentLinkerState
  }}}

There is an additional global variable, {{{v_InitLinkerDone :: IORef Bool}}}, that is used to make the initialization routine idempotent. This routine is:

  {{{
initDynLinker :: DynFlags -> IO ()
  }}}

It is called (lazily) by the exported functions {{{linkExpr}}} and {{{unload}}}, and explicitly from [[GhcFile(ghc/GhciMonad.hs)]].

The proposed plan would be to define something along the lines of:

  {{{
newtype DynLinker = DynLinker (IORef (Maybe PersistentLinkerState))

uninitializedLinker :: IO DynLinker
uninitializedLinker = DynLinker `fmap` newIORef Nothing

-- Idempotent: does nothing if the linker state has already been set
initDynLinker :: DynFlags -> DynLinker -> IO ()
initDynLinker dflags (DynLinker r)
    = do s <- readIORef r
         when (isNothing s) $
           reallyInitDynLinker dflags r

withLinkerState :: (MonadIO m, ExceptionMonad m) => DynLinker -> (IORef PersistentLinkerState -> m a) -> m a
withLinkerState (DynLinker r) action
    = do maybe_s <- liftIO $ readIORef r
         case maybe_s of
           Nothing -> panic "Dynamic linker not initialised"
           Just s  -> do r' <- liftIO $ newIORef s
                         a <- action r'
                         -- copy the (possibly updated) state back
                         liftIO $ writeIORef r . Just =<< readIORef r'
                         return a
  }}}

This way we keep the lazy initialization and minimize the modifications needed in the rest of the functions. For example, we would turn the following exported function:

  {{{
extendLinkEnv :: [(Name,HValue)] -> IO ()
-- Automatically discards shadowed bindings
extendLinkEnv new_bindings
  = do  pls <- readIORef v_PersistentLinkerState
        let new_closure_env = extendClosureEnv (closure_env pls) new_bindings
            new_pls = pls { closure_env = new_closure_env }
        writeIORef v_PersistentLinkerState new_pls
  }}}

into this version:

  {{{
extendLinkEnv :: DynLinker -> [(Name,HValue)] -> IO ()
-- Automatically discards shadowed bindings
extendLinkEnv dl new_bindings
  = withLinkerState dl $ \v_PersistentLinkerState ->
    do  pls <- readIORef v_PersistentLinkerState
        let new_closure_env = extendClosureEnv (closure_env pls) new_bindings
            new_pls = pls { closure_env = new_closure_env }
        writeIORef v_PersistentLinkerState new_pls
  }}}

''Question:'' Would it be better to use an {{{MVar}}} instead of an {{{IORef}}} in {{{DynLinker}}}?

Finally, to make the {{{DynLinker}}} available everywhere, we would have to add a field to {{{HscEnv}}} ([[GhcFile(compiler/main/HscTypes.lhs)]]):

  {{{
data HscEnv
  = HscEnv {
     ...
#ifdef GHCI
        hsc_dynLinker :: DynLinker,
#endif
     ...
    }
  }}}

== Plan for the object linker ==

The object linker ([[GhcFile(rts/Linker.c)]]) is responsible for loading object files and shared libraries and for keeping track of their symbols. For object files it basically uses three global variables:

  {{{
/* Hash table mapping symbol names to Symbol */
static /*Str*/HashTable *symhash;

/* Hash table mapping symbol names to StgStablePtr */
static /*Str*/HashTable *stablehash;

/* List of currently loaded objects */
ObjectCode *objects = NULL;     /* initially empty */
  }}}

Each time an object file is loaded, a new {{{ObjectCode}}} node is added to the {{{objects}}} linked list and {{{symhash}}} is populated with a pointer for each symbol.

''Question:'' What is {{{stablehash}}} used for?

For shared libraries the code varies with each platform. On Windows, a linked list of handles to opened DLLs is stored in a global variable:

  {{{
typedef
   struct _OpenedDLL {
      char*              name;
      struct _OpenedDLL* next;
      HINSTANCE instance;
   }
   OpenedDLL;

/* A list thereof. */
static OpenedDLL* opened_dlls = NULL;
  }}}

To look up a symbol, one has to iterate over {{{opened_dlls}}} and look the symbol up in each handle in turn.
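
The walk over the list can be sketched as follows. This is a simplified, portable model and not the code in [[GhcFile(rts/Linker.c)]]: the {{{SymLookupFn}}} callback stands in for the Windows {{{GetProcAddress}}} call, and {{{DllNode}}}, {{{ToyLib}}}, {{{toyLookup}}} and {{{lookupInDlls}}} are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for GetProcAddress(handle, name): returns the address of the
 * symbol in the given library, or NULL if the library does not define it. */
typedef void *(*SymLookupFn)(void *handle, const char *name);

/* Simplified model of OpenedDLL (hypothetical); `handle` replaces HINSTANCE. */
typedef struct DllNode {
    const char     *name;
    struct DllNode *next;
    void           *handle;
} DllNode;

/* Toy "library" exporting a single symbol, used only to exercise the sketch. */
typedef struct {
    const char *sym;
    void       *addr;
} ToyLib;

/* Toy lookup: treats the handle as a ToyLib and compares names. */
void *toyLookup(void *handle, const char *name)
{
    const ToyLib *lib = handle;
    return strcmp(lib->sym, name) == 0 ? lib->addr : NULL;
}

/* Walk the list from its head and return the first match, or NULL if no
 * library in the list defines the symbol. */
void *lookupInDlls(const DllNode *dlls, SymLookupFn lookup, const char *sym)
{
    assert(lookup != NULL);
    for (const DllNode *d = dlls; d != NULL; d = d->next) {
        void *addr = lookup(d->handle, sym);
        if (addr != NULL)
            return addr;
    }
    return NULL;
}
```

Because the list head is an explicit argument here rather than a global, the same function would work unchanged if each linker instance kept its own list.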

In the ELF and Mach-O cases, libraries are opened with {{{dlopen}}} using {{{RTLD_GLOBAL}}}, and their symbols are later looked up through the program's own dl-handle, which is stored in:

  {{{
static void *dl_prog_handle;
  }}}

A possible solution would be to put all these variables in a data structure:

  {{{
typedef struct _ObjLinkerState {
  /* Hash table mapping symbol names to Symbol */
  /*Str*/HashTable *symhash;

  /* Hash table mapping symbol names to StgStablePtr */
  /*Str*/HashTable *stablehash;

  /* List of currently loaded objects; set to NULL (empty) on creation */
  ObjectCode *objects;

#if defined(OBJFORMAT_PEi386)
  /* List of opened DLLs; set to NULL (empty) on creation */
  OpenedDLL *opened_dlls;
#endif

#if defined(OBJFORMAT_ELF) || defined(OBJFORMAT_MACHO)
  void *dl_prog_handle;
#endif
} ObjLinkerState;
  }}}

and to add to {{{PersistentLinkerState}}} a {{{ForeignPtr}}} to a malloc'd {{{ObjLinkerState}}}.
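
To illustrate why per-instance state removes the conflict, here is a minimal sketch of the allocation side. It is a toy model under stated assumptions: the real {{{HashTable}}} and {{{ObjectCode}}} types are replaced by a small fixed-size symbol array, and {{{ToyLinkerState}}}, {{{newToyLinkerState}}}, {{{freeToyLinkerState}}}, {{{toyInsertSymbol}}} and {{{toyLookupSymbol}}} are hypothetical names, not part of the RTS.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_SYMBOLS 64   /* arbitrary bound chosen for this sketch */

typedef struct {
    const char *name;   /* not copied: the caller keeps it alive */
    void       *addr;   /* assumed non-NULL */
} ToySymbol;

/* Toy counterpart of ObjLinkerState (hypothetical): one table per instance. */
typedef struct ToyLinkerState {
    ToySymbol symbols[TOY_MAX_SYMBOLS];
    size_t    n_symbols;
} ToyLinkerState;

/* Allocate an empty, zero-initialised state; returns NULL if out of memory. */
ToyLinkerState *newToyLinkerState(void)
{
    return calloc(1, sizeof(ToyLinkerState));
}

/* Release a state previously returned by newToyLinkerState. */
void freeToyLinkerState(ToyLinkerState *s)
{
    free(s);
}

/* Look a name up in one instance only; returns NULL if it is absent. */
void *toyLookupSymbol(const ToyLinkerState *s, const char *name)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->n_symbols; i++)
        if (strcmp(s->symbols[i].name, name) == 0)
            return s->symbols[i].addr;
    return NULL;
}

/* Insert a symbol; returns 0 on success, or -1 if the name is already
 * defined in this instance or the table is full. */
int toyInsertSymbol(ToyLinkerState *s, const char *name, void *addr)
{
    assert(s != NULL);
    if (s->n_symbols == TOY_MAX_SYMBOLS || toyLookupSymbol(s, name) != NULL)
        return -1;
    s->symbols[s->n_symbols].name = name;
    s->symbols[s->n_symbols].addr = addr;
    s->n_symbols++;
    return 0;
}
```

With this shape, two states can each define a symbol named {{{foo}}} at different addresses, whereas a second definition within the same state is rejected. {{{freeToyLinkerState}}} takes the pointer and returns nothing, which is the signature a {{{ForeignPtr}}} finalizer expects on the C side.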

''Question:'' Will this work in the case of ELF shared libraries if two instances of GHC load two different (conflicting) versions of a .so? My impression is that it won't, and that the workaround would be to keep a linked list of handles, as is done for DLLs.
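
On POSIX systems, the workaround mentioned in the question might look roughly like the sketch below: each linker instance opens libraries with {{{RTLD_LOCAL}}} instead of {{{RTLD_GLOBAL}}}, keeps its own list of handles, and searches only that list with {{{dlsym}}}. The names {{{SoHandle}}}, {{{soListOpen}}}, {{{soListLookup}}} and {{{soListClose}}} are hypothetical. The sketch does not settle the question itself, that is, whether the dynamic loader will really keep two conflicting versions of a library apart.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdlib.h>

/* One node per library opened by a given linker instance (hypothetical). */
typedef struct SoHandle {
    void            *handle;   /* as returned by dlopen */
    struct SoHandle *next;
} SoHandle;

/* Open `path` with RTLD_LOCAL, so that its symbols are not made available
 * for resolving references from libraries loaded later, and push the
 * handle onto *list. Returns 0 on success, -1 on failure. */
int soListOpen(SoHandle **list, const char *path)
{
    assert(list != NULL);
    void *h = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (h == NULL)
        return -1;
    SoHandle *node = malloc(sizeof *node);
    if (node == NULL) {
        dlclose(h);
        return -1;
    }
    node->handle = h;
    node->next = *list;
    *list = node;
    return 0;
}

/* Search only the libraries opened through this list, most recent first. */
void *soListLookup(const SoHandle *list, const char *sym)
{
    for (const SoHandle *n = list; n != NULL; n = n->next) {
        void *addr = dlsym(n->handle, sym);
        if (addr != NULL)
            return addr;
    }
    return NULL;
}

/* Close every handle in the list and free the nodes. */
void soListClose(SoHandle *list)
{
    while (list != NULL) {
        SoHandle *next = list->next;
        dlclose(list->handle);
        free(list);
        list = next;
    }
}
```

On older glibc versions this needs to be linked with {{{-ldl}}}.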

''Question:'' There are other platform-specific global variables defined in [[GhcFile(rts/Linker.c)]] that I don't know how to handle:
  * This one seems to be a constant that may be overridden during initialization:
  {{{
static void *mmap_32bit_base = (void *)MMAP_32BIT_BASE_DEFAULT;
  }}}
 I guess it can remain a global variable.
  * No idea about these:
  {{{
static Elf_Addr got[GOT_SIZE];
static unsigned int gotIndex;
static Elf_Addr gp_val = (Elf_Addr)got;
  }}}
  * No idea about these either:
  {{{
static FunctionDesc functionTable[FUNCTION_TABLE_SIZE];
static unsigned int functionTableIndex;
  }}}