Changes between Version 14 and Version 15 of SharedLibraries


Timestamp:
May 14, 2009 12:33:40 PM
Author:
duncan
Comment:

Start on intro page to shared libs, with detail moved to sub-pages

  • SharedLibraries

    v14 v15  
    22[[PageOutline]] 
    33 
    4 = Shared Libraries: distribution and build-system issues = 
     4= Shared Libraries = 
    55 
    6 This page is for discussing and documenting our strategy for 
     6This page provides an introduction to shared libraries in general and specifically how they are supported and implemented in GHC. 
    77 
    8  * How shared libraries are found 
    9  * How the build system works 
    10  * How distributions (of GHC and programs built by GHC) work 
    11  * Issues that affect Cabal 
     8More detailed topics: 
    129 
    13 == Goals/Scenarios == 
     10 * SharedLibraries/Management: how we organise and manage shared libs 
     11 * SharedLibraries/PlatformSupport: status of shared lib support on various platforms 
     12 * [wiki:Commentary/PositionIndependentCode]: how `ghc -fPIC` works 
    1413 
    15 First of all, we take it as a given that a normal GHC installation will be a good citizen on its host platform: shared libraries will go in the standard locations, and we'll use the system's normal method for finding them at link time and runtime.  Windows is an exception: there is no standard location for installing shared libraries on Windows. 
     14== What shared libs are == 
    1615 
    17 So that we can support having multiple versions of GHC installed, shared libraries will have the GHC version number embedded, e.g. `libHSnetwork-1.1-ghc6.6.1.so`. 
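Here is a minimal sketch of that naming scheme. `hs_shared_lib_name` is a hypothetical helper for illustration, not a GHC or Cabal function. It covers only the ELF `.so` form used in the example above.

```python
def hs_shared_lib_name(package, version, ghc_version):
    """Build an ELF shared library file name that embeds the GHC version.

    Hypothetical helper following the pattern of libHSnetwork-1.1-ghc6.6.1.so;
    not part of GHC or Cabal.
    """
    return f"libHS{package}-{version}-ghc{ghc_version}.so"


if __name__ == "__main__":
    # Builds of the same package for two GHC versions get distinct names,
    # so both can sit in the same standard library directory.
    print(hs_shared_lib_name("network", "1.1", "6.6.1"))
    print(hs_shared_lib_name("network", "1.1", "6.8.2"))
```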
     16[http://en.wikipedia.org/wiki/Shared_libraries Shared libraries] (sometimes called dynamic libraries) are an alternative way of organising pre-compiled code compared to traditional static libraries. The key difference is that with shared libs, linking programs against library functions takes place when the program is run rather than when the program is built and installed. 
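To make run-time linking concrete, here is a small sketch using Python's standard `ctypes` module. `ctypes` goes through the platform's dynamic loader, which means `dlopen`/`dlsym` on ELF and Mach-O systems. The sketch assumes a POSIX system where the C library is already loaded into the Python process. The C function `abs` is located by name only when the script runs.

```python
import ctypes

# ctypes.CDLL(None) corresponds to dlopen(NULL): a handle on the running
# program plus the shared libraries already loaded into it, such as libc.
# (POSIX only; this does not work on Windows.)
libc = ctypes.CDLL(None)

# Attribute access performs a dlsym-style lookup of "abs" at run time.
c_abs = libc.abs
c_abs.argtypes = [ctypes.c_int]
c_abs.restype = ctypes.c_int

if __name__ == "__main__":
    print(c_abs(-42))  # prints 42
```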
    1817 
    19 Here is what else we'd like to do: 
     18All modern operating systems use shared libs, and they are particularly advantageous for system libraries. They allow a library to be upgraded separately from the programs that use it, although this requires preserving its ABI. They also allow a single copy of the code to be shared in memory between all the programs that use it; for common system libraries this can be a significant saving. 
    2019 
    21  1. Support installing GHC outside of the standard location (e.g. in a home directory), and build 
    22     binaries using that installation.  Multiple such installations should be supported. 
    23  2. We need to build a distribution that supports choosing the install location at install time, for 
    24     use in (1). 
    25  3. Binaries that are built as part of the GHC build (e.g. stage2/ghc-inplace) need to run from 
    26     the build tree. 
    27  4. Cabal needs to build libraries that can be installed in the system location or elsewhere. 
     20== The three major shared libs systems == 
    2821 
    29 = Proposed strategies = 
     22There are three systems in common use: 
    3023 
    31 == 1. Static linking == 
     24 * '''ELF''' ([http://en.wikipedia.org/wiki/Executable_and_Linkable_Format Executable and Linkable Format]) is used on all modern Unix systems except Mac OS X, including Linux, Solaris and the BSDs. 
     25 * '''PE''' ([http://en.wikipedia.org/wiki/Portable_Executable Portable Executable]) format is used on Windows. 
     26 * '''Mach-O''' ([http://en.wikipedia.org/wiki/Mach-O Mach object]) is the format used on Mac OS X. 
    3227 
    33 (1,2) Installations of GHC that are not in the standard locations use static linking and come with static libraries only. 
     28On each system, the same format is used for executables, shared libraries and intermediate object files. Each system uses its own file extension for shared libraries: 
     29|| System || executable extension || shared library extension || 
     30|| ELF    || (no extension)       || `.so`     || 
     31|| PE     || `.exe`               || `.dll`    || 
     32|| Mach-O || (no extension)       || `.dylib`  || 
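For illustration, the table can be turned into a lookup that picks the extension for the host a script runs on. The mapping from Python's `sys.platform` strings to object formats is our own assumption. Anything that is not Windows, Cygwin or Mac OS X is treated as ELF, which would be wrong on a platform such as AIX.

```python
import sys

# Shared library extension for each object format, from the table above.
SHARED_LIB_EXTENSION = {"ELF": ".so", "PE": ".dll", "Mach-O": ".dylib"}


def object_format(platform):
    """Guess the object format from a sys.platform string (assumed mapping)."""
    if platform in ("win32", "cygwin"):
        return "PE"
    if platform == "darwin":
        return "Mach-O"
    return "ELF"  # Linux, Solaris ("sunos5"), the BSDs, ...


def shared_lib_extension(platform=sys.platform):
    """Shared library file extension for the given (default: current) platform."""
    return SHARED_LIB_EXTENSION[object_format(platform)]


if __name__ == "__main__":
    print(shared_lib_extension())
```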
    3433 
    35 (3) stage2/ghc-inplace is linked statically. 
     34Unfortunately, while static linking is relatively uncomplicated and similar between systems, shared libraries are implemented rather differently on each operating system and pose something of a management headache. 
    3635 
    37 (4) Cabal packages installed outside the system locations are static only. 
     36== Background reading == 
    3837 
    39 This is attractive, but there are some drawbacks: 
     38An excellent technical introduction to ELF shared libraries is [http://people.redhat.com/drepper/dsohowto.pdf How To Write Shared Libraries] by Ulrich Drepper (the glibc maintainer). 
    4039 
    41  * we still need to build a distribution that uses shared libs.  Presumably we have to build both 
    42    shared and static libs then. 
    43  * the testsuite needs to build binaries against the shared libs for testing, without installing GHC. 
    44  * we want the GHC binary in a shared-library installation to be dynamically linked, not statically linked. 
    45  * if there are some static-only libraries on the system, then all packages must have static versions, 
    46    because dynamic linking is all-or-nothing in GHC. 
    47  * This approach doesn't address Windows 
     40== Why we care about shared libraries == 
    4841 
    49 == 2. Dynamic linking == 
     42There are several reasons we care. 
    5043 
    51 The first plan was this: 
     44The greatest advantage is that it enables us to make plugins for other programs. There are many examples of this: plugins for programs like vim, gimp, postgres and apache. On Windows, if you want to make a COM or .NET component then it usually has to be a shared library (a .dll file). 
    5245 
    53 [http://www.haskell.org/pipermail/glasgow-haskell-users/2007-June/012740.html] 
     46Similar to plugins, shared libraries have become a common way of composing large systems. Each shared library can be written in a different language. Compared to static libraries, shared libraries are typically more self-contained. The ability to produce nice self-contained shared libraries from Haskell code would simplify the integration of Haskell code into larger existing systems. 
    5447 
    55 It has since been pointed out that `LD_LIBRARY_PATH` overrides `-rpath` on some platforms (see below).  This might cause some difficulties (or not?). 
     48A somewhat superficial reason is that it makes your “Hello World” program much smaller because it doesn’t have to include a complete copy of the runtime system and half of the base library. It’s true that in most circumstances disk space is cheap, but if you’ve got some corporate shared storage that’s replicated and meticulously backed-up and if each of your 100 “small” Haskell plugins is actually 10MB big, then the disk space does not look quite so cheap. 
    5649 
    57 Assuming we can fix the locations of shared libraries at link time (eg. with -rpath), then: 
     50Using shared libraries also makes things a bit easier for Haskell applications that want to do dynamic code loading. For example, GHCi itself currently has to load two copies of the base package: the one it is statically linked with, and another copy that it loads dynamically. With shared libraries it would just end up with another reference to the same copy of the single shared base library. 
    5851 
    59  1. Installations of GHC outside the system default location hardwire the locations of shared libraries 
    60     into the binaries they build.  (hence such binaries cannot be distributed; this is a drawback) 
    61  2. Binaries in the distribution must not have rpaths.  We should use wrapper scripts that set 
    62     `LD_LIBRARY_PATH` instead. 
    63  3. Binaries in the build tree need `LD_LIBRARY_PATH` wrappers. 
    64  4. A Cabal package may install a shared library outside the standard location, but when linking to 
    65     it we must do the equivalent of adding -rpath to point to its location. 
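Items 2 and 3 rely on wrapper scripts. Here is a hedged sketch, written in Python rather than the shell scripts the plan has in mind. The wrapper prepends the directories holding the shared libraries to `LD_LIBRARY_PATH` and then executes the real binary. `wrapper_env`, `run_wrapped` and any paths passed to them are hypothetical.

```python
import os


def wrapper_env(lib_dirs, environ):
    """Copy of environ with lib_dirs prepended to LD_LIBRARY_PATH.

    LD_LIBRARY_PATH is a colon-separated list; any existing value is kept
    after the new directories, so the wrapper's libraries are found first.
    """
    env = dict(environ)
    parts = list(lib_dirs)
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        parts.append(existing)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


def run_wrapped(binary, lib_dirs, args):
    """Replace the current process with `binary`, using the adjusted environment."""
    env = wrapper_env(lib_dirs, os.environ)
    os.execve(binary, [binary] + list(args), env)
```

A wrapper would call `run_wrapped` with the path of the real executable and its library directories. On Linux this only helps binaries without a DT_RPATH entry, since LD_LIBRARY_PATH does not override DT_RPATH there.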
     52Shared libs also completely eliminate the need for the “split objs” hack that GHC uses to reduce the size of statically linked programs. This should make our link times a bit quicker. 
    6653 
    67 !ToDo: Windows? 
     54Note that we have not mentioned the two major advantages that shared libraries were originally developed for, namely saving memory at runtime (when several programs use the same lib) and making it possible to upgrade libraries without touching the programs that use them. These advantages are more significant in core operating system libraries. C code can be made to follow a stable ABI, whereas historically this has not been a priority in Haskell implementations (though this may change). Similarly, there are not too many systems yet where having multiple copies of the RTS and base libraries in memory at once is a significant problem. Again, this may change if people choose to target memory-constrained systems. 
    6855 
    69 == 3. libtool == 
     56== TODO == 
    7057 
    71 libtool hides the building of shared and static libraries and executables behind a single simple command-line interface.  It hides the details of how to build executables against uninstalled shared libraries, and how to install those executables, on multiple platforms. 
     58More stuff to explain: 
    7259 
    73 When building an object file for a library, libtool builds both the PIC and non-PIC versions. 
    74  
    75 When building a library, libtool builds both the shared and static version, and remembers where the shared version will be installed later (you have to supply this path when building the library). 
    76  
    77 When building an executable against shared libraries, libtool creates an executable ready for installation (in `.libs`). This either has no paths embedded (if the shared libs are to be installed in system locations), or has appropriate `-rpath` settings pointing to the locations where the shared libs are to be installed. `libtool` also creates a script for running the program in-place.  The script relinks the executable against uninstalled shared libraries (using `-rpath` on Linux) on demand and caches the resulting executable in `.libs`. 
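The choice libtool makes for the installable executable can be modelled roughly as follows. This is a simplified sketch, not libtool's code. `install_rpath_flags` is a hypothetical name, and the set of system directories is assumed. The `-Wl,-rpath,DIR` spelling is how gcc passes `-rpath` through to ld.

```python
# Assumed set of system library directories that need no embedded path.
SYSTEM_LIB_DIRS = frozenset({"/lib", "/usr/lib"})


def install_rpath_flags(install_dirs, system_dirs=SYSTEM_LIB_DIRS):
    """Linker flags for the executable that will be installed.

    Libraries going to system locations get no embedded path; every other
    install directory gets one -rpath entry (duplicates are dropped).
    """
    flags = []
    for d in install_dirs:
        if d in system_dirs:
            continue
        flag = "-Wl,-rpath," + d
        if flag not in flags:
            flags.append(flag)
    return flags
```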
    78  
    79 = Platform support for locating shared libraries = 
    80  
    81 The following analysis is mostly from Reilly Hayes on the cvs-ghc mailing list. 
    82  
    83 == On Linux == 
    84  
    85 An ELF executable can have an embedded list of paths to search for dynamic libraries (the DT_RPATH entry).  This can be set by using -rpath with ld.  DT_RPATH is deprecated.  This list applies to all shared libraries used by the executable (it is not per shared library).  There is no default value placed in the DT_RPATH entry.  You must use -rpath to set it. 
    86  
    87 There is a new entry, DT_RUNPATH.  DT_RUNPATH works similarly to DT_RPATH.  However, when it is set, DT_RPATH is ignored.  DT_RUNPATH is also set using -rpath, but you must use the --enable-new-dtags switch as well.   
    88  
    89 When looking for a shared library, the dynamic linker (ld.so) checks the paths listed in DT_RPATH (unless DT_RUNPATH is set), the paths listed in the environment variable LD_LIBRARY_PATH, the paths listed in DT_RUNPATH, the libraries listed in /etc/ld.so.cache, and finally /usr/lib and /lib.  It checks in that order and takes the first library found.  At least on my Linux box, LD_LIBRARY_PATH does NOT override the paths in DT_RPATH even though the documentation implies that it does.  LD_LIBRARY_PATH does override DT_RUNPATH. 
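The search order above can be written down as a small Python model, which makes the DT_RPATH/DT_RUNPATH asymmetry easy to see. This models the behaviour as described here, not ld.so itself. It ignores LD_PRELOAD and `$ORIGIN` expansion. The file system is replaced by an `exists` predicate and `/etc/ld.so.cache` by a plain dict. `ld_so_lookup` is a hypothetical name.

```python
import os.path


def ld_so_lookup(name, exists, rpath=(), runpath=(), ld_library_path=(), cache=None):
    """Return the path the described search would pick for `name`, or None.

    exists: predicate for "does this file exist" (injected for testing).
    cache:  dict from library name to path, standing in for /etc/ld.so.cache.
    """
    # DT_RPATH is consulted only when the object has no DT_RUNPATH.
    dirs = [] if runpath else list(rpath)
    # LD_LIBRARY_PATH comes after DT_RPATH but before DT_RUNPATH.
    dirs += list(ld_library_path)
    dirs += list(runpath)
    for d in dirs:
        candidate = os.path.join(d, name)
        if exists(candidate):
            return candidate
    if cache and name in cache:
        return cache[name]
    # Default directories (ld.so(8) lists /lib, then /usr/lib).
    for d in ("/lib", "/usr/lib"):
        candidate = os.path.join(d, name)
        if exists(candidate):
            return candidate
    return None
```

Suppose the same library exists in `/opt/ghc/lib` and `/home/me/lib`. Passing `/opt/ghc/lib` as `rpath` and `/home/me/lib` in `ld_library_path` returns the DT_RPATH copy. Passing `/opt/ghc/lib` as `runpath` instead lets the LD_LIBRARY_PATH copy win.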
    90  
    91 You CAN override the search path embedded using DT_RPATH by using the LD_PRELOAD environment variable.  This variable contains a *whitespace-separated* list of libraries (not directories to search) to load prior to the search process.  The listed libraries are loaded whether or not they are needed to resolve a dependency in the executable. 
    92  
    93 Finally, an ELF shared library can also have a DT_RPATH entry.  This only impacts the search for shared libraries that are dependencies of the shared library and not the executable.  As with the DT_RPATH entry in an ELF executable, this is not overridden by LD_LIBRARY_PATH but can be overridden using LD_PRELOAD as above.   
    94  
    95 == On Mac OS X == 
    96  
    97 A Mach-O executable can embed the full path name for each shared library (as well as rules for acceptable substitutes).  This is called the "install name" for the library and it is included by default when building an executable.  The install name for the library is NOT based on where the static linker (ld) found the library when the executable was built.  The static linker (ld) extracts the install name from the shared library when building the executable.  The install name of the shared library is set when building the shared library.  When you build a shared library you should know where the library is going to be installed so that the install name is set correctly. 
    98  
    99 When looking for shared libraries, the dynamic linker (dyld) first scans the directories in DYLD_LIBRARY_PATH, then checks the location in the install name (which is per library), and finally checks the standard locations. 
    100  
    101 DYLD_LIBRARY_PATH successfully overrides the path embedded in the executable. 
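Here is a matching sketch for dyld, again modelling the description above rather than dyld itself. It ignores `@executable_path`/`@loader_path`, frameworks and `DYLD_FALLBACK_LIBRARY_PATH`. The list of standard directories is an assumption, and `dyld_lookup` is a hypothetical name.

```python
import os.path

# Assumed stand-in for dyld's standard locations.
STANDARD_DIRS = ("/usr/local/lib", "/lib", "/usr/lib")


def dyld_lookup(install_name, exists, dyld_library_path=()):
    """Return the path dyld would pick for a library, or None (simplified)."""
    leaf = os.path.basename(install_name)
    # 1. DYLD_LIBRARY_PATH directories, searched for the library's file name.
    for d in dyld_library_path:
        candidate = os.path.join(d, leaf)
        if exists(candidate):
            return candidate
    # 2. The full path recorded as the library's install name.
    if exists(install_name):
        return install_name
    # 3. The standard locations.
    for d in STANDARD_DIRS:
        candidate = os.path.join(d, leaf)
        if exists(candidate):
            return candidate
    return None
```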
    102  
    103 Caveat 1: LD_LIBRARY_PATH has no runtime impact, but it does affect where the static linker looks for shared libraries.  It looks first in the directories specified using -L, then the directories in LD_LIBRARY_PATH, and finally in /lib, /usr/lib, & /usr/local/lib.  This is particularly confusing because many configure scripts seem to ignore LD_LIBRARY_PATH, and you can get inconsistent results from configure and gcc/ld on whether a library is present. 
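The inconsistency in Caveat 1 can be illustrated by modelling the linker's search as described. A configure check that ignores LD_LIBRARY_PATH then becomes the same search with that variable left out. Both the function name `static_ld_lookup` and this simplification of configure's behaviour are ours.

```python
import os.path

# Default directories, in the order given in Caveat 1.
DEFAULT_DIRS = ["/lib", "/usr/lib", "/usr/local/lib"]


def static_ld_lookup(name, exists, l_dirs=(), ld_library_path=()):
    """Find `name` as the caveat says ld does on Mac OS X:
    -L directories, then LD_LIBRARY_PATH, then the default directories."""
    for d in list(l_dirs) + list(ld_library_path) + DEFAULT_DIRS:
        candidate = os.path.join(d, name)
        if exists(candidate):
            return candidate
    return None
```

Suppose a library lives only in a directory listed in LD_LIBRARY_PATH. The call with `ld_library_path` finds it. The same call without it, our stand-in for configure, returns `None`. That is exactly the disagreement described.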
    104  
    105 Caveat 2: Mac OS X has a set of compiler/linker switches for dealing with Frameworks (packages of shared libraries and include files).  These are installed outside the typical *nix directory structure.  These switches act like -I (to gcc) and -L (to ld).  If you end up totally confused about where to find something, read up on this.  The OpenGL and OpenAL headers and libraries are in Frameworks, for example. 
    106  
    107 == On Windows == 
    108  
    109 ToDo: link to the MSDN page about how DLLs are found, and the details about manifests.  Manifests provide a way to do rpath-like things, I think. 
    110  
    111 == Conclusions == 
    112  
    113 For Mac: The -rpath switch is not available on Mac OS X because it is superfluous.  The default behavior of embedding a location for each individual shared library is at least as good.  Cabal (and the GHC build process) should use their knowledge of the ultimate install location to set the install name when shared libraries are built.  In-place compilation can override this with DYLD_LIBRARY_PATH. 
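Setting the install name at build time amounts to one extra linker flag. The sketch below builds a gcc command line for a Mach-O shared library whose install name is its final location. `-dynamiclib` and `-Wl,-install_name,PATH` are gcc/ld options on Mac OS X. `dylib_link_command` and its parameters are hypothetical.

```python
import posixpath


def dylib_link_command(output, objects, install_dir):
    """argv for linking `output` so that its install name is install_dir/output.

    Executables later linked against the result record that full path.
    """
    install_name = posixpath.join(install_dir, output)
    return ["gcc", "-dynamiclib", "-o", output, *objects,
            "-Wl,-install_name," + install_name]
```

After building, `otool -L` on an executable lists the install names it recorded. `install_name_tool -id` can change a library's install name if the final location was not known when it was linked.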
    114  
    115 For Linux: we should be sure to use the --enable-new-dtags switch if we use -rpath.  Otherwise we risk having paths that can't be overridden by LD_LIBRARY_PATH. 
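In terms of the flags GHC or Cabal would pass to gcc when linking, the recommendation looks like the sketch below. `linux_rpath_flags` is a hypothetical helper. `-Wl,--enable-new-dtags` and `-Wl,-rpath,DIR` are GNU ld options passed through gcc.

```python
def linux_rpath_flags(lib_dirs):
    """gcc flags that embed lib_dirs as DT_RUNPATH rather than DT_RPATH.

    With GNU ld's traditional default, -rpath alone writes DT_RPATH, which
    LD_LIBRARY_PATH cannot override; --enable-new-dtags selects DT_RUNPATH.
    """
    if not lib_dirs:
        return []
    return ["-Wl,--enable-new-dtags"] + ["-Wl,-rpath," + d for d in lib_dirs]
```

Running `readelf -d` on the resulting binary should then show a `RUNPATH` entry rather than `RPATH`.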
    116  
     60 * Position independent code, what it is, why we need it on some systems 
     61 * relationship between ghc flags -dynamic, -shared and -fPIC 
     62 * difference between C and ghc in compiling for shared libs 
     63 * peculiarities of ELF and PE