GHC Commentary: Packages
This section documents how GHC implements packages. You should also look at
- The Packages section of the Users Guide
- The Cabal documentation
- The Distribution.* modules in the Cabal package, e.g. Distribution.Package.
A package consists of zero or more Haskell modules, compiled to object code and placed in a single library file
GHCi's linker can't load
.a files, so there is also a version of the
.a file linked together as a single object file, usually named
GHC draws its information about what packages are installed from one or more package databases. A package database is a file containing a value of type [ InstalledPackageInfo ], rendered as text via
show. Also, GHC allows the system package databases to be in the form of a directory of files, each of which contains a
[InstalledPackageInfo] (in the future this may be extended to allow all package databases to have this form). Note: the exact representation of a package database is intended to be private to GHC, which is why we provide the
ghc-pkg tool to manipulate it.
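To make the "list rendered via show" idea concrete, here is a minimal sketch of a round trip through such a textual database. The PkgInfo record is a hypothetical, heavily simplified stand-in for InstalledPackageInfo (the real type has many more fields), and the package versions are made-up values for illustration only.

```haskell
-- Hypothetical, simplified stand-in for Cabal's InstalledPackageInfo.
data PkgInfo = PkgInfo
  { pkgName    :: String
  , pkgVersion :: [Int]
  , pkgDepends :: [String]
  , pkgExposed :: [String]
  } deriving (Show, Read, Eq)

-- Render a whole database as text: a single list, printed with 'show'.
renderDb :: [PkgInfo] -> String
renderDb = show

-- Parse the text back. Returns Nothing on malformed input or if anything
-- other than whitespace follows the list.
parseDb :: String -> Maybe [PkgInfo]
parseDb s = case reads s of
  [(db, rest)] | all (`elem` " \n\t") rest -> Just db
  _                                        -> Nothing

-- Example database; names and versions are illustrative assumptions.
exampleDb :: [PkgInfo]
exampleDb =
  [ PkgInfo "base"      [2, 0] []       ["Data.List", "Prelude"]
  , PkgInfo "haskell98" [1, 0] ["base"] ["List"]
  ]
```

Because the format is just derived Show/Read output, any change to the record changes the file format, which is one reason to treat the representation as private and go through ghc-pkg.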
The most important package type inside ghc is
PackageId, representing the full name of a package (including its version). It is represented as a
FastString for fast comparison.
The information contained in the package database about a package. Currently this is a synonym for
InstalledPackageInfo, later it might contain extra GHC-specific info, or have a more optimised representation.
A mapping (actually
Everything the compiler knows about the package database. This is built by
initPackages in compiler/main/Packages.lhs, and stashed in the
GHC (from version 6.6) allows a single program to contain multiple modules with the same name, as long as the duplicates all come from different packages. In other words, the pair (package name, module name) must be unique within a program. GHC implements this with the Module type, which contains a
PackageId and the
ModuleName of a module. For any
Module, we can therefore ask which package it comes from.
This means that the
Module type is not
Uniqable, so we can't use
Module as the key in a
UniqFM, which is sad. We explored various schemes for extracting uniques from
Modules, but didn't find anything attractive enough. Another problem with the current scheme is that every time we refer to a
Module in an interface file, it gives rise to two words in the binary representation. Our current plan is to improve the binary representation in
.hi files to mitigate this, but this is currently one reason why in GHC 6.6 interface files are larger than in 6.4.
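The pairing described above can be sketched as follows. This is an illustration only, not the definition in Module.lhs: the field names are assumptions, and plain String is used where GHC uses a FastString. Since the pair has no single unique, the sketch compares modules with derived Eq and Ord instead.

```haskell
import Data.List (nub)

-- Simplified stand-ins; GHC represents a PackageId as a FastString.
newtype PackageId  = PackageId String  deriving (Eq, Ord, Show)
newtype ModuleName = ModuleName String deriving (Eq, Ord, Show)

-- A module is identified by the package it lives in plus its name.
data Module = Module
  { modulePackageId :: PackageId
  , moduleName      :: ModuleName
  } deriving (Eq, Ord, Show)

-- The uniqueness rule from the text: within one program, no
-- (package, module name) pair may occur twice.
programIsConsistent :: [Module] -> Bool
programIsConsistent ms = length (nub ms) == length ms

-- Two modules with the same name from different (made-up) packages.
listFromA, listFromB :: Module
listFromA = Module (PackageId "pkg-a-1.0") (ModuleName "Data.List")
listFromB = Module (PackageId "pkg-b-1.0") (ModuleName "Data.List")
```

Here programIsConsistent [listFromA, listFromB] holds, because the two modules differ in their package even though the module names are identical.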
Source code: compiler/basicTypes/Module.lhs.
The current package
There is a notion of which package we are compiling, set by the
-package-name flag on the command line. In the absence of a
-package-name flag, the default package
main is assumed.
To find out what the current package is, grab the field
DynFlags (see compiler/main/DynFlags.hs).
Certain packages are special, in the sense that GHC knows about their existence and something about their contents. Any Name that is wired-in (see compiler/prelude/PrelNames.lhs) must by definition contain a
Module, and that module must therefore contain a
PackageId. But the
PackageId is a full package name, including the version, so does this mean we have to somehow find out the version of the
base package (for example) and bake it into the GHC binary?
We took the view that it should be possible to upgrade these packages independently of GHC, so long as you don't change any of the information that GHC knows about the package (eg. the type of
fromIntegral or what module things come from). Therefore we shouldn't bake in the version of any packages. So the
PackageId for the
base package inside GHC is simply
base: we explicitly strip off the version number for special packages wherever they occur.
This does have the consequence that you cannot use multiple versions of a special package simultaneously in a program, but we believe that is unlikely for these packages anyway. Another consequence is that symbol names for entities from special packages will not include the version number, which saves some space in the object files.
The following packages are special in GHC 6.6:
PackageIds are defined in compiler/main/PackageConfig.hs, and the stripping of versions from special packages in the package database happens in
initPackages in compiler/main/Packages.lhs.
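As a rough sketch of the version stripping, assuming package identifiers are plain strings of the form name-version: the helper names below are hypothetical, the set of special packages is passed in as a parameter rather than fixed, and this is not the code in initPackages.

```haskell
import Data.Char (isDigit)

-- Drop a trailing "-<digits and dots>" version, if there is one.
-- Identifiers without such a suffix are returned unchanged.
stripVersion :: String -> String
stripVersion pid =
  case break (== '-') (reverse pid) of
    (revVer, '-' : revName)
      | not (null revVer)
      , all (\c -> isDigit c || c == '.') revVer -> reverse revName
    _ -> pid

-- Build the identifier used inside the compiler: special packages lose
-- their version, all other packages keep the full identifier.
mkPackageId :: [String] -> String -> String
mkPackageId special pid
  | name `elem` special = name
  | otherwise           = pid
  where
    name = stripVersion pid
```

With base treated as special, mkPackageId ["base"] "base-2.0" gives "base", while an ordinary package such as "mtl-1.0" keeps its version. The version numbers here are made up for the example.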
All symbol names in the object code have the package name prepended (plus an underscore) so that modules of the same name from different packages do not clash. We assume the symbol namespace is global, which is the worst case; allegedly there are ways to have semi-private namespaces on some platforms, but we haven't explored that.
There is one exception: we don't prepend
main_ to symbols from the main package, because there can only ever be one main package. This is a small optimisation.
Source code: see the
Outputable instance for
Module in compiler/basicTypes/Module.lhs.
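The prefixing rule, including the exception for main, can be sketched like this. The function name is hypothetical, and the remainder of the symbol is treated as an opaque string, so the sketch says nothing about how GHC encodes module or entity names.

```haskell
-- Prepend the package name and an underscore to a symbol, except for
-- the main package, whose symbols are left unprefixed.
packageSymbol :: String  -- ^ package name as used inside the compiler
              -> String  -- ^ rest of the symbol (opaque here)
              -> String
packageSymbol pkg sym
  | pkg == "main" = sym
  | otherwise     = pkg ++ "_" ++ sym
```

For a special package such as base, the package name passed in has already had its version stripped, which is why those symbols carry no version number.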
Packages have another purpose when it comes to dynamic linking: each package is a single dynamically-linked library. This is an important property on systems where making intra-library calls is different from inter-library calls (eg. Windows DLLs). Even on systems where we only need to generate a single kind of call, making a data reference within a single library is cheaper than a data reference in another library, so knowing which is which is important.
At the time of writing (GHC 6.6) GHC doesn't have working support for generating multi-DLL Haskell programs, but it worked in the past and work is underway to resurrect it. Dynamic libraries currently only work on MacOS X/PowerPC.
Packages in a GHC build
When GHC is building, it constructs two package databases:
driver/package.conf: the package database that will be installed if you say
make install. To inspect or modify this database, use
utils/ghc-pkg/ghc-pkg-inplace -f <somewhere>/driver/package.conf.
driver/package.conf.inplace: the same, but paths point to the build tree so that GHC can be run without installing. To inspect or modify this database, use
Both of these databases start empty:
make boot in
driver creates an empty database in each file. Then, packages are registered into each database when
make boot runs in a package directory.
NOTE: packages must be registered in dependency order. The build system arranges this normally, but if you build parts of the tree by hand you might violate this rule. If a package is registered before its dependencies, you might not get an error message, but something will go wrong later (probably a missing package dependency). The reason is, to make it easier to register packages, we don't specify full version numbers in the
depends field of a package configuration, leaving
ghc-pkg to fill it in from the database, but if the dependency isn't present in the database,
ghc-pkg silently registers it anyway (because we use
--force... that's another story).
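The ordering problem can be modelled with a small sketch. Everything here is hypothetical (the Pkg record, the function names, the version strings); it imitates only the behaviour described above, where a versionless dependency is filled in from the database if possible and otherwise silently kept as-is.

```haskell
-- Hypothetical, minimal package description.
data Pkg = Pkg
  { name    :: String
  , version :: String
  , depends :: [String]  -- versionless before registration
  } deriving (Eq, Show)

type Db = [Pkg]

-- Fill in the version of a dependency from the packages registered so
-- far. If the dependency is not there yet, leave it unversioned, as a
-- forced registration would.
fillDep :: Db -> String -> String
fillDep db dep = case [version p | p <- db, name p == dep] of
  (v : _) -> dep ++ "-" ++ v
  []      -> dep

-- Register one package, resolving its dependencies against the database.
register :: Db -> Pkg -> Db
register db p = db ++ [p { depends = map (fillDep db) (depends p) }]

-- Register packages in the order given.
registerAll :: [Pkg] -> Db
registerAll = foldl register []

-- Dependencies that do not name any registered package-version.
danglingDeps :: Db -> [(String, String)]
danglingDeps db =
  [ (name p, d) | p <- db, d <- depends p, d `notElem` ids ]
  where
    ids = [name q ++ "-" ++ version q | q <- db]

-- Made-up example packages.
basePkg, h98Pkg :: Pkg
basePkg = Pkg "base" "2.0" []
h98Pkg  = Pkg "haskell98" "1.0" ["base"]
```

Registering in dependency order, registerAll [basePkg, h98Pkg], leaves no dangling dependencies. Registering in the wrong order, registerAll [h98Pkg, basePkg], succeeds without complaint but danglingDeps then reports ("haskell98", "base"), which is the kind of latent breakage the note warns about.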
Refreshing your package databases
Sometimes things can get out of sync in your build tree, if a package version was bumped for example. If you get into trouble, just
make clean in your tree.