Changes between Version 26 and Version 27 of Commentary/GSoCMultipleInstances


Timestamp:
Jun 19, 2012 1:45:42 PM (22 months ago)
Author:
kosmikus
Comment:

--

  • Commentary/GSoCMultipleInstances

    v26 v27  
    22== Introduction == 
    33 
    4 Cabal and GHC do not support multiple instances of the same package version installed at the same time. If a second instance of a package version is installed it is overwritten on the file system as well as in the PackageDB. This causes packages that depended upon the overwritten instance to break. The idea is to never overwrite an installed package. As already discussed in [http://hackage.haskell.org/trac/ghc/wiki/Commentary/Packages/MultiInstances] the following changes need to be made: 
     4Cabal and GHC do not support multiple instances of the same package version installed at the same time. If a second instance of a package version is installed it is overwritten on the file system as well as in the `PackageDB`. This causes packages that depended upon the overwritten instance to break. The idea is to never overwrite an installed package. As already discussed in [http://hackage.haskell.org/trac/ghc/wiki/Commentary/Packages/MultiInstances] the following changes need to be made: 
    55 
    66 * Cabal should install packages to a location that does not just depend on name and version, 
    7  * ghc-pkg should always add instances to the PackageDB and never overwrite them, 
    8  * ghc --make, ghci, and the configure phase of Cabal should select suitable instances according to some rule of thumb (similar to the current resolution technique), 
     7 * `ghc-pkg` should always add instances to the PackageDB and never overwrite them, 
     8 * `ghc --make`, `ghci`, and the configure phase of Cabal should select suitable instances according to some rule of thumb (similar to the current resolution technique), 
    99 * we want to be able to make more fine-grained distinctions between package instances than currently possible, for example by distinguishing different build flavours or "ways" (profiling, etc.) 
    10  * cabal-install should still find an InstallPlan, and still avoid unnecessarily rebuilding packages whenever it makes sense 
     10 * `cabal-install` should still find an InstallPlan, and still avoid unnecessarily rebuilding packages whenever it makes sense 
    1111 * some form of garbage collection should be offered to have a chance to reduce the number of installed packages 
    1212 
    1313== Install location of installed Cabal packages == 
    1414 
    15 Currently the library part of packages is installed to $prefix/lib/$pkgid/$compiler. For example the GLUT package of version 2.3.0.0 when compiled with GHC 7.4.1 when installed globally lands in /usr/local/lib/GLUT-2.3.0.0/ghc-7.4.1/. This is the default path. It is completely customizable by the user. In order to allow multiple instances of this package to coexist we need to change the install location to a path that is unique for each instance. Several ways to accomplish this have been discussed: 
     15Currently the library part of packages is installed to `$prefix/lib/$pkgid/$compiler`. For example the `GLUT` package of version 2.3.0.0 when compiled with GHC 7.4.1 when installed globally lands in `/usr/local/lib/GLUT-2.3.0.0/ghc-7.4.1/`. This is the default path. It is completely customizable by the user. In order to allow multiple instances of this package to coexist we need to change the install location to a path that is unique for each instance. Several ways to accomplish this have been discussed: 
    1616 
    17 1. Use a hash to uniquely identify package instances and make the hash part of both the InstalledPackageId and the installation path. 
     17=== Hash === 
     18 
     19Use a hash to uniquely identify package instances and make the hash part of both the InstalledPackageId and the installation path. 
    1820 
    1921The ABI hash currently being used by GHC is not suitable for unique identification of a package, because it is nondeterministic and not necessarily unique. In contrast, the proposed Cabal hash should be based on all the information needed to build a package. 
     
    2224there is a data directory (per default under $prefix/share/$pkgid/) that is baked into Paths_foo.hs in preparation of the build process. 
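As a rough illustration of the hash-based option, the sketch below derives both an installed package id and an install directory from a given hash. The layout `$prefix/lib/$pkgid-$hash/$compiler` and the function names `installed_package_id` and `install_dir` are assumptions made for this example only; the page does not fix a concrete layout.

```python
from pathlib import PurePosixPath


def installed_package_id(pkgid: str, cabal_hash: str) -> str:
    """Hypothetical id format: the package id with the hash appended."""
    return f"{pkgid}-{cabal_hash}"


def install_dir(prefix: str, pkgid: str, compiler: str, cabal_hash: str) -> PurePosixPath:
    """Assumed layout: $prefix/lib/$pkgid-$hash/$compiler.

    Because the hash is part of the path, two instances of the same
    package version get different directories and never overwrite
    each other.
    """
    return PurePosixPath(prefix) / "lib" / installed_package_id(pkgid, cabal_hash) / compiler
```

With the `GLUT` example from above and two made-up hashes, `install_dir("/usr/local", "GLUT-2.3.0.0", "ghc-7.4.1", "3f2a")` and the same call with `"9c1d"` yield two distinct directories.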
    2325 
    24 2. Use a unique number as part of the installation path. 
     26=== Unique number === 
     27 
     28Use a unique number as part of the installation path. 
    2529 
    2630A unique number could be the number of packages installed, or the number of instances of this package version already installed, or a random number. It is important that the numbers are guaranteed to be unique system-wide, so the counter-based approaches are somewhat tricky. 
     
    2832The advantage over using a hash is that this approach should be very simple to implement. On the other hand, identifying installed packages (see below) could possibly become more difficult, and migrating packages to other systems is only possible if the chance of collisions is reasonably low (for example, if random numbers are being used). 
    2933 
    30 2a. The unique number is also part of the installed package id. 
     34  1. The unique number is also part of the installed package id. 
    3135 
    32 2b. We can use another unique identifier (for example, a Cabal hash) to identify installed packages. In this case, that identifier would be allowed to depend on the output of a package build. 
     36  2. We can use another unique identifier (for example, a Cabal hash) to identify installed packages. In this case, that identifier would be allowed to depend on the output of a package build. 
    3337 
    34 == ghc-pkg == 
     38== `ghc-pkg` == 
    3539 
    36 ghc-pkg currently identifies each package by means of an InstalledPackageId. At the moment, this id has to be unique per package DB and is thereby limiting the amount of package instances that can be installed in a single package DB at one point in time. 
     40`ghc-pkg` currently identifies each package by means of an `InstalledPackageId`. At the moment, this id has to be unique per package DB, which limits the number of package instances that can be installed in a single package DB at one point in time. 
    3741 
    38 In the future, we want the InstalledPackageId to still uniquely identify installed packages, but in addition to be unique among all package instances that could possibly be installed on a system. There's still the option that one InstalledPackageId occurs in several package DBs at the same time, but in this case, the associated packages should really be completely interchangeable. 
     42In the future, we want the `InstalledPackageId` to still uniquely identify installed packages, but in addition to be unique among all package instances that could possibly be installed on a system. There's still the option that one InstalledPackageId occurs in several package DBs at the same time, but in this case, the associated packages should really be completely interchangeable. [If we want to be strict about this, we'd have to include the ABI hash in the `InstalledPackageId`.] 
    3943 
    40 Even though, as discussed above, the ABI hash is not suitable for use as the InstalledPackageId given these changed requirements, we will need to keep the ABI hash as an essential piece of information for ghc itself. 
     44Even though, as discussed above, the ABI hash is not suitable for use as the `InstalledPackageId` given these changed requirements, we will need to keep the ABI hash as an essential piece of information for ghc itself. 
    4145 
    42 ghc-pkg is responsible for storing all information we have about installed packages. Depending on design decisions about the solver and the Cabal hash, further information may be required in ghc-pkg's description format (see below). 
     46`ghc-pkg` is responsible for storing all information we have about installed packages. Depending on design decisions about the solver and the Cabal hash, further information may be required in `ghc-pkg`'s description format (see below). 
    4347 
    4448== Simplistic dependency resolution == 
    4549 
    46 The best tool for determining suitable package instances to use as build inputs is cabal-install. However, in practice there will be many situations where users will probably not have the full cabal-install functionality available: 
     50The best tool for determining suitable package instances to use as build inputs is `cabal-install`. However, in practice there will be many situations where users will probably not have the full `cabal-install` functionality available: 
    4751 
    4852  1. invoking GHCi from the command line, 
    4953  2. invoking GHC directly from the command line, 
    50   3. invoking the configure phase of Cabal (without using cabal-install). 
     54  3. invoking the configure phase of Cabal (without using `cabal-install`). 
    5155 
    5256In these cases, we have to come up with a suitable selection of package instances, and the only info we have available are the package DBs plus potential command line flags. Cabal will additionally take into account the local constraints of the package it is being invoked for, whereas GHC will only consider command-line flags, but not modules it has been invoked with. 
    5357 
    54 Currently if GHC is invoked by the user it does some adhoc form of dependency resolution. The most common case of this is using ghci. If there are multiple instances of the same package in the PackageDBStack the policy used to select a single one prefers DBs higher in the stack. It then prefers packages with a higher version. Once we allow package instances with the same version within a single package DB, we need to refine the algorithm. Options are: 
     58Currently, if GHC is invoked by the user, it does some ad hoc form of dependency resolution. The most common case of this is using ghci. If there are multiple instances of the same package in the `PackageDBStack`, the policy used to select a single one prefers DBs higher in the stack. It then prefers packages with a higher version. Once we allow package instances with the same version within a single package DB, we need to refine the algorithm. Options are: 
    5559 
    5660 * pick a random / unspecified instance 
    5761 * use the time of installation 
    5862 * user-specified priorities 
    59  * use the order in the PackageDB 
     63 * use the order in the `PackageDB` 
    6064 * look at the transitive closure of dependencies and their versions 
    6165 * build a complex solver into GHC 
    6266 
    63 Picking a random version is a last resort. A combination of installation time and priorities seems rather feasible. It makes conflicts unlikely, and allows to persistently change the priorities of installed packages. Using the order in the package DB is difficult if directories are being used as DBs. Looking at the transitive closure of dependencies makes it hard to define a total ordering of package instances. Adding a complex solver is unattractive unless we find a way to reuse cabal-install's functionality within GHC, but probably we do not want to tie the two projects together in this way. 
     67Picking a random instance is a last resort. A combination of installation time and priorities seems rather feasible. It makes conflicts unlikely, and allows the priorities of installed packages to be changed persistently. Using the order in the package DB is difficult if directories are being used as DBs. Looking at the transitive closure of dependencies makes it hard to define a total ordering of package instances. Adding a complex solver is unattractive unless we find a way to reuse `cabal-install`'s functionality within GHC, but we probably do not want to tie the two projects together in this way. 
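One possible reading of the refined policy is sketched below: prefer DBs higher in the stack, then higher versions, then user-specified priority, then the most recent installation. The `Instance` record, its field names, and the choice to rank priority above installation time are assumptions made for this illustration; they do not describe what GHC does today.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Instance:
    """Hypothetical view of one installed package instance."""
    name: str
    version: Tuple[int, ...]
    db_index: int        # position in the package DB stack; larger = higher in the stack
    priority: int        # user-specified priority; larger wins (assumed field)
    install_time: float  # installation time in seconds since the epoch (assumed field)
    instance_id: str


def select_instance(name: str, candidates: Iterable[Instance]) -> Optional[Instance]:
    """Pick one instance of `name`, or None if none is installed.

    Ordering, most significant first: DB position, version, priority,
    installation time. Instances that tie on all four keys are resolved
    arbitrarily (the first one encountered wins).
    """
    matching = [c for c in candidates if c.name == name]
    if not matching:
        return None
    return max(matching, key=lambda c: (c.db_index, c.version, c.priority, c.install_time))
```

Because the key is a total order on everything except exact ties, the result does not depend on the order in which a directory-based DB happens to be listed, unless two instances agree on all four keys.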
    6468 
    6569== Build flavours == 
     
    6973The minimal approach would be to just take the transitive dependencies into account. However, we might also want to include additional information about builds such as Cabal flag settings, compiler options, profiling, documentation, build tool versions, external (OS) dependencies, and more. 
    7074 
    71 These differences have to be tracked. The two options we discuss are to store information in the ghc-pkg format, or to incorporate them in a Cabal hash (which is then stored). Both options can be combined. 
     75These differences have to be tracked. The two options we discuss are to store information in the `ghc-pkg` format, or to incorporate them in a Cabal hash (which is then stored). Both options can be combined. 
    7276 
    7377=== The Cabal hash === 
     
    7781A build configuration consists of the following: 
    7882 
    79 The Cabal hashes of all the package instances that are actually used for compilation. This is the environment. It is available in the installedPkgs field of LocalBuildInfo which is available in every step after configuration. It can also be extracted from an InstallPlan after dependency resolution. 
     83The Cabal hashes of all the package instances that are actually used for compilation. This is the environment. It is available in the `installedPkgs` field of `LocalBuildInfo` which is available in every step after configuration. It can also be extracted from an `InstallPlan` after dependency resolution. 
    8084 
    81 The compiler, its version and its arguments and the tools and their version and their arguments. Available from LocalBuildInfo also. More specifically: compiler, withPrograms, withVanillaLib, withProfLib, withSharedLib, withDynExe, withProfExe, withOptimization, withGHCiLib, splitObjs, stripExes. And a lot more. [Like what?] 
     85The compiler, its version and its arguments and the tools and their version and their arguments. Available from LocalBuildInfo also. More specifically: `compiler`, `withPrograms`, `withVanillaLib`, `withProfLib`, `withSharedLib`, `withDynExe`, `withProfExe`, `withOptimization`, `withGHCiLib`, `splitObjs`, `stripExes`. And a lot more. [Like what?] 
    8286 
    83 The source code. This is necessary because if the source code changes the result of compilation changes. For released packages i would assume that the version number uniquely identifies the source code. A hash of the source code should be available from hackage to avoid downloading the source code. For an unreleased package we need to find all the source files that are needed for building it. Including non-haskell source files. One way is to ask a source tarball to be built as if the package was released and then hash all the sources included in that. 
     87The source code. This is necessary because if the source code changes, the result of compilation changes. For released packages I would assume that the version number uniquely identifies the source code. A hash of the source code should be available from Hackage to avoid downloading the source code. For an unreleased package we need to find all the source files that are needed for building it, including non-Haskell source files. One way is to ask for a source tarball to be built as if the package was released and then hash all the sources included in that. 
    8488 
    8589OS dependencies are not taken into account because I think it would be very hard. 
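The three ingredients listed above (the hashes of the dependencies, the compiler and build options, and the source code) could be combined roughly as follows. This is only a sketch: the use of SHA-256, the function name `cabal_hash`, and the flat string encoding of the build options are assumptions, not part of the proposal.

```python
import hashlib
from typing import Iterable, Mapping


def cabal_hash(dep_hashes: Iterable[str],
               compiler: str,
               build_options: Mapping[str, str],
               source_hash: str) -> str:
    """Hash a build configuration into a hex digest.

    Inputs are fed in a canonical order (sorted dependencies, sorted
    option names) so that the same configuration always produces the
    same digest, which the ABI hash does not guarantee.
    """
    h = hashlib.sha256()

    def feed(tag: str, value: str) -> None:
        # Length-prefix every field so that adjacent fields cannot be
        # confused with one another after concatenation.
        data = f"{tag}={value}".encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)

    for dep in sorted(dep_hashes):
        feed("dep", dep)
    feed("compiler", compiler)
    for key in sorted(build_options):
        feed("opt:" + key, build_options[key])
    feed("source", source_hash)
    return h.hexdigest()
```

Changing any single input, for example switching a hypothetical `withProfLib` option from `"False"` to `"True"`, changes the digest, while reordering the dependency list does not.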
     
    99103Reusing installed packages instead of rebuilding them is then an optimization of the install plan. 
    100104 
    101 The agnostic way does not require ghc-pkg to be directly aware of all the build parameters, as long as the hash computation is robust 
     105The agnostic way does not require `ghc-pkg` to be directly aware of all the build parameters, as long as the hash computation is robust. 
    102106 
    103 The options are to support either both by putting all info into InstalledPackageInfo or to support only the second option by just putting a hash into InstalledPackageInfo. The disadvantage of supporting both is that InstalledPackageInfo would have to change more often. This could be fixed by explicitly making the InstalledPackageInfo format extensible in a backwards-compatible way. 
     107The options are to support either both by putting all info into `InstalledPackageInfo` or to support only the second option by just putting a hash into `InstalledPackageInfo`. The disadvantage of supporting both is that `InstalledPackageInfo` would have to change more often. This could be fixed by explicitly making the `InstalledPackageInfo` format extensible in a backwards-compatible way. 
    104108 
    105109The advantages of having all info available, independently of the solver algorithm, are that the info might be useful for other tools and user feedback. 
     
    125129=== Separating storage and selection of packages === 
    126130 
    127 Currently the two concepts of storing package instances (cabal store) and selecting package instances for building (environment) are conflated into a PackageDB. Sandboxes are used as a workaround to create multiple different environments. But they also create multiple places to store installed packages. The disadvantages of this are disk usage, compilation time and one might lose the overview. Also if the multi-instance restriction is not lifted sandboxes will eventually suffer from the same unintended breakage of packages as non-sandboxed PackageDBs. 
     131Currently the two concepts of storing package instances (cabal store) and selecting package instances for building (environment) are conflated into a `PackageDB`. Sandboxes are used as a workaround to create multiple different environments, but they also create multiple places to store installed packages. The disadvantages of this are increased disk usage and compilation time, and that one might lose the overview. Also, if the multi-instance restriction is not lifted, sandboxes will eventually suffer from the same unintended breakage of packages as non-sandboxed `PackageDB`s. 
    128132There should be a separation between the set of all installed packages, called the cabal store, and a subset of these, called an environment. While the cabal store can contain multiple instances of the same package version, an environment needs to be consistent. An environment is consistent if for every package version it contains only one instance of that package version. 
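The consistency condition can be checked mechanically. The sketch below assumes an environment is given as `(name, version, instance_id)` triples; that representation and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str, str]  # (package name, version, instance id)


def inconsistent_versions(environment: Iterable[Entry]) -> Dict[Tuple[str, str], List[str]]:
    """Return every package version that has more than one instance.

    The result maps (name, version) to the sorted list of conflicting
    instance ids; it is empty for a consistent environment.
    """
    by_version = defaultdict(set)
    for name, version, instance_id in environment:
        by_version[(name, version)].add(instance_id)
    return {key: sorted(ids) for key, ids in by_version.items() if len(ids) > 1}


def is_consistent(environment: Iterable[Entry]) -> bool:
    """An environment is consistent if no package version occurs twice."""
    return not inconsistent_versions(environment)
```

A cabal store may legitimately fail this check, since it is allowed to hold several instances of one package version; only environments selected from it need to pass.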
    129133