Changes between Version 20 and Version 21 of Commentary/GSoCMultipleInstances


Timestamp:
Jun 8, 2012 1:57:41 PM (23 months ago)
Author:
phischu
Comment:

--

  • Commentary/GSoCMultipleInstances

    v20 v21  
     1 
     2== Introduction == 
     3 
     4Cabal and GHC do not support having multiple instances of the same package version installed at the same time. If a second instance of a package version is installed, it overwrites the existing instance on the file system as well as in the PackageDB. This causes packages that depended upon the overwritten instance to break. The idea is to never overwrite an installed package. As already discussed in [http://hackage.haskell.org/trac/ghc/wiki/Commentary/Packages/MultiInstances this wiki entry], the following changes need to be made:
     5 
     6 * Cabal should install packages to a different location 
     7 * ghc-pkg should add instances to the PackageDB instead of overwriting them 
     8 * ghc --make and ghci should shadow the additional instances according to some rule of thumb 
     9 * cabal-install should still find an InstallPlan 
     10 * some form of garbage collection should be invented 
    111 
    212== Changing the Install Location == 
    313 
    4 Cabal does not support multiple instances of the same package version installed at the same time. Instead of installing them next to each other Cabal overwrites the previous instance with the same version. This causes packages that depended upon the overwritten instance to break. A solution is to never overwrite an installed package. 
     14Cabal does not support multiple instances of the same package version installed at the same time. Instead of installing them next to each other Cabal overwrites the previous instance with the same version. 
    515 
    6 Currently the library part of packages is installed to $prefix/lib/$pkgid/$compiler. For example the GLUT package of version 2.3.0.0 when compiled with GHC 7.4.1 when installed globally lands in /usr/local/lib/GLUT-2.3.0.0/ghc-7.4.1/. This is the default path. It is completely customizable by the user. In order to allow multiple instances of this package to coexist we need to change the install location to a path that is unique for each instance of it. Several ways to accomplish this have been discussed: 
     16Currently the library part of a package is installed to $prefix/lib/$pkgid/$compiler. For example, version 2.3.0.0 of the GLUT package, compiled with GHC 7.4.1 and installed globally, lands in /usr/local/lib/GLUT-2.3.0.0/ghc-7.4.1/. This is the default path; it is completely customizable by the user. To allow multiple instances of this package to coexist, we need to change the install location to a path that is unique for each instance. Several ways to accomplish this have been discussed:
    717 
    8181. The InstalledPackageId is part of the path: 
     
    1020 
    11212. A Cabal hash is part of the path: 
    12 We want to have Cabal compute a hash of all the information needed to build a package. We also want to avoid rebuilding a package if a package with the same hash is already present. Because of this there should only ever be one installed package with a certain Cabal hash on a machine so the Cabal hash would be a good choice to be part of the path. This assumption does not hold in a multi user environment with multiple local package databases. This is a problem when building. It does not violate the uniqueness of the installation path. 
     22We want to have Cabal compute a hash of all the information needed to build a package. We also want to avoid rebuilding a package if a package with the same hash is already present. Because of this there should only ever be one installed package with a certain Cabal hash on a machine so the Cabal hash would be a good choice to be part of the path. 
    1323 
    14243. A unique number is part of the path: 
    1525A unique number could be the number of packages installed, for example /usr/local/lib/GLUT-2.3.0.0-87, or the number of instances of this version installed, for example /usr/local/lib/GLUT-2.3.0.0-2, or a random number, for example /usr/local/lib/GLUT-2.3.0.0-83948393212. The advantages I see are that not much information is needed to come up with the file path and that this seems to be robust against other design decisions we make now or in the future.
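
The path schemes above can be compared side by side with a small sketch. This is only an illustration in Python, not Cabal code: the function names, the 16-character truncation and the use of SHA-256 are assumptions made here, and only the default layout $prefix/lib/$pkgid/$compiler and the example paths are taken from the text.

```python
import hashlib
import posixpath


def default_libdir(prefix, pkgid, compiler):
    """Current default layout from the text: $prefix/lib/$pkgid/$compiler."""
    return posixpath.join(prefix, "lib", pkgid, compiler)


def libdir_with_suffix(prefix, pkgid, suffix):
    """Hypothetical per-instance layout: $prefix/lib/$pkgid-$suffix."""
    return posixpath.join(prefix, "lib", "{}-{}".format(pkgid, suffix))


def hash_suffix(build_description):
    """Option 2 (assumed form): a truncated SHA-256 of some build description."""
    return hashlib.sha256(build_description.encode("utf-8")).hexdigest()[:16]


def next_instance_number(installed_dirs, pkgid):
    """Option 3, second variant: 1 + number of existing instances of this version.

    An existing instance is assumed to be a directory named '<pkgid>-<digits>'.
    """
    marker = pkgid + "-"
    count = sum(
        1
        for name in installed_dirs
        if name.startswith(marker) and name[len(marker):].isdigit()
    )
    return count + 1


existing = ["GLUT-2.3.0.0-1", "GLUT-2.2.0.0-1"]
number = next_instance_number(existing, "GLUT-2.3.0.0")
print(default_libdir("/usr/local", "GLUT-2.3.0.0", "ghc-7.4.1"))
print(libdir_with_suffix("/usr/local", "GLUT-2.3.0.0", number))
```

The counter variant needs only a directory listing, which matches the remark that not much information is needed to come up with the path, whereas the hash variant needs the full build description before the path is known.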
    1626 
     27== Simple dependency resolution in Cabal and GHC == 
     28 
     29Currently, if GHC is invoked by the user, it does some ad hoc form of dependency resolution. The most common case of this is using ghci. If there are multiple instances of the same package in the PackageDBStack, the policy used to select a single one needs to be adjusted. The user should be warned that this happened. Ideas:
     30 
     31 * build a complex solver into GHC 
     32 * random 
     33 * dependencies with the highest versions 
     34 * order in the PackageDB 
     35 * latest 
     36 
     37Picking the most recently installed instance seems like the best idea right now. There are at least two ways to track which of the installed instances was most recently installed: add either a timestamp or the count of instances to InstalledPackageInfo. Tracking the count means that you would lose the possibility to migrate packages between machines.
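
A minimal sketch of the "latest" policy, assuming the timestamp variant. The `Instance` record and its `installed_at` field are hypothetical stand-ins for an InstalledPackageInfo extended with a timestamp; nothing here is existing GHC or ghc-pkg code.

```python
from collections import namedtuple

# Hypothetical record: installed_at models the proposed timestamp field.
Instance = namedtuple("Instance", ["name", "version", "ipid", "installed_at"])


def shadow_older_instances(instances):
    """Keep the most recently installed instance per (name, version).

    Returns the selected instances and one warning per package version
    that had more than one instance to choose from.
    """
    groups = {}
    for inst in instances:
        groups.setdefault((inst.name, inst.version), []).append(inst)

    selected, warnings = [], []
    for (name, version), group in sorted(groups.items()):
        newest = max(group, key=lambda inst: inst.installed_at)
        selected.append(newest)
        if len(group) > 1:
            warnings.append(
                "{}-{}: {} instances installed, using {}".format(
                    name, version, len(group), newest.ipid
                )
            )
    return selected, warnings


db = [
    Instance("GLUT", "2.3.0.0", "GLUT-2.3.0.0-a", 100),
    Instance("GLUT", "2.3.0.0", "GLUT-2.3.0.0-b", 200),
    Instance("base", "4.5.0.0", "base-4.5.0.0-c", 50),
]
chosen, notes = shadow_older_instances(db)
for note in notes:
    print("warning:", note)
```

A timestamp stays meaningful when a package is copied to another machine, while a per-machine instance count does not, which is the migration concern mentioned above.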
     38 
     39Currently, if Cabal is asked to configure a package from a Setup.hs script without using cabal-install, some ad hoc dependency resolution takes place too.
     40 
     41== Garbage Collection == 
     42 
     43It should be possible to have a garbage collector remove unneeded packages. It has to be interactive because there might be dependencies not known to Cabal and ghc-pkg. Sandboxes come in handy because they can be removed without affecting anything else.
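
One way to picture an interactive collector is a reachability pass followed by a confirmation step. The sketch below assumes a notion of "roots" (instances the user wants to keep) and a `confirm` callback standing in for the interactive prompt; both are assumptions made for illustration and are not part of the proposal text.

```python
def reachable(roots, depends):
    """Return all instances reachable from the roots via the dependency map."""
    seen = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(depends.get(pkg, ()))
    return seen


def collect_garbage(installed, roots, depends, confirm):
    """Return the instances to remove: unreachable ones the user confirmed.

    confirm is called once per candidate, so the user can veto removal of
    packages that have dependencies unknown to Cabal and ghc-pkg.
    """
    keep = reachable(roots, depends)
    candidates = sorted(set(installed) - keep)
    return [pkg for pkg in candidates if confirm(pkg)]


installed = ["A", "B", "C", "D"]
depends = {"A": ["B"], "C": ["D"]}
# The user keeps D by hand, for example because an external tool uses it.
to_remove = collect_garbage(installed, ["A"], depends, lambda pkg: pkg != "D")
print(to_remove)
```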
     44 
    1745== Identifying packages == 
    1846 
    19 The InstalledPackageId uniquely identifies an installed package. It currently consists of package-version-abihash for example GLUT-2.3.0.0-70c7b988404c00401d762b8eca475e5c. The ABI hash is used to discriminate instances of the same package version, to avoid recompilation if a dependency has changed but its ABI has not and as a sanity check to refuse compilation rather than to produce garbage. 
    20 We want multiple instances of the same package version that of course expose the same API but are built against different other packages to be installed at the same time. Although it is currently very likely that those have different ABI hashes this is not guaranteed. So the InstalledPackageId as currently defined is unsuitable to uniquely identify all installed package instances. 
    21  
    22 == Separating storage and selection of packages == 
    23  
    24 Currently the two concepts of storing package instances (cabal store) and selecting package instances for building (environment) are conflated into a PackageDB. Sandboxes are used as a workaround to create multiple different environments. But they also create multiple places to store installed packages. The disadvantages of this are disk usage, compilation time and one might lose the overview. Also if the multi-instance restriction is not lifted sandboxes will eventually suffer from the same unintended breakage of packages as non-sandboxed PackageDBs. 
    25 There should be a separation between the set of all installed packages called the cabal store and a subset of these called an environment. While the cabal store can contain multiple instances of the same package version an environment needs to be consistent. An environment is consistent if for every package version it contains only one instance of that package version. 
     47The InstalledPackageId currently uniquely identifies an installed package and should do so in the future. It currently consists of the package name, the version and the ABI hash, for example GLUT-2.3.0.0-70c7b988404c00401d762b8eca475e5c. The ABI hash is used to discriminate between instances of the same package version, to avoid recompilation if a dependency has changed but its ABI has not, and as a sanity check for GHC to refuse compilation rather than to produce garbage.
     48The InstalledPackageId as currently defined is unsuitable to uniquely identify installed package instances. 
    2649 
    2750== Avoiding rebuilding a package == 
    2851 
    29 Building a package is a function that maps a build configuration to a built package. Installing a package should mean memoizing this function to avoid rebuilding this package if possible. Currently only a small part of the configuration of an installed package is stored and only this small part can be used to determine if it is valid to depend upon a certain installed package. For example it is not tracked if a package was built with profiling support. We could enrich the InstalledPackageInfo with a lot of information about the package configuration. Or we could hash the package configuration and only add this hash to the InstalledPackageInfo. 
     52Currently only a small part of the configuration of an installed package is stored and only this small part can be used to determine if it is valid to depend upon a certain installed package. For example it is not tracked if a package was built with profiling support. We could enrich the InstalledPackageInfo with a lot of information about the package configuration. Or we could hash the package configuration and only add this hash to the InstalledPackageInfo. 
    3053 
    31 == The cabal hash == 
     54== The Cabal hash == 
    3255 
    3356We hash the build configuration of a package that is to be built. cabal-install uses this hash to check if a package is already installed. 
     
    3558A build configuration consists of the following: 
    3659 
    37 The hashes of all the package instances that are actually used for compilation. This is the environment. It is available in the installedPkgs field of LocalBuildInfo which is available in every step after configuration. It can also be extracted from an InstallPlan after dependency resolution. 
     60The Cabal hashes of all the package instances that are actually used for compilation. This is the environment. It is available in the installedPkgs field of LocalBuildInfo which is available in every step after configuration. It can also be extracted from an InstallPlan after dependency resolution. 
    3861 
    3962The compiler, its version and its arguments and the tools and their version and their arguments. Available from LocalBuildInfo also. More specifically: compiler, withPrograms, withVanillaLib, withProfLib, withSharedLib, withDynExe, withProfExe, withOptimization, withGHCiLib, splitObjs, stripExes. And a lot more. [Like what?] 
    4063 
    4164The source code. This is necessary because if the source code changes, the result of compilation changes. For released packages I would assume that the version number uniquely identifies the source code. A hash of the source code should be available from Hackage to avoid downloading the source code. For an unreleased package we need to find all the source files that are needed for building it, including non-Haskell source files. One way is to ask for a source tarball to be built as if the package was released and then hash all the sources included in that.
    42  
    43 Can a dirty install ever have the same hash as a clean install? No, because if it had it would have to use the same source code. But this source code is released so by definition it would be a clean install. Even if the source code was downloaded manually and cabal install was invoked in that directory. 
    4465 
    4566OS dependencies are not taken into account because I think it would be very hard.
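
The ingredients listed above can be combined into a single hash roughly as follows. This is a sketch under assumptions: SHA-256, the JSON canonicalisation and the function signature are choices made here for illustration, not the scheme Cabal uses. The flag names are the LocalBuildInfo fields mentioned in the text, and OS dependencies are left out as described.

```python
import hashlib
import json


def cabal_hash(dependency_hashes, compiler, flags, source_hash):
    """Hash a build configuration (assumed encoding: canonical JSON + SHA-256).

    dependency_hashes: Cabal hashes of the instances actually compiled against
    compiler:          compiler name and version, e.g. "ghc-7.4.1"
    flags:             build settings such as withProfLib, withOptimization
    source_hash:       hash of the package's source code
    """
    config = {
        "dependencies": sorted(dependency_hashes),
        "compiler": compiler,
        "flags": flags,
        "source": source_hash,
    }
    # sort_keys makes the encoding independent of dict insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


vanilla = cabal_hash(
    ["aaa", "bbb"], "ghc-7.4.1",
    {"withProfLib": False, "withOptimization": True}, "src-1",
)
profiled = cabal_hash(
    ["aaa", "bbb"], "ghc-7.4.1",
    {"withProfLib": True, "withOptimization": True}, "src-1",
)
print(vanilla != profiled)
```

Because the dependencies enter through their own Cabal hashes, a change deep in the dependency graph changes the hash of everything built on top of it, which is what lets cabal-install use the hash to check whether a package is already installed.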
     
    6182One drawback of ignoring installed packages is that there might be installed packages for which the source is not available. If the source is not available and installed packages are ignored those packages can not appear in the install plan. 
    6283 
    63 == Using Cabal without cabal-install == 
    64  
    65 Currently if Cabal is asked to configure a package from a Setup.hs script without using cabal-install some adhoc dependency resolution that only takes into account the installed packages takes place. It is a dependency resolution because it takes the set of installed packages and creates an environment. After cabal-install figures out the install plan we will want to supply this step with a specific environment to build against. 
    66  
    67 == Using GHC without Cabal and cabal-install == 
    68  
    69 Currently if GHC is invoked by the user it does some adhoc form of dependency resolution. The most common case of this is using ghci. If there are multiple instances of the same package in the PackageDBStack the policy used for this needs to be adjusted. 
    70  
    71 == Inplace Registration == 
    72  
    73 We try to never overwrite the files of an installed package. In the case of inplace registration this is impossible because the overwriting has already happened. I feel that inplace registration should be discouraged. 
    74  
    7584== Released and Unreleased packages == 
    7685 
    7786If we cabal install a package that is released on Hackage, we call this a clean install. If we cabal install an unreleased package, we call this a dirty install. Clean installs are mainly used to bring a package into scope for ghci and to install applications. While they can be used to satisfy dependencies, this is discouraged. For released packages the set of source files needed for compilation is known. For unreleased packages this is currently not the case.
    7887 
     88== Separating storage and selection of packages == 
    7989 
     90Currently the two concepts of storing package instances (cabal store) and selecting package instances for building (environment) are conflated into a PackageDB. Sandboxes are used as a workaround to create multiple different environments, but they also create multiple places to store installed packages. The disadvantages of this are disk usage, compilation time and the risk of losing the overview. Also, if the multi-instance restriction is not lifted, sandboxes will eventually suffer from the same unintended breakage of packages as non-sandboxed PackageDBs.
     91There should be a separation between the set of all installed packages called the cabal store and a subset of these called an environment. While the cabal store can contain multiple instances of the same package version an environment needs to be consistent. An environment is consistent if for every package version it contains only one instance of that package version. 
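
The consistency condition can be stated as a small check. The representation of an environment as a list of (name, version, instance id) triples is an assumption made for this sketch, and the instance ids are made up.

```python
from collections import Counter


def conflicts(environment):
    """Return the (name, version) pairs that occur with more than one instance."""
    counts = Counter((name, version) for name, version, _ in environment)
    return sorted(key for key, n in counts.items() if n > 1)


def is_consistent(environment):
    """An environment is consistent if no package version has two instances."""
    return not conflicts(environment)


# The store may hold several instances of GLUT-2.3.0.0 ...
store = [
    ("GLUT", "2.3.0.0", "inst-1"),
    ("GLUT", "2.3.0.0", "inst-2"),
    ("OpenGL", "2.5.0.0", "inst-3"),
]
# ... while an environment selects a subset with one instance per version.
environment = [store[1], store[2]]
print(is_consistent(store), is_consistent(environment))
```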
     92 
     93== First class environments == 
     94 
     95It would be nice if we had some explicit notion of an environment. 
    8096 
    8197== Other == 
     
    85101What about builtin packages like ghc-prim, base, rts and so on? 
    86102 
     103The assumption that there is only ever one installed package with a certain Cabal hash on a machine does not hold in a multi-user environment with multiple local package databases. This is a problem when building.
     104 
    87105== Open Questions == 
     106 
     107Inplace Registration? 
    88108 
    89109Who has assumptions about the directory layout of installed packages? 
     
    92112 
    93113Haddock? 
    94  
    95 Garbage Collection of unused packages? 
    96114 
    97115Installation Planner?