Changes between Version 25 and Version 26 of Commentary/GSoCMultipleInstances


Timestamp:
Jun 19, 2012 1:35:46 PM (22 months ago)
Author:
kosmikus
Comment:

--

  • Commentary/GSoCMultipleInstances

    v25 v26  
    6969The minimal approach would be to just take the transitive dependencies into account. However, we might also want to include additional information about builds such as Cabal flag settings, compiler options, profiling, documentation, build tool versions, external (OS) dependencies, and more. 
    7070 
     71These differences have to be tracked. The two options we discuss are to store information in the ghc-pkg format, or to incorporate them in a Cabal hash (which is then stored). Both options can be combined. 
     72 
    7173=== The Cabal hash === 
    7274 
    73 We hash the build configuration of a package that is to be built. cabal-install uses this hash to check if a package is already installed. 
     75[A few notes about where to find suitable information in the source code:] 
    7476 
    7577A build configuration consists of the following: 
     
    8587=== Released and Unreleased packages === 
    8688 
    87 If we cabal install a package that is released on hackage we call this a clean install. If we cabal install an unreleased package we call this a dirty install. Clean installs are mainly used to bring a package into scope for ghci and to install applications. While they can be used to satisfy dependencies this is discouraged. For released packages the set of source files needed for compilation is known. For unreleased packages this is currently not the case. 
    88  
     89If we cabal install a package that is released on hackage we call this a '''clean install'''. If we cabal install an unreleased package we call this a '''dirty install'''. Clean installs are mainly used to bring a package into scope for ghci and to install applications. While they can be used to satisfy dependencies this is discouraged. For released packages the set of source files needed for compilation is known. For unreleased packages this is currently not the case. 
    8990 
    9091 
    9192== Dependency resolution in cabal-install == 
    9293 
    93  
    9494There are two general options for communicating knowledge about build flavors to the solver: 
    9595 
    96 (1) "the direct way":  
    97 i.e., all info is available to ghc-pkg and can be communicated back to Cabal and therefore the solver 
    98 the solver can therefore figure out if a particular package is suitable to use or not, in advance 
     96  1. '''the direct way''': i.e., all info is available to ghc-pkg and can be communicated back to Cabal and therefore the solver can figure out if a particular package is suitable to use or not, in advance; 
    9997 
    100 (2) "the agnostic way" 
    101 this is based on the idea that the solver at first doesn't consider installed packages at all. it'll just do resolution on the source packages available. 
    102 taking all build parameters into account, Cabal hashes will be computed. 
    103 these can then be compared to hashes of installed packages. 
    104 reusing installed packages instead of rebuilding them is then an optimization of the install plan. 
    105 this doesn't require that ghc-pkg is actually directly aware of all the build parameters, as long as the hash computation is robust." -- kosmikus 
     98  2. '''the agnostic way''': this is based on the idea that the solver at first doesn't consider installed packages at all. It'll just do resolution on the source packages available. Then, taking all build parameters into account, Cabal hashes will be computed, which can then be compared to hashes of installed packages. 
     99Reusing installed packages instead of rebuilding them is then an optimization of the install plan. 
    106100 
    107 The options are to support either both by putting all info into InstalledPackageInfo or to support only (2) by just putting a hash into InstalledPackageInfo. The disadvantage of supporting both is that InstalledPackageInfo would have to change more often. This could be fixed by making InstalledPackageInfo extensible. The advantages are that the additional info might be useful for other tools and that more complex rules for compatibility are possible for example non-profiling libs can depend on profiling libs. It would also be better for showing the user how two instances differ. The disadvantage of going for only (2) is that it is a big change and might cause problems with other Haskell implementations. Also if a package only exists installed and not in source form it is completely ignored.  
     101The agnostic way does not require ghc-pkg to be directly aware of all the build parameters, as long as the hash computation is robust. 
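
The agnostic way can be illustrated with a minimal sketch. It is written in Python for brevity and is not Cabal's or ghc-pkg's actual API: the names `config_hash` and `plan_actions`, the fields of the configuration dictionaries, and the choice of SHA-256 over canonical JSON are all assumptions made for this example. The solver is assumed to have already produced a plan from source packages only; installed packages are consulted afterwards, purely by hash.

```python
import hashlib
import json


def config_hash(config):
    """Hash a build configuration deterministically.

    `config` is a hypothetical JSON-serialisable dict; sorting the keys makes
    the hash independent of the order in which fields were written down.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_actions(source_plan, installed_hashes):
    """Decide, per package in the plan, whether to reuse an installed instance.

    `source_plan` maps package names to build configurations chosen by the
    solver; `installed_hashes` is the set of hashes recorded for installed
    instances. Reuse is only an optimisation of the plan.
    """
    return {
        name: "reuse" if config_hash(cfg) in installed_hashes else "build"
        for name, cfg in source_plan.items()
    }


# Hypothetical example data: the hash of a dependency is part of the
# configuration of the package that depends on it.
text_cfg = {"version": "0.11.2.0", "flags": {}, "profiling": False, "deps": []}
foo_cfg = {
    "version": "1.0",
    "flags": {"debug": True},
    "profiling": False,
    "deps": [config_hash(text_cfg)],
}

installed = {config_hash(text_cfg)}
actions = plan_actions({"text": text_cfg, "foo": foo_cfg}, installed)
```

Because the hash of each dependency enters the configuration of its dependents, a change deep in the dependency tree changes every hash above it, which is what makes the comparison safe without ghc-pkg knowing the individual build parameters.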
     102 
     103The options are to support either both by putting all info into InstalledPackageInfo or to support only the second option by just putting a hash into InstalledPackageInfo. The disadvantage of supporting both is that InstalledPackageInfo would have to change more often. This could be fixed by explicitly making the InstalledPackageInfo format extensible in a backwards-compatible way. 
     104 
     105The advantages of having all info available, independently of the solver algorithm, are that the info might be useful for other tools and user feedback.  
     106 
     107Possible disadvantages of the agnostic approach could be that it is a rather significant change and can probably not be supported in a similar way for other Haskell implementations. Also, in the direct approach, we could in principle allow more complex compatibility rules, such as allowing non-profiling libraries to depend on profiling libraries. 
     108 
     109Also, even if we go for the agnostic approach, we still have to be able to handle packages such as base or ghc-prim which are in general not even available in source form. 
     110 
     111On the other hand, the agnostic approach might lead to more predictable and reproducible solver results across many different systems. 
    108112 
    109113== Garbage Collection == 
    110114 
    111 It should be possible to have a garbage collection remove unneeded packages. It has to be interactive because there might be dependencies not known to Cabal and ghc-pkg. Sandboxes are useful for the user to keep track of what should be removable without causing too much damage. 
     115The proposed changes will likely lead to a dramatic increase of the number of installed package instances on most systems. This is particularly relevant for package developers who will conduct lots of dirty builds that lead to new instances being installed all the time. 
    112116 
     117It should therefore be possible to have a garbage collection to remove unneeded packages. However, it is not possible for Cabal to see all potential reverse dependencies of a package, so automatic garbage collection would be extremely unsafe. 
    113118 
    114 == Separating storage and selection of packages == 
     119Options are to either offer an interactive process where packages that look unused are suggested for removal, or to integrate with a sandbox mechanism. If, for example, dirty builds are usually installed into a separate package DB, that package DB could just be removed completely by a user from time to time. 
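
The interactive option could look roughly like the following sketch, written in Python for brevity. Everything in it is an assumption for illustration: the function name `removal_candidates`, the instance identifiers, and the idea that the user names a set of root instances to keep. It only suggests candidates based on the dependencies the tool knows about and never deletes anything itself, since unknown reverse dependencies may exist.

```python
def removal_candidates(deps, roots):
    """Suggest installed instances that no kept root depends on.

    `deps` maps each known instance to the set of instances it depends on;
    `roots` are the instances the user wants to keep. The result is only a
    suggestion list to be confirmed interactively.
    """
    keep = set()
    stack = list(roots)
    while stack:
        inst = stack.pop()
        if inst in keep:
            continue
        keep.add(inst)
        # Follow the known forward dependencies of everything we keep.
        stack.extend(deps.get(inst, ()))
    return sorted(set(deps) - keep)


# Hypothetical package DB with two instances of the same version of text.
deps = {
    "app-1.0-a": {"text-0.11-x"},
    "text-0.11-x": set(),
    "text-0.11-y": set(),
}
candidates = removal_candidates(deps, ["app-1.0-a"])
```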
     120 
     121== Related topics == 
     122 
     123In the following, we discuss some other issues which are related to the multi-instance problem, but not necessarily directly relevant in order to produce an implementation. 
     124 
     125=== Separating storage and selection of packages === 
    115126 
    116127Currently the two concepts of storing package instances (cabal store) and selecting package instances for building (environment) are conflated into a PackageDB. Sandboxes are used as a workaround to create multiple different environments. But they also create multiple places to store installed packages. The disadvantages of this are disk usage, compilation time and one might lose the overview. Also if the multi-instance restriction is not lifted sandboxes will eventually suffer from the same unintended breakage of packages as non-sandboxed PackageDBs. 
    117128There should be a separation between the set of all installed packages called the cabal store and a subset of these called an environment. While the cabal store can contain multiple instances of the same package version an environment needs to be consistent. An environment is consistent if for every package version it contains only one instance of that package version. 
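
The consistency condition can be stated as a small check. This is a sketch in Python under assumed names: an instance is modelled as a hypothetical `(name, version, instance_id)` triple, and `is_consistent` is not an existing Cabal function.

```python
from collections import Counter


def is_consistent(environment):
    """Return True if no package version occurs with more than one instance.

    `environment` is an iterable of (name, version, instance_id) triples.
    Different versions of the same package are allowed by this definition;
    only multiple instances of one version are rejected.
    """
    counts = Counter((name, version) for name, version, _ in set(environment))
    return all(n == 1 for n in counts.values())


# The store may hold several instances of text-0.11; an environment may not.
store = [
    ("text", "0.11", "x"),
    ("text", "0.11", "y"),
    ("base", "4.5", "b"),
]
environment = [("text", "0.11", "x"), ("base", "4.5", "b")]
```

Here `is_consistent(store)` is false while `is_consistent(environment)` is true, which matches the intended split: the store keeps everything, and an environment selects one instance per package version.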
    118129 
    119 == First class environments == 
     130=== First class environments === 
    120131 
    121132It would be nice if we had some explicit notion of an environment. 
    122133 
    123 == Other == 
    124  
    125 The ABI hash becomes a field of InstalledPackageInfo. Some code in GHC needs to be adjusted to use this new field instead. [You mean this is a change in behaviour? What about packages that don't have one?] 
     134== Questions to remember == 
    126135 
    127136What about builtin packages like ghc-prim, base, rts and so on? 
    128  
    129 This assumption does not hold in a multi user environment with multiple local package databases. This is a problem when building. 
    130  
    131 == Open Questions == 
    132137 
    133138Inplace Registration?