Changes between Version 17 and Version 18 of Commentary/GSoCMultipleInstances


Timestamp: May 24, 2012 11:59:51 PM
Author: phischu

  • Commentary/GSoCMultipleInstances

    v17 v18  
    2   2   == Overview ==
    3   3
    4       It is a problem that cabal does not support multiple instances of the same package version installed at the same time. [Mainly ghc, not cabal. Cabal's non-support is a consequence.] Instead of installing them next to each other it overwrites the previous instance. This causes packages that depended upon the overwritten instance to break. The solution is to never overwrite an installed package. In the case of inplace registrations the overwriting has already taken place which is a problem. [?]
        4   It is a problem that cabal and ghc do not support multiple instances of the same package version installed at the same time. This is called the multi-instance restriction. Instead of installing the new instance next to the old one, cabal overwrites the previous instance. This breaks every package that depended upon the overwritten instance. The solution is to never overwrite an installed package.
    5   5
    6       Relating this to how Nix works. Cabal stores potentially every instance of every package possible. Let's call this the cabal store. [Are you talking about Nix or about Cabal? Does the store really contain all packages, even packages not ever built?] There might at least be a global and a local one. [Might be?] Shadowing doesn't matter because if two packages have the same hash they should be interchangeable. [Define shadowing.]
        6   Building a package is a function that maps a build configuration to a built package. Installing a package should mean memoizing this function, so that a package is not rebuilt when an instance built from the same configuration already exists. There should be a separation between the set of all installed packages, called the cabal store, and a subset of these, called an environment. While the cabal store can contain multiple instances of the same package version, an environment needs to be consistent. An environment is consistent if, for every package it contains, all dependencies that target that package refer to the same version.
    7   7
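The consistency condition can be made concrete with a small Haskell sketch. It models an environment as a list of hypothetical `Instance` records that pin every dependency to an exact name and version (the types and function names are made up for illustration and are not Cabal's API), and checks that, for every package contained in the environment, all dependencies targeting it agree on one version.

```haskell
import Data.List (nub)

type PackageName = String
type Version = [Int]

-- Hypothetical model of one package instance: its name, its version and the
-- exact name and version of every dependency it was built against.
data Instance = Instance
  { instName    :: PackageName
  , instVersion :: Version
  , instDeps    :: [(PackageName, Version)]
  } deriving (Show, Eq)

-- An environment is a subset of the cabal store.
type Environment = [Instance]

-- For every package the environment contains, all dependencies that target
-- that package must refer to the same version.
isConsistent :: Environment -> Bool
isConsistent env = all agrees (nub (map instName env))
  where
    targets = concatMap instDeps env
    agrees name = length (nub [v | (n, v) <- targets, n == name]) <= 1
```

Under this definition an environment holding bytestring-0.9, a text instance built against bytestring-0.9 and a parsec instance built against bytestring-0.10 is rejected, although the cabal store may well contain all of these instances side by side.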
    8       The dependency resolver comes up with an install plan. [Is this part of the current situation or part of the solution?] In this install plan all packages have completely fixed dependencies based on the dependencies specified in the cabal file. Some of them are already present in the cabal store and some aren't. They are a subset of all possible package instances. This corresponds to a profile in Nix as well as a sandbox. We call this an environment. [I still don't get this definition of environment. In particular, I fail to see how an environment is related to the solver.]
        8   Currently the two concepts of storing package instances (the cabal store) and selecting package instances for building (the environment) are conflated in a PackageDB. Sandboxes are used as a workaround to create multiple different environments, but they also create multiple places to store installed packages. The disadvantages are wasted disk space, longer compilation times (the same instance may be built again in every sandbox) and the risk of losing track of what is installed where. Also, if the multi-instance restriction is not lifted, sandboxes will eventually suffer from the same unintended breakage of packages as non-sandboxed PackageDBs.
    9   9
    10  10  == Dependency resolution ==
    11  11
    12      The dependency resolver takes into account which packages are already installed and tries to reuse them. [Is this part of the current situation or part of the solution?] Another option would be for the resolver to ignore which packages are already installed. It then computes the hashes for the packages it needs for compilation. Then those that aren't already present in the cabal store are built. [Tradeoffs?]
        12  The dependency resolver currently comes up with an install plan. An install plan is similar to an environment: like an environment it is a set of installed packages, and like an environment it needs to be consistent. But it may also contain configurations for packages that are not installed yet.
        13
        14  "An installation plan is a set of packages that are going to be used together. It will consist of a mixture of installed packages and source packages along with their exact version dependencies." -- InstallPlan documentation
        15
        16  The dependencies of a package version are what is currently listed in the build-depends field of a cabal file: a list of packages with version constraints that may be used to build this package version.
        17
        18  There should be at least two modes for dependency resolution.
        19
        20  In the first mode the dependency resolver ignores the installed packages and uses the dependencies of all available source packages to find a set of package configurations. This is already an install plan. It then replaces every configuration in this set for which an instance with the same hash is already in the cabal store by that installed instance, to avoid rebuilding it.
        21
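The first mode can be sketched as a post-processing pass over the solver's output. The sketch assumes that every configuration already carries its cabal hash (see the section on the cabal hash) and models the cabal store as a plain list of hashes; `PlanEntry`, `reuseInstalled` and `toBuild` are hypothetical names, not the real InstallPlan API.

```haskell
-- Hypothetical stand-in for the cabal hash of a build configuration.
type ConfigHash = String

-- Hypothetical install-plan entry: either an instance that is already in the
-- cabal store, or a configuration that still has to be built.
data PlanEntry
  = Installed String ConfigHash   -- package id, hash of the installed instance
  | Configured String ConfigHash  -- package id, hash of the configuration
  deriving (Show, Eq)

-- Mode 1: the solver has ignored installed packages, so its plan contains
-- only Configured entries. Every configuration whose hash is already in the
-- cabal store is then replaced by the installed instance.
reuseInstalled :: [ConfigHash] -> [PlanEntry] -> [PlanEntry]
reuseInstalled store = map reuse
  where
    reuse (Configured pid h) | h `elem` store = Installed pid h
    reuse entry = entry

-- The package ids that still have to be built.
toBuild :: [PlanEntry] -> [String]
toBuild plan = [pid | Configured pid _ <- plan]
```

Because the lookup is by hash, an installed instance is only reused if it was built from exactly the same configuration, which is what keeps this mode safe when the store holds several instances of one package version.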
        22  The other mode is what is currently done: the set of installed packages is taken into account during resolution. This might avoid more rebuilding, but users might not always get the latest package versions.
        23
        24  Also, there might be installed packages whose source is not available. If installed packages are ignored during resolution, such packages cannot appear in the install plan.
        25
        26  == Using Cabal without cabal-install ==
        27
        28  Currently, if Cabal is asked to configure a package from a Setup.hs script without cabal-install, it performs an ad hoc dependency resolution that only takes the installed packages into account. This counts as dependency resolution because it takes the set of installed packages and selects an environment from it. If cabal-install has already figured out the environment, we want to pass it to this configure step instead.
        29
        30  == Using GHC without Cabal and cabal-install ==
        31
        32  Currently, if GHC is invoked directly by the user, it performs an ad hoc form of dependency resolution. The most common case is using ghci. This needs to be adjusted for the case where there are multiple instances of the same package in the PackageDBStack.
        33
        34  == Inplace Registration ==
        35
        36  We try never to overwrite the files of an installed package. In the case of inplace registration this is impossible, because the package is registered from its build directory, which is overwritten whenever the package is rebuilt. I feel that inplace registration should be discouraged.
    13  37
    14  38  == Released and Unreleased packages ==
    15  39
    16      If we cabal install a package that is released on hackage we call this a clean install. Those should not be used to satisfy dependencies but rather to bring a package into scope in ghci to play with it. [Or to install an application? Why not phrase this positively? Also, why not first give the definitions completely, then discuss the differences.] If we cabal install an unreleased package we call this a dirty install. I assume that the source code for a released package is uniquely identified by its version number. [Why is this important?] For unreleased packages this is not the case.
        40  If we cabal install a package that is released on Hackage, we call this a clean install. If we cabal install an unreleased package, we call this a dirty install. Clean installs are mainly used to bring a package into scope for ghci and to install applications. While they can also be used to satisfy dependencies, this is discouraged. For released packages the set of source files needed for compilation is known (it is the content of the released tarball). For unreleased packages this is currently not the case.
    17  41
    18  42  == The cabal hash ==
    19  43
    20      The idea is to identify installed packages by a hash of the information needed to build them. This hash is the new InstalledPackageId. [Probably not the hash alone, but also the package name (and also still the ABI hash?).] The new installation directory for each instance is $libdir/$pkgid/$installedpackageid. [Do we need to know the path at configure time, or build time, or only after build time?] The hash is computed during installation in GHC.installLib as well as during registration in Register.generateRegistrationInfo. [So it's computed by Cabal. At what stage? Where's the hash stored (if at all)?]
        44  Cabal needs a function that hashes a build configuration. cabal-install uses the hash to check whether a package needs to be built or is already installed. Cabal needs the hash during installation, because the directory of an installed package contains it: the directory is $libdir/$pkgid/$installedpackageid. It also needs the hash during registration, because the package is identified by it in the PackageDB. The hash becomes the new InstalledPackageId (probably not the hash alone but combined with the package name; should the ABI hash still be included?).
    21  45
    22      The hash contains the following information:
        46  A build configuration consists of the following:
    23  47
    24      The hashes of all the package instances that are actually used for compilation. This is called the environment. Those are available in the installedPkgs field of LocalBuildInfo. [When? Where?]
        48  The hashes of all the package instances that are actually used for compilation. This is the environment. It is available in the installedPkgs field of LocalBuildInfo, which exists in every step after configuration. It can also be extracted from an InstallPlan after dependency resolution.
    25  49
    26  50  The compiler, its version and its arguments, and the tools with their versions and arguments. These are also available from LocalBuildInfo, more specifically: compiler, withPrograms, withVanillaLib, withProfLib, withSharedLib, withDynExe, withProfExe, withOptimization, withGHCiLib, splitObjs, stripExes, and a lot more. [Like what?]
    27  51
    28      The source code. This is necessary because if the source code changes the result of compilation changes. For released packages I would assume that the version number uniquely identifies the source code and only hash that but what about unreleased packages? [Again, why is it important? Because we don't want to download the tarballs for hash computation? But then, do we just use the version? Can a dirty install ever have the same hash as a clean install?] From the PackageDescription's library field the exposedModules can be extracted. Also from PackageDescription extraSrcFiles can be extracted. What about the Other Modules? We should also make sure that GHC used/uses only the files we are hashing for compilation.
        52  The source code. This is necessary because if the source code changes, the result of compilation changes. For released packages I would assume that the version number uniquely identifies the source code. A hash of the source code should be available from Hackage, so that the source code does not have to be downloaded just to compute the hash. For an unreleased package we need to find all the source files that are needed for building it, including non-Haskell source files. One way is to build a source tarball as if the package were released and then hash all the sources included in it.
    29  53
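Hashing the sources of an unreleased package could look roughly like the following sketch. It assumes the files of the generated source tarball have already been read into memory as (path, contents) pairs; `sourceHash` and `fnv1a` are hypothetical names. To keep the sketch free of non-base dependencies it uses 64-bit FNV-1a applied to characters rather than bytes; a real implementation would use a cryptographic hash such as SHA-256.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', sortOn)
import Data.Word (Word64)
import Numeric (showHex)

-- 64-bit FNV-1a applied to Unicode code points instead of bytes. Only a
-- stand-in for a cryptographic hash such as SHA-256.
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Hash the files of a source tarball that was built as if the package were
-- released. Files are sorted by path so the result does not depend on the
-- order in which they were listed, and each file is prefixed with its path
-- and length so that file boundaries are unambiguous.
sourceHash :: [(FilePath, String)] -> String
sourceHash files = showHex (fnv1a (concatMap encode (sortOn fst files))) ""
  where
    encode (path, contents) =
      path ++ "\0" ++ show (length contents) ++ "\0" ++ contents
```

Hashing the unpacked files rather than the tarball itself means the result does not depend on archive details such as compression or file timestamps.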
    30      Or we first ask a source tarball to be built as if the package was released and then this one is hashed. [Or? What's the difference? The compression?]
        54  Can a dirty install ever have the same hash as a clean install? No: to have the same hash it would have to be built from the same source code. But that source code is released, so by definition it would be a clean install, even if the source code was downloaded manually and cabal install was invoked in that directory.
    31  55
    32      OS dependencies are not taken into account. [Why not?]
    33
    34      What is ComponentLocalBuildInfo for?
        56  OS dependencies are not taken into account because I think tracking them would be very hard.
    35  57
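Putting the pieces together, hashing a build configuration could be sketched as below. `BuildConfig` is a hypothetical, heavily simplified stand-in for the information in LocalBuildInfo (installedPkgs, compiler, withPrograms and the various with* flags) plus the source hash. Each field is serialised in a canonical order before hashing, and the InstalledPackageId is assumed to be the package id followed by the hash, which is only one of the options discussed above. The same toy FNV-1a hash stands in for a proper cryptographic hash.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', sort)
import Data.Word (Word64)
import Numeric (showHex)

-- Hypothetical, simplified build configuration. In Cabal this information
-- would come from LocalBuildInfo and from the source hash.
data BuildConfig = BuildConfig
  { depHashes  :: [String]                     -- hashes of the instances in the environment
  , compilerId :: String                       -- compiler and version, e.g. "ghc-7.4.1"
  , programs   :: [(String, String, [String])] -- tool name, version and arguments
  , buildFlags :: [(String, Bool)]             -- e.g. ("withProfLib", True)
  , srcHash    :: String                       -- hash of the source files
  }

-- 64-bit FNV-1a over characters; a stand-in for a cryptographic hash.
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Serialise all fields canonically and hash the result. Dependency hashes,
-- tools and flags are sorted so that their listing order does not matter;
-- the argument list of each tool keeps its order.
configHash :: BuildConfig -> String
configHash cfg = showHex (fnv1a canonical) ""
  where
    canonical =
      show ( sort (depHashes cfg)
           , compilerId cfg
           , sort (programs cfg)
           , sort (buildFlags cfg)
           , srcHash cfg )

-- Hypothetical InstalledPackageId: package id plus configuration hash.
installedPackageId :: String -> BuildConfig -> String
installedPackageId pkgid cfg = pkgid ++ "-" ++ configHash cfg

-- Installation directory $libdir/$pkgid/$installedpackageid.
installDir :: FilePath -> String -> BuildConfig -> FilePath
installDir libdir pkgid cfg =
  libdir ++ "/" ++ pkgid ++ "/" ++ installedPackageId pkgid cfg
```

Sorting the dependency hashes and flags means the hash does not depend on the order in which they happen to be listed, while the order of the arguments passed to a single tool is kept, since it can change the result of compilation.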
    36  58  == Other ==
    37  59
    38      The ABI hash becomes a field of InstalledPackageInfo. [You mean this is a change in behaviour? What about packages that don't have one?]
    39
    40      For inplace package registration any packages with the same location must be unregistered. For that you must ask for all installed packages, find the one that is installed to that location and unregister it. [I don't understand this.]
        60  The ABI hash becomes a field of InstalledPackageInfo. Some code in GHC needs to be adjusted to use this new field instead. [You mean this is a change in behaviour? What about packages that don't have one?]
    41  61
    42  62  What about built-in packages like ghc-prim, base, rts and so on?
     
    57  77
    58  78  Other compilers, backwards compatibility?
        79
        80  What is ComponentLocalBuildInfo for?