Opened 19 months ago

Last modified 3 months ago

#8244 new bug

Removing the Cabal dependency

Reported by: nh2 Owned by: duncan
Priority: normal Milestone:
Component: Compiler Version: 7.6.3
Keywords: Cc: mail@…, difrumin@…, ydewit, jp@…, bgamari@…, mail@…, juhp@…, snoyberg, bardur.arantsson, gideon@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions: D172

Description

GHC depends on Cabal, which so far has been problematic many times, for many reasons.

A few discussions include:

GHC uses only a very small part of Cabal, in these files:

./compiler/ghci/Linker.lhs
./compiler/main/Packages.lhs
./compiler/main/PackageConfig.hs
./compiler/main/Finder.lhs

plus 1 file for ghc-pkg: ./utils/ghc-pkg/Main.hs (see http://www.haskell.org/pipermail/haskell-cafe/2013-September/108750.html for details).

It was proposed that either

  • the package format could be a plain specification without direct code dependencies
  • the Cabal package could be split off into Cabal-the-build-system and a minimal part to describe the package DB to be shared by Cabal and GHC

The Cabal part that is used is in only a few modules of Distribution.* while the remaining majority of the Cabal-the-library package is not used (e.g. none of Distribution.Simple.*).

Decoupling GHC and Cabal seems to be a public desire, yet there are some problems with these approaches. Let us discuss them in this ticket.
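As a concrete reading of the first option, a dependency-free package description could be an ordinary record over base types plus a trivial text rendering. This is only a sketch: the type `PkgInfo`, its field names and the `key: value` layout are hypothetical, loosely modelled on the fields GHC reads from Cabal's InstalledPackageInfo.

```haskell
-- Hypothetical, dependency-free description of an installed package.
-- Only base types are used, so GHC could depend on it without Cabal.
import Data.List (intercalate)

data PkgInfo = PkgInfo
  { pkgName        :: String
  , pkgVersion     :: [Int]       -- e.g. [1,18,0]
  , pkgId          :: String      -- unique installed package id
  , exposedModules :: [String]
  , importDirs     :: [FilePath]
  , libraryDirs    :: [FilePath]
  , hsLibraries    :: [String]
  , depends        :: [String]    -- ids of the packages this one depends on
  , exposed        :: Bool
  } deriving (Eq, Show)

-- Render a version such as [1,18,0] as "1.18.0".
renderVersion :: [Int] -> String
renderVersion = intercalate "." . map show

-- Render the record as plain "field: value" lines, one field per line.
render :: PkgInfo -> String
render p = unlines
  [ "name: "            ++ pkgName p
  , "version: "         ++ renderVersion (pkgVersion p)
  , "id: "              ++ pkgId p
  , "exposed-modules: " ++ unwords (exposedModules p)
  , "import-dirs: "     ++ unwords (importDirs p)
  , "library-dirs: "    ++ unwords (libraryDirs p)
  , "hs-libraries: "    ++ unwords (hsLibraries p)
  , "depends: "         ++ unwords (depends p)
  , "exposed: "         ++ show (exposed p)
  ]
```

Paths containing spaces would need quoting, which this sketch ignores; the point is only that nothing here requires Cabal.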

Attachments (2)

ghc-2.png (104.9 KB) - added by ydewit 19 months ago.
GHC x Cabal main dependencies
GHCPackages.png (14.6 KB) - added by ydewit 19 months ago.


Change History (38)

comment:1 Changed 19 months ago by nh2

A follow-up discussion from #ghc:

dcoutts_: nh2: Cabal does not depend on the ghc-pkg format. Cabal specifies a compiler-independent package registration format. GHC uses it in its external interface (and internally too). It uses the Cabal lib for the parser+printer because it's easier than making its own and keeping up with spec changes.
dcoutts_: type+parser+printer
nh2: dcoutts_: would it still not be easier to make this package database specification a separate thing that both ghc and cabal can depend on? It seems to me that this would be much less a moving target than Cabal-the-build-system is
dcoutts_: nh2: what does make sense is to split the Cabal lib into the Distribution.* bits and the Distribution.Simple.* bits
dcoutts_: nh2: it's not a natural split
hvr: nh2: btw, a related thread: http://www.haskell.org/pipermail/ghc-devs/2013-March/000800.html
dcoutts_: nh2: there's a lot of types shared between the .cabal format and the InstalledPackageInfo type
dcoutts_: as well as parser + printer infrastructure
dcoutts_: nh2: it makes sense to keep that all together, that's the Distribution.* stuff
dcoutts_: as I said, what does make sense to split (it's been deliberately kept mostly-separate) is the Distribution.Simple.* part
dcoutts_: nh2: and we need a parser for that part, that's the dependency that's annoying
thoughtpolice: so yes, i'm going to look into it today if at all possible
nh2: dcoutts_: that makes sense. ghc does not depend on Distribution.PackageDescription either, right?
dcoutts_: nh2: right, it doesn't need the source package type (PackageDescription), just the installed package type (InstalledPackageInfo)
dcoutts_: nh2: but splitting these into different packages would not buy us much and it's not a natural split
nh2: leaving away Distribution.Simple.*, the remaining part is already so small that it indeed looks like a small enough interface
dcoutts_: nh2: it'd only help JP M if the remaining part (let's call it cabal-build-simple) could build with an earlier core part (let's call it cabal-lib) (in his request in http://www.haskell.org/pipermail/haskell-cafe/2013-September/108746.html)
dcoutts_: nh2: and doesn't help me with my parser problems, we still cannot depend on a decent parser combinator lib
dcoutts_: still have to use the crappy ReadP
nh2: dcoutts_: Distribution.PackageDescription is the .cabal file format itself, right? Not sure if that should be part of the package DB spec, it changes more often and ghc can't make use of it
nh2: why is it that you cannot depend on something better?
dcoutts_: nh2: because ghc cannot depend on parsec easily
dcoutts_: because it pulls in too many other things
dcoutts_: the ghc devs objected to my suggestion
dcoutts_: nh2: that's true but what does it really buy us if they're in separate packages? We still cannot guarantee to support JP M's request
dcoutts_: e.g. in the switch to 1.18, there have been enough changes that we'd need the latest version of the InstalledPackageInfo
hvr: dcoutts_: ...seems you have to explain that again every time somebody brings it up =)
nh2: dcoutts_: but do I not understand it right that if you put PackageDescription not into cabal-lib and only in Cabal, Cabal could actually depend on a proper parser since GHC doesn't depend on it any more?
dcoutts_: nh2: it's not a monolithic parser
dcoutts_: nh2: we have that Text class
dcoutts_: with the combinator parsers for all the various types used in .cabal and installed package files
dcoutts_: these types + parser/printer infrastructure are shared between the source and installed package files
dcoutts_: so even if we split it, we still have the problem of needing a parser lib
lemao: dcoutts_: I hear you wrt the difficulties and mixed results of splitting Distribution.Simple, but at the same time this GHC dependency on Cabal is really problematic for all the reasons already discussed
dcoutts_: lemao: I don't think splitting it would fix that
lemao: dcoutts_: yes, I hear you. Maybe the right solution here is to have GHC own its own internal package info impl so Cabal and GHC can go their separate ways
dcoutts_: you'd still have ghc depending on this smaller part, and Cabal/cabal-install would still depend on (usually) the latest version of that
dcoutts_: lemao: but that's also not satisfactory (for cabal-lib to be a private dep of ghc) because ghc api exposes the InstalledPackageInfo type
dcoutts_: it's not a private dependency of the ghc api package, it's a public dependency
lemao: dcoutts_: I guess what I meant is that ghc-pkg package format/parser/etc would be a complete fork
dcoutts_: which then means you cannot pass the InstalledPackageInfo from ghc api functions to anything else
lemao: dcoutts_: at the same time that there are issues with the split, there are real issues with the current status quo
dcoutts_: as well as meaning it'd get out of sync
nh2: dcoutts_: InstalledPackageInfo looks like a very simple/straightforward type though
dcoutts_: nh2: on its own, but it uses a bunch of other types + their parsers+printers
dcoutts_: nh2: and are we really saying that we could always work with old versions of this type, that we'd never need to depend on the latest version in the latest version of Cabal?
dcoutts_: because if not, then we gain nothing
lemao: dcoutts_, nh2: real question here, how often has the package info that matters for ghc actually changed in the past?
dcoutts_: lemao: it does change occasionally
dcoutts_: and it will change again
dcoutts_: we have changes pending
lemao: dcoutts_, nh2: I can see how most of the drivers for these changes come from cabal
nh2: dcoutts_: I can't see many other types, there are only two: License (a simple enum) and Version. Everything else is String/Bool
dcoutts_: nh2: PackageName, PackageId etc
nh2: dcoutts_: are both string newtypes
dcoutts_: nh2: but note also that it uses the same parser infrastructure
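dcoutts_'s point is that the types and their parsers/printers share one infrastructure: Cabal's Distribution.Text class pairs a printer with a ReadP-based parser for the types used in .cabal and installed package files. The following is a heavily simplified, hypothetical analogue using the ReadP module from base; the class `SimpleText` and the types below are illustrative only, not Cabal's definitions. It also shows why PackageName and PackageId need real parsers even though they are "just" string newtypes: both names and ids contain dashes.

```haskell
import Data.Char (isAlphaNum, isDigit)
import Data.List (intercalate)
import Text.ParserCombinators.ReadP

-- Every type gets a printer and a parser that should round-trip.
class SimpleText a where
  display :: a -> String
  parser  :: ReadP a

newtype PackageName = PackageName String deriving (Eq, Show)
newtype Ver         = Ver [Int]          deriving (Eq, Show)
data PackageId      = PackageId PackageName Ver deriving (Eq, Show)

instance SimpleText PackageName where
  display (PackageName n) = n
  -- Dash-separated alphanumeric components, none of them purely numeric,
  -- so that "foo-bar-1.2.3" splits unambiguously into name and version.
  parser = PackageName . intercalate "-" <$> sepBy1 component (char '-')
    where
      component = do
        s <- munch1 isAlphaNum
        if all isDigit s then pfail else return s

instance SimpleText Ver where
  display (Ver ns) = intercalate "." (map show ns)
  parser = Ver <$> sepBy1 (read <$> munch1 isDigit) (char '.')

-- PackageId reuses the parsers of its parts: the shared infrastructure.
instance SimpleText PackageId where
  display (PackageId n v) = display n ++ "-" ++ display v
  parser = PackageId <$> parser <* char '-' <*> parser

-- Accept a parse only if it consumes the whole input and is unique.
simpleParse :: SimpleText a => String -> Maybe a
simpleParse s = case [x | (x, "") <- readP_to_S (parser <* eof) s] of
  [x] -> Just x
  _   -> Nothing
```

For example, `simpleParse "foo-bar-1.2.3" :: Maybe PackageId` gives `Just (PackageId (PackageName "foo-bar") (Ver [1,2,3]))`.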

comment:2 Changed 19 months ago by DaniilFrumin

  • Cc difrumin@… added

comment:3 follow-up: Changed 19 months ago by simonpj

I would love to remove this dependency. Having it implies that GHC depends on heavy-duty Cabal functionality, but of course it doesn't at all. It means that we have to compile all 60+ modules of Cabal before even starting on GHC. It seems wrong.

So, more power to you! I have no opinions about the details -- just wanting to be encouraging.

Simon

comment:4 Changed 19 months ago by ydewit

  • Cc ydewit added

Changed 19 months ago by ydewit

GHC x Cabal main dependencies

comment:5 Changed 19 months ago by JeanPhilippeMoresmau

  • Cc jp@… added

comment:6 Changed 19 months ago by ydewit

This most likely asks for a minimal lib shared between GHC and Cabal and here is, based on previous emails/discussion, a general set of requirements:

  1. GHC should not depend on a specific Cabal version and shouldn't need to include Cabal in its build infrastructure (Cabal should just be a pre-requisite tool installed in the system with a general version range for compatibility, e.g. Cabal 1.10+)
  2. GHC should not be forced to accept specific dependencies introduced by Cabal (e.g. new InstalledPackageInfo parsers/etc) - this means that a shared lib for GHC and Cabal should be minimal.
  3. Cabal should not be constrained by the limited set of dependencies allowed in GHC (e.g. it should be free to introduce whatever new parsers make sense)
  4. Cabal should be able to add new InstalledPackageInfo fields that have no meaning to GHC without affecting GHC - i.e. maybe there should be a generic custom field that is opaque to GHC but that is still stored with GHC's package repo.
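Requirement 4 could be met by giving the shared record a bag of extension fields that the compiler stores and prints back but never interprets. A minimal sketch, assuming (hypothetically) that such fields are marked with an `x-` prefix, as custom fields are in .cabal files; the `Entry` type and the function names are illustrative, not an existing API.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Fields the compiler understands, plus opaque extension fields.
data Entry = Entry
  { entryName    :: String
  , entryDepends :: [String]
  , entryCustom  :: [(String, String)]  -- opaque to GHC, kept verbatim
  } deriving (Eq, Show)

-- Split "key: value" into (key, value), trimming spaces around the value.
splitField :: String -> Maybe (String, String)
splitField line = case break (== ':') line of
  (k, ':' : v) -> Just (k, trim v)
  _            -> Nothing
  where trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

parseEntry :: String -> Entry
parseEntry txt = foldl step (Entry "" [] []) fields
  where
    fields = [f | Just f <- map splitField (lines txt)]
    step e (k, v)
      | k == "name"         = e { entryName = v }
      | k == "depends"      = e { entryDepends = words v }
      | "x-" `isPrefixOf` k = e { entryCustom = entryCustom e ++ [(k, v)] }
      | otherwise           = e  -- other unknown fields: dropped here

-- Extension fields are written back unchanged, in their original order.
renderEntry :: Entry -> String
renderEntry e = unlines $
  [ "name: " ++ entryName e
  , "depends: " ++ unwords (entryDepends e) ]
  ++ [k ++ ": " ++ v | (k, v) <- entryCustom e]
```

With this split, Cabal can add `x-` fields without any change on the GHC side, because the compiler only ever looks at `entryName` and `entryDepends`.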

In addition, I would also like to add other general comments/questions for discussion here:

  • Where do we think this shared lib belongs, or in other words, who owns it: ghc or cabal?
  • Is the long-term goal to have GHC as a compiler that only knows how to compile single packages (so dumb with respect to package resolution? Is that even possible?) and where the current ghc-pkg functionality is really all managed by Cabal?
  • Or should GHC (or any other Haskell compiler, for that matter) have its own notion of a package and its dependencies, to support linking, the REPL and shared libs?

And finally, I would like to introduce a potential solution to this problem that is nothing really new considering what has already been discussed in the past, but describes it in a larger scope.

The idea is to view this shared lib as an API package containing only interface types/functions and NO implementation. At first this API package would contain only one '...Packages' module with the shared types/functions between GHC and Cabal, but it could in the future contain additional modules that could make sense (parsing, name resolution, command line front-end, etc). With a bit of discipline, this API package could also be used by the Haskell-Suite project (e.g. the Haskell-Packages could be another implementation of the same packages API, as could other existing Haskell compilers). Cabal would then have a single, abstract way of interfacing with Haskell compilers for package management, and adding new ones would not require major Cabal changes.

Changed 19 months ago by ydewit

comment:7 Changed 19 months ago by JeanPhilippeMoresmau

I tried to see what could work. This is what I've done so far:

  • created a new distribution-base library (the name avoids any reference to Cabal, don't know if it's wise)
  • build that library instead of Cabal in stage0 and stage1
  • reference that library instead of Cabal in ghc.cabal, ghc-pkg.cabal and Cabal.cabal
  • add the dependency when building ghc-cabal and ghc-tags, both need the full Cabal anyway.

The distribution-base library contains the following modules:

  • Distribution.Compat.ParseUtils: the bits of Cabal's Distribution.ParseUtils we need
  • Distribution.Compat.ReadP: Cabal's ReadP
  • Distribution.InstalledPackageInfo: Cabal's Distribution.InstalledPackageInfo
  • Distribution.License: Cabal's Distribution.License
  • Distribution.ModuleName: Cabal's Distribution.ModuleName
  • Distribution.Package: Cabal's Distribution.Package minus the Dependency type and related functions (unused in GHC)
  • Distribution.PackageIndex: the bits of Cabal's Distribution.Simple.PackageIndex we need
  • Distribution.Text: Cabal's Distribution.Text
  • Distribution.Utils: the bits of Cabal's Distribution.Simple.Utils we need

Note that Distribution.Version is not needed by GHC: even though GHC used to import it, it only used the Version datatype, which comes from base, and I've moved the Data instance to Distribution.Package.
I've created a new Cabal module called Distribution.Dependency to hold the Dependency type, since GHC doesn't need it; in the Cabal source code, it's mainly imports that need to change.

comment:8 Changed 19 months ago by JeanPhilippeMoresmau

My code can be found at https://github.com/JPMoresmau/ghc and https://github.com/JPMoresmau/Cabal (branch distribution-base), if anybody wants to see the dependencies.

comment:9 in reply to: ↑ 3 Changed 19 months ago by simonmar

Replying to simonpj:

I would love to remove this dependency. Having it implies that GHC depends on heavy-duty Cabal functionality, but of course it doesn't at all. It means that we have to compile all 60+ modules of Cabal before even starting on GHC. It seems wrong.

So, more power to you! I have no opinions about the details -- just wanting to be encouraging.

Removing the Cabal dep is good of course, but I just wanted to reply to the above - the GHC build system will still depend on Cabal, because we really do use Cabal to help build the libraries that come with GHC (not to do the actual building, but to understand the .cabal files). So those 60+ modules that we have to build right off the bat will stick around, and we'll still need a full Cabal in the source tree.

Perhaps it's possible to extract another subset of Cabal that we use in the build system, but that's clearly less important than extracting the bit that GHC itself depends on.

comment:10 Changed 19 months ago by ydewit

Yes, having the GHC build system depend on Cabal is harmless: the real issue, imo, is GHC modules directly depending on Cabal modules to handle package description/parsing/serialization functionality.

I also agree that it doesn't make sense to extract a subset of Cabal just to be used by the build system: that wouldn't add anything and possibly even make it harder to build GHC.

However, I do not follow why the 60+ Cabal modules and the full Cabal source tree would still need to stick around. If only the GHC build system depends on Cabal for building GHC libraries, why can't Cabal be just a pre-requisite to building GHC in the same way that a recent version of GHC is a pre-requisite to building stage0?

The way I am seeing this, once this direct dependency from GHC to Cabal is removed, the GHC build system can simply state as a pre-requisite a range of supported Cabal versions. And if we do a good job extracting a minimal set of modules for this shared package that has no Cabal or GHC specific implementation details, then this shared package will rarely change and this range of supported Cabal versions will be quite wide.


comment:11 Changed 19 months ago by simonmar

We have Cabal in the source tree for two reasons:

(1) So that we don't have to depend on the user having the correct version of Cabal installed. We often make changes to GHC and Cabal in tandem, and we can do that without having to release a new Cabal and have everyone upgrade to it every time we need to make a change.

(2) We ship Cabal with GHC, so it needs to be in the tree anyway.

comment:12 Changed 13 months ago by bgamari

  • Cc bgamari@… added

comment:13 Changed 11 months ago by nomeata

  • Cc mail@… added

comment:14 Changed 11 months ago by juhpetersen

  • Cc juhp@… added

comment:15 Changed 9 months ago by snoyberg

  • Cc snoyberg added

comment:16 Changed 9 months ago by duncan

So I've started working on this. The design I'm following will mean that ghc the library does not depend on Cabal, but ghc-pkg will continue to depend on Cabal and so Cabal will still be built and shipped with ghc as it is now. (It also doesn't involve any unnatural splits in the Cabal lib).

If I can get away with it, I'll also remove the support for single-file style package dbs, and just use the (now standard) package.conf.d style dbs. Any objections?

comment:17 Changed 9 months ago by simonmar

No objection from me to removing the old package DB format, though you might find that you need to update some tests.

comment:18 Changed 9 months ago by duncan

Oh, and the other thing my design relies on is the "cache" always being up to date. GHC will only read the binary cache file and never read any of the .conf files. (I've not checked if this was already the case or not.)

comment:19 Changed 9 months ago by simonmar

When using the new DB format, GHC only reads the cache and not the individual .conf files, so you're ok on that front.

comment:20 Changed 9 months ago by bardur.arantsson

  • Cc bardur.arantsson added

comment:22 Changed 7 months ago by ezyang

  • Differential Revisions set to D172
  • Owner set to dcoutts

comment:23 Changed 7 months ago by duncan

comment:24 Changed 7 months ago by refold

Very cool! Since these patches don't completely remove the dependency on Cabal from GHC, will it be at least possible to allow upgrading to new major Cabal versions in GHC point releases? It was annoying that the last few Haskell Platform releases had to ship with an old version of cabal-install because the Cabal version was fixed by GHC.

comment:25 follow-up: Changed 7 months ago by ezyang

  • Owner changed from dcoutts to duncan

refold: It's a good question, and thinking about this question more carefully, no, this patchset alone doesn't give us the capability. The problem is that GHC is still tightly coupled to ghc-pkg, but ghc-pkg still has a Cabal dependency and thus if you update Cabal, you also need to upgrade ghc-pkg. So, the only way to make Cabal separately upgradeable is by siphoning ghc-pkg off into a proper package, relaxing the tight coupling and upgrading it when you upgrade Cabal. duncan, can we do this?

comment:26 follow-up: Changed 7 months ago by ezyang

duncan: I also realized I had another major design question about the new binary package format. In your design doc, you state that the reason we need to store Cabal's information in the binary package database is because ghc-pkg needs to be able to regurgitate the information later. However, aren't the textual files in the database intended to be the "primary" representation, in which case can't ghc-pkg just hit the actual filesystem rather than using the binary package database?

Normally, I'd be indifferent, but if we can reduce the size of the binary package database that will improve GHC startup times. And it's not like we need to make sure ghc-pkg's 'describe' functionality is blazingly fast...

comment:27 Changed 7 months ago by refold

ezyang: What I had in mind was upgrading, say, Cabal 1.22 -> 1.24 when going from GHC 7.10.2 to GHC 7.10.3. Since GHC API will no longer depend on Cabal, this should be less of a problem.

So, the only way to make Cabal separately upgradeable is by siphoning ghc-pkg off into a proper package, relaxing the tight coupling and upgrading it when you upgrade Cabal.

Is splitting the parts of Cabal ghc-pkg uses into a separate library out of the picture?

comment:28 Changed 7 months ago by ezyang

What I had in mind was upgrading, say, Cabal 1.22 -> 1.24 when going from GHC 7.10.2 to GHC 7.10.3. Since GHC API will no longer depend on Cabal, this should be less of a problem.

Well, it would still necessitate bumping up GHC's internal copy of Cabal, since ghc-pkg needs to be compiled with the right version.

Is splitting the parts of Cabal ghc-pkg uses into a separate library out of the picture?

Well, if those bits don't change, you can probably use ghc-pkg with a new Cabal and it probably will work. The whole point is if those types change then you have to clue in ghc-pkg.

comment:29 in reply to: ↑ 26 ; follow-up: Changed 7 months ago by duncan

ezyang and I discussed this on IRC, but for the record...

Replying to ezyang:

duncan: I also realized I had another major design question about the new binary package format. In your design doc, you state that the reason we need to store Cabal's information in the binary package database is because ghc-pkg needs to be able to regurgitate the information later. However, aren't the textual files in the database intended to be the "primary" representation, in which case can't ghc-pkg just hit the actual filesystem rather than using the binary package database?

It could read the text files, but this would be slower than using the binary cache. The performance of ghc-pkg dump is actually important. It's used by cabal to get the installed packages.

Normally, I'd be indifferent, but if we can reduce the size of the binary package database that will improve GHC startup times. And it's not like we need to make sure ghc-pkg's 'describe' functionality is blazingly fast...

The binary file is structured so that the part that ghc reads comes first. So the extra data for ghc-pkg to read back will not affect the time taken to read the part for ghc.

ghc-pkg describe does not need to be fast, but ghc-pkg dump does (at least reasonably so).
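The layout duncan describes, with GHC's portion first, can be illustrated with a toy encoding. This sketch assumes a decimal length header followed by the GHC section and then the ghc-pkg-only section, and uses `String` in place of bytes to stay within base; it is not the real cache format, and `encodeDb`, `readGhcPart` and `readBoth` are hypothetical names.

```haskell
import Data.Char (isDigit)

-- Header: length of the GHC section, then a newline, then both sections.
encodeDb :: String -> String -> String
encodeDb ghcPart pkgPart = show (length ghcPart) ++ "\n" ++ ghcPart ++ pkgPart

-- What GHC would do: read the header and exactly the first section.
-- Thanks to laziness, the ghc-pkg section is never even forced.
readGhcPart :: String -> Maybe String
readGhcPart s = case span isDigit s of
  (ds@(_:_), '\n' : rest) ->
    let n    = read ds
        part = take n rest
    in if length part == n then Just part else Nothing
  _ -> Nothing

-- What ghc-pkg would do: read both sections.
readBoth :: String -> Maybe (String, String)
readBoth s = case span isDigit s of
  (ds@(_:_), '\n' : rest) -> Just (splitAt (read ds) rest)
  _                       -> Nothing
```

A real implementation would read a binary file incrementally, but the principle is the same: GHC's cost depends only on the size of the first section.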

comment:30 in reply to: ↑ 25 Changed 7 months ago by duncan

Replying to ezyang:

refold: It's a good question, and thinking about this question more carefully, no, this patchset alone doesn't give us the capability. The problem is that GHC is still tightly coupled to ghc-pkg, but ghc-pkg still has a Cabal dependency and thus if you update Cabal, you also need to upgrade ghc-pkg. So, the only way to make Cabal separately upgradeable is by siphoning ghc-pkg off into a proper package, relaxing the tight coupling and upgrading it when you upgrade Cabal. duncan, can we do this?

I don't think this is true. If you upgrade Cabal you do not need to upgrade ghc-pkg. Remember that Cabal can work with older (and often newer) versions of ghc. It is in fact not tightly coupled with ghc-pkg, because the coupling is only via the external textual representation of the InstalledPackageInfo, which gives us a lot of room for forwards and backwards compatibility.

The only times when they're more strongly coupled is when ghc-pkg requires new fields in the InstalledPackageInfo. In that case you need to be using a newer Cabal.

So as far as I can see, upgrading Cabal will still be fine under this new scheme, with the bonus that the ghc library itself will not use it. GHC will still ship with Cabal, but you could add a new version.

Could you take an existing ghc binary tarball and modify it to include a newer Cabal lib without breaking things? Probably yes. No other libraries that ghc ships will depend on Cabal, so they would not break. And it would be fine for ghc-pkg to have been built against the older Cabal, so long as it is statically linked against Cabal (or the older Cabal .so is still included).
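The compatibility argument relies on readers of the textual InstalledPackageInfo being tolerant. A sketch of that policy, with hypothetical names: unknown fields are skipped, so an older reader can consume output from a newer writer, and absent optional fields fall back to defaults, so a newer reader can consume output from an older writer. Only required fields (here just `name`) cause a failure; the particular defaults are assumptions for illustration.

```haskell
import Data.Char (isSpace)

data Info = Info
  { infoName    :: String
  , infoExposed :: Bool
  , infoTrusted :: Bool
  } deriving (Eq, Show)

-- All "key: value" lines; lines without a colon are ignored.
fieldsOf :: String -> [(String, String)]
fieldsOf txt =
  [ (k, dropWhile isSpace v) | l <- lines txt, (k, ':' : v) <- [break (== ':') l] ]

readInfo :: String -> Either String Info
readInfo txt = do
  let fs = fieldsOf txt
  -- Required field: its absence is an error.
  name <- maybe (Left "missing required field: name") Right (lookup "name" fs)
  -- Optional boolean fields: use the default when the field is absent.
  let flag key def = maybe def (== "True") (lookup key fs)
  -- Any field not looked up above is simply never consulted.
  return (Info { infoName    = name
               , infoExposed = flag "exposed" True
               , infoTrusted = flag "trusted" False })
```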

comment:31 in reply to: ↑ 29 ; follow-up: Changed 7 months ago by refold

Replying to duncan:

It could read the text files, but this would be slower than using the binary cache. The performance of ghc-pkg dump is actually important. It's used by cabal to get the installed packages.

Maybe Cabal should have its own binary cache for the data it gets out of ghc-pkg dump? We can check whether compiler's package DB is older than our cache and only use the slow path (ghc-pkg dump) when it's not. Perhaps we could also share the Binary instance for InstalledPackageInfo between Cabal and ghc-pkg.
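refold's suggestion amounts to a modification-time check. A rough sketch using the directory package; the function names and cache file location are hypothetical, and the `slowPath` argument stands in for whatever actually runs `ghc-pkg dump`. File systems may store timestamps with coarse resolution, so a real implementation would need to think about that; this sketch simply treats equal timestamps as fresh.

```haskell
import Control.Monad (when)
import System.Directory (doesFileExist, getModificationTime, removeFile)

-- The cache is usable only if it exists and is at least as new as the
-- package DB (a file or a package.conf.d directory) it was built from.
cacheIsFresh :: FilePath -> FilePath -> IO Bool
cacheIsFresh cacheFile pkgDb = do
  exists <- doesFileExist cacheFile
  if not exists
    then return False
    else do
      cacheTime <- getModificationTime cacheFile
      dbTime    <- getModificationTime pkgDb
      return (cacheTime >= dbTime)

-- Return the cached output when fresh; otherwise run the slow path
-- (standing in for "ghc-pkg dump") and rewrite the cache.
cachedDump :: FilePath -> FilePath -> IO String -> IO String
cachedDump cacheFile pkgDb slowPath = do
  fresh <- cacheIsFresh cacheFile pkgDb
  if fresh
    then readFile cacheFile
    else do
      out <- slowPath
      writeFile cacheFile out
      return out

-- Drop the cache unconditionally, e.g. after registering a package.
invalidateCache :: FilePath -> IO ()
invalidateCache cacheFile = do
  exists <- doesFileExist cacheFile
  when exists (removeFile cacheFile)
```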

comment:32 Changed 7 months ago by duncan

@simonpj, @ezyang thanks for all the updates to the CabalDependency wiki page.

comment:33 in reply to: ↑ 31 Changed 7 months ago by duncan

Replying to refold:

Maybe Cabal should have its own binary cache for the data it gets out of ghc-pkg dump? We can check whether compiler's package DB is older than our cache and only use the slow path (ghc-pkg dump) when it's not. Perhaps we could also share the Binary instance for InstalledPackageInfo between Cabal and ghc-pkg.

I don't think cabal calling ghc-pkg dump is currently a performance bottleneck. If it becomes one, there are several things we can do to improve it before adding caching.

In a sense we already have a cache: when we run cabal configure we read it only once, and we don't re-read it for cabal build. When installing a bunch of packages, it's read once for dep planning, and again for each package installed.

comment:34 Changed 7 months ago by simonpj

I hope that the CabalDependency page captures everything so far in this ticket. If not, yell.

comment:35 Changed 7 months ago by simonmar

Is there a reason not to use two separate files for the two formats other than atomicity of updates?

comment:36 Changed 3 months ago by gidyn

  • Cc gideon@… added