Version 23 (modified by tibbe, 12 months ago)
In a thread on glasgow-haskell-users in February, some ideas about splitting base into smaller components were floated. This wiki page tries to assemble ideas on how to regroup the modules.
This has been discussed before, e.g. in 2008.
Structural changes to the base package can be attempted towards the following goals:
(G1) To allow changes to internals without forcing a version-bump on ‘base’, on which every package depends
SPJ: But that goal needs a bit of unpacking. Suppose we divided base into six, base1, base2, base3, etc, but each was a vertical silo and every other package depended on all six. Then nothing would be gained; bumping any of them would cause a ripple of bumps down the line.
(G2) To allow packages to be explicit about what they need
A library that does not use the IO monad could communicate that simply by not depending on some base-io package. The same applies to the Foreign Function Interface or unsafe operations.
(G3) To allow alternative implementations/targets
(G4) More appropriate string types in IO
We would like to be able to use the Text and ByteString types in the I/O layer. For example, we'd like to have:
module System.IO where
  read  :: Handle -> Int -> IO ByteString
  write :: Handle -> ByteString -> IO ()
but since System.IO is defined in base it cannot depend on e.g. bytestring and thus we cannot write these functions. At the moment we have to use String for all I/O which is both slow, due to its cache-inefficient nature, and incorrect, as String is not a representation of a sequence of bytes (but rather a sequence of Unicode code points).
Splitting base would let us fix this and write a better I/O layer.
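As a rough illustration of the signatures above, here is a minimal sketch that builds read and write on top of the existing bytestring functions hGet and hPut. It has to live outside base today, which is exactly the limitation described above. The helper roundTrip and the scratch file name are made up for this example.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Prelude hiding (read)
import System.Directory (removeFile)
import System.IO (Handle, IOMode (..), stdout, withBinaryFile)

-- | Read at most n bytes from a handle (delegates to bytestring's hGet).
read :: Handle -> Int -> IO B.ByteString
read = B.hGet

-- | Write all bytes to a handle (delegates to bytestring's hPut).
write :: Handle -> B.ByteString -> IO ()
write = B.hPut

-- | Hypothetical helper: write some bytes to a scratch file,
-- read the first five back, then remove the file again.
roundTrip :: IO B.ByteString
roundTrip = do
  let path = "base-split-demo.bin" -- assumed scratch file name
  withBinaryFile path WriteMode $ \h -> write h (BC.pack "hello, bytes")
  bytes <- withBinaryFile path ReadMode $ \h -> read h 5
  removeFile path
  pure bytes

main :: IO ()
main = do
  bytes <- roundTrip
  write stdout bytes
  write stdout (BC.pack "\n")
```

No String is involved on the data path here; the Char8 pack call is only used to build the demo payload.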
(G5) Avoid code copies
Johan says: The I/O manager currently has a copy of IntMap inside its implementation because base cannot use containers. Why? Because containers depends on base, so base can't depend on containers. Splitting base would let us get rid of this code duplication. For example:
- base-pure doesn't need containers
- containers depends on base-pure
- base-io depends on containers
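The layering above can be checked mechanically. The following sketch uses Data.Graph from containers to look for dependency cycles, comparing the situation where base itself would need containers with the proposed split. The function name cycles and the two edge lists are assumptions made for illustration; only the edges named in the list above are encoded, and self-dependencies are not considered.

```haskell
module Main (main) where

import Data.Graph (flattenSCC, stronglyConnComp)

type Package = String

-- | Groups of two or more packages that depend on each other cyclically.
cycles :: [(Package, [Package])] -> [[Package]]
cycles deps =
  [ members
  | component <- stronglyConnComp [(p, p, ds) | (p, ds) <- deps]
  , let members = flattenSCC component
  , length members > 1
  ]

-- | What we would get if base used IntMap from containers directly.
unsplit :: [(Package, [Package])]
unsplit =
  [ ("base", ["containers"])
  , ("containers", ["base"])
  ]

-- | The layering sketched in the list above.
split :: [(Package, [Package])]
split =
  [ ("base-pure", [])
  , ("containers", ["base-pure"])
  , ("base-io", ["containers"])
  ]

main :: IO ()
main = do
  print (cycles unsplit) -- one group containing base and containers
  print (cycles split)   -- no cycles
```

The first list yields a single cyclic group, while the split layering yields none, which is what allows base-io to use IntMap from containers.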
(G6) Installable base
Right now, if a package depends on a specific version of base, there's no way to compile it with a GHC that provides a different version of base.
After the split, hopefully, many subpackages of base will lose their «magic» status and become installable via cabal.
(G7) Split base into as FEW packages as possible, consistent with meeting the other goals
Other things being equal, we should split base into as few packages as necessary to meet the other goals. As Johan points out, a split now could paint us into a corner later, so we should not gratuitously split things up.
Large base, re-exporting API packages
Here we would keep one large base package, as now, with a number of wrapper packages that selectively expose stable sub-APIs.
Meets goals (G1), (G2), (G3)
- Cheap: little or no change to the actual code in base
- Easier to define the APIs as desired, i.e. focused and stable, without worrying about implementation-imposed cycles
- No need to include internal modules in the API packages
- Alternative compilers/targets can provide these APIs with totally independent implementations
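A wrapper package of this kind would contain little more than re-exporting modules. The sketch below shows what one such module might look like; the module name Base.IO.Handles and the choice of exported names are purely hypothetical, and no such package exists.

```haskell
-- Hypothetical module of an API wrapper package: it contains no code of
-- its own and only re-exports a chosen, stable subset of System.IO.
module Base.IO.Handles
  ( Handle
  , stdin
  , stdout
  , stderr
  , hClose
  , hFlush
  , hGetLine
  , hPutStrLn
  ) where

import System.IO
  ( Handle
  , hClose
  , hFlush
  , hGetLine
  , hPutStrLn
  , stderr
  , stdin
  , stdout
  )
```

A client depending only on the wrapper package sees just these names, so internal modules of base can change without affecting the wrapper's API, and another compiler could supply the same module with a different implementation behind it.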
Actual base split
Here we genuinely split the code in base into sub-packages.
Meets goals (G4), (G5), (G6) and, I think, (G3)
Could meet goals (G1), (G2), though shim packages might still be needed.
- Quite a bit of work
- Narrows implementation choices, because packages can't be mutually recursive. (i.e. forces IOError-less error)
- Hence further development may be easier (according to Ian)
- Some base-foo package can use other libraries like containers in their implementation (IntMap issue)
- More appropriate types like ByteString and Text can be used in, say, base-io-file
- Alternative compilers/targets may only have to reimplement some of the base-* packages.
- Possibly fewer modules in “magic” packages that cannot be installed via cabal.
This is a list of interdependencies between seemingly unrelated parts that need to be taken into consideration:
- class Monad mentions String, hence pulling in Char
- class Monad mentions error and Data.Int requires throw DivideByZero, hence pulling in exceptions
- Exceptions pull in Typeable
- Typeable pulls in GHC.Fingerprint
- GHC.Fingerprint pulls in Foreign and IO (but could be replaced by a pure implementation)
- The Monad instance of IO calls failIO, which creates an IOException, which has fields for handles and devices, and hence pulls in some Foreign stuff and some file-related IO, preventing the creation of a clean base-io package. There exists a somewhat backwards compatible work-around.
- Some names of base are hardcoded in GHC and hence cannot be moved to a different package name without changes in GHC. This includes:
- The Num constraint on polymorphic literals. Can be avoided by writing fromIntegral 0 instead of 0.
- Similarly, the [x..y] syntax generates a base:GHC.Enum.Enum constraint; RebindableSyntax does not help (GHC bug?)
- StablePtr, as used in GHC.Stable
- Typeable, Show when used in deriving. Can probably be avoided by hand-writing instances. Read can probably move completely out.
- error has its type wired into GHC when in package base; this is used in a hack in GHC/Err.hs-boot. Work-around: Import GHC.Types in GHC/Err.lhs-boot
- The Monad constraint on do-notation expects the definition to live in base. RebindableSyntax helps, but requires defining a local ifThenElse function.
- The ST Monad can (and should) be provided independently of IO, but currently functions like unsafeIOToST are provided in the Control.Monad.ST namespace.
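To make the RebindableSyntax work-around for do-notation more concrete, here is a small sketch. With the extension on, do-blocks, if-expressions and integer literals use whatever >>=, ifThenElse and fromInteger are in scope rather than the ones wired to base. The Maybe-specific bind and the function safeHalve are made up for this example and are not taken from the base-split branch.

```haskell
{-# LANGUAGE RebindableSyntax #-}
module Main (main) where

-- RebindableSyntax turns off the implicit Prelude, so import what we need.
-- fromInteger must be in scope unqualified for integer literals to work.
import Prelude (Bool (..), IO, Integer, Maybe (..), fromInteger, (>))
import qualified Prelude as P

-- | Local bind, specialised to Maybe; do-notation below picks this up.
(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b
Nothing >>= _ = Nothing
Just x >>= f = f x

-- | An ordinary function here, not a class method.
return :: a -> Maybe a
return = Just

-- | Required by RebindableSyntax for if-then-else expressions.
ifThenElse :: Bool -> a -> a -> a
ifThenElse True t _ = t
ifThenElse False _ e = e

-- | Halve a positive number, failing on anything else.
safeHalve :: Integer -> Maybe Integer
safeHalve n = do
  x <- Just n
  if x > 0 then return (x `P.div` 2) else Nothing

main :: IO ()
main = P.print (safeHalve 10) -- prints: Just 5
```

Since this do-block contains only a bind and a final expression, no local >> is needed; a block with statements that discard their result would also require a >> in scope.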
Joachim has started a first attempt to pull stuff out of the bottom of base. See https://github.com/nomeata/packages-base/blob/base-split/README.md for an overview of progress and a description of changes. Use git clone git://github.com/nomeata/packages-base.git; cd packages-base; git checkout base-split to experiment. This *does* try to split out as many packages as possible, just to see what is possible.