|Version 8 (modified by simonpj, 8 months ago) (diff)|
Email thread here.
Suppose we have
newtype Age = MkAge Int
Then if n :: Int, we can convert n to an Age thus: MkAge n :: Age. Moreover, this conversion is a type conversion only, and involves no runtime instructions whatsoever. This cost model -- that newtypes are free -- is important to Haskell programmers, and encourages them to use newtypes freely to express type distinctions without introducing runtime overhead.
Alas, the newtype cost model breaks down when we involve other data structures. Suppose we have these declarations
data T a = TLeaf a | TNode (Tree a) (Tree a) data S m a = SLeaf (m a) | SNode (S m a) (S m a)
and we have these variables in scope
x1 :: [Int] x2 :: Char -> Int x3 :: T Int x4 :: S IO Int
Can we convert these into the corresponding forms where the Int is replaced by Age? Alas, not easily, and certainly not without overhead.
- For x1 we can write map MkAge x1 :: [Age]. But this does not follow the newtype cost model: there will be runtime overhead from executing the map at runtime, and sharing will be lost too. Could GHC optimise the map somehow? This is hard; apart from anything else, how would GHC know that map was special? And it it gets worse.
- For x2 we'd have to eta-expand: (\y -> MkAge (x2 y)) :: Char -> Age. But this isn't good either, because eta exapansion isn't semantically valid (if x2 was bottom, seq could distinguish the two). See #7542 for a real life example.
- For x3, we'd have to map over T, thus mapT MkAge x3. But what if mapT didn't exist? We'd have to make it. And not all data types have maps. S is a harder one: you could only map over S-values if m was a functor. There's a lot of discussion about this on #2110.
In summary: The programmer expects zero-cost conversions between a newtypes N and the type T it is based on. We want to allow the programmer to have zero-cost conversions between C N and C T. Requirements:
- It should be sound, i.e. have an implementation in Core with existing coercions, without new coercions or unsafeCoerce.
- It should respect module boundaries, i.e. not allow the user to create a function which he could not write using plain Haskell (and non-zero cost).
- (desirable:) It should be exportable and composable, i.e. the module of N should be able to specify that a conversion between N and T is allowed (or not), and the module of C should be able to specify if a coercion can be lifted (or not), and if both allow it, the user should be able to combine that even if the modules of N and T are independent.
- (desirable:) It should be possible to use such a coercion in one module in internal code, but not export it.
- (desirable:) It should be possible to add REWRITE rules that turn, for example map N into the zero-cost conversion from C T to C N.
To clarify these requirements, here some benchmarks; feel free to expand if you have further needs.
- Allow the user to define a function of type C N -> C T with a zero-cost model.
- Allow the author of Data.Set prevent a coercion Set N -> Set T (as it would break internal invariants).
- Allow the author of a escaping-ensuring newtype HTML = HTML String to use the coercion [String] -> [HTML] internally, but prevent the user of the library from doing that.
Clearly what we want is a way to "lift" newtype constructors (and dually deconstructors) over arbitrary types, so that whenever we have some type blah Int blah we can convert it to the type blah Age blah, and vice versa.
Tantalisingly, System FC, GHC's internal Core language, already has exactly this!
- A newtype constructor turns into an FC cast:
MkAge x turns into x |> AgeNTCo where AgeNTCo :: Int ~ Age
The |> is a cast, and the AgeNTCo is a coercion axiom witnessng the equality of Int and Age.
- Coercions can be lifted, so that
[AgeNTCo] :: [Int] -> [Age] Char -> AgeNTCo :: (Char -> Int) ~ (Char -> Age) T AgeNTCo :: T Int ~ T Age S IO AgeNTCo :: S IO Int ~ S IO Age
So all we need is concrete syntax to allow you to ask for these lifed coercions in Haskell.
The first possiblity involves a new top-level declaration:
newtype wrap w1 :: [Int] -> [Age]) newtype wrap w2 :: (Char -> Int) -> (Char -> Age) newtype wrap w3 :: T Int -> T Age ..etc..
newtype unwrap u1 :: [Age] -> [Int]) newtype unwrap u2 :: (Char -> Age) -> (Char -> Int) ..etc...
This brings into scope the variables w1, w2, etc, with the declared types. Applications of these wrappers and unwrappers have the same cost model as newtype constructors themselves: they are free.
- The type specified in a newtype wrap/unwrap declaration must be of the form type1 -> type2.
- wrap and unwrap are keywords, but only when they occur right after the keyword newtype.
- Wrappers/unwrappers can be polymorphic
newtype wrap foo :: [(Int,b)] -> [(Age,b)]
- Multiple "layers" of newtype can be wrapped at once (just as in foreign declarations). For example:
newtype Fun a = MkFun (a -> a) newtype Age = MkAge Int newtype wrap foo :: (Int -> Int) -> Fun Age
The second possibility is superficially simpler: just provided a new built-in constant with type
newtypeCast :: NTC a b => a -> b
Here NTC is a built-in type class that witnesses the (free) conversion between a and b. Although it would not quite be implemented like this, we would have a built-in instance for each data type (but see Type Soundness below):
instance NTC a b => NTC [a] [b]
and two built-in instances for each newtype:
instance NTC Int b => NTC Age b instance NTC a Int => NTC a Age
So to solve a NTC constraint you unwrap all those newtypes (being careful about abstraction; see next section).
This plan requires a bit more paddling under the water on GHC's part, especially during type inference, but it looks a lot more straightforward than I first thought. Thanks to Roman Cheplyka for advocating this solution.
One issue with type classes is that they cannot be used only internally, so the third use case above would not be possible.
Suppose we have
module Map( ... ) where data Map a b = ...blah blah... module Age( ... ) where newtype Age = MkAge Int
Now suppose we want a newtype wrapper like this
import Map import Age newtype wrap foo :: Map Int Bool -> Map Age Bool
Could we write foo by hand? (This is a good criterion, I think.) Only if we could see the data constructors of both Map and Age.
- If we can't see the data constructor of Age we might miss an invariant that Ages are supposed to have. For example, they might be guaranteed positive.
- If we can't see the data constructors of Map, we might miss an invariant of Maps. For example, maybe Map is represented as a list of pairs, ordered by the keys. Then, if Age orders in the reverse way to Int, it would obviously be bad to substitute.
Invariants like these are difficult to encode in the type system, so we use "exporting the constructors" as a proxy for "I trust the importer to maintain invariants". The "Internals" module name convention is a signal that you must be particularly careful when importing this module; runtime errors may result if you screw up.
One possible conclusion: if we have them at all, newtype wrappers should only work if you can see the constructors of both the newtype, and the type you are lifting over.
But that's not very satisfactory either.
- There are some times (like IO) where it does make perfect sense to lift newtypes, but where we really don't want to expose the representation.
- Actually Map is also a good example: while Map Age Bool should not be converted to Map Int Bool, it'd be perfectly fine to convert Map Int Age to Map Int Int.
- The criterion must be recursive. For example if we had
data Map a b = MkMap (InternalMap a b)It's no good just being able to see the data constructor MkMap; you need to see the constructors of InternalMap too.
The right thing is probably to use kinds, and all this is tantalisingly close to the system suggested in Generative type abstraction and type-level computation. Maybe we should be able to declare Map to be indexed (rather than parametric) in its first parameter.
More thought required.
This is a write-down of a discussion between me (nomeata) and SPJ at TLCA 2013, but my memory might have tricked me and I might have filled in some bits myself, so beware.
Assume there is a module provided by GHC with these definitions
data NT a b -- abstract coerce :: NT a b -> a -> b refl :: NT a a sym :: NT a b -> NT b a trans :: NT a b -> NT b c -> NT a c
and the intention that NT a b is a witness that a and b have the same representation and that coerce n has zero runtime cost.
Besides the trivial refl values, values of type NT cannot be created by the user, but only by a special deriving statement giving the type of a function that creates NT values, but without a definition, e.g.
deriving nt :: NT N T deriving list :: NT a b -> NT [a] [b]
The compiler either rejects the declaration or creates an implementation automatically. It should only create an implementation if a non-zero-cost-implementation could be written by the user instead. (I expect that this means that the constructors of the return type are in scope.)
This solves the abstraction problem for Data.Map: The library author only exports NT a b -> NT (Map k a) (Map k b), but not NT a b -> NT (Map a v) (Map b v)`.
All this amounts to exposing explicit System-FC coercions to the programmer; and that is precisely what we want to do.
- Explicit, because Age and Int really are different types, and so we can't silently convert between them.
- Explicit, so that we can control precisely which coercions are available (via export lists), and thus control abstraction.
Of course with an NT data type, it is possible to define this type class, e.g.
class NT' a b where theNT :: NT a b coerce' :: NT' a b => a -> b coerce' = coerce theNT
and use it where more convenient than the function-based variant.
As to the implementation, I quite naively expect that the definition of NT and related function in terms of Core to be straight-forward (newtype NT a b = a ~_R b). The code that does the deriving is maybe non-trivial, as it has to build the term from available newtype axioms (where the constructor is in scope), coercions given as parameters and the application of type constructors to coercions (again, where the data constructors are in scope).
type family F a type instance F Int = Int type instance F Age = Char data T a = MkT (F a) newtype wrap bad :: T Int -> T Age bogus :: Int -> Char bogus n = case (bad (MkT n)) of MkT c -> c
The problem is, as usual, the type function hiding inside T's definition. The solution is described in Generative type abstraction and type-level computation. It is still not implemented, alas, but adding the newtype wrappers introduces no problems that we do not already have.