|Version 19 (modified by nomeata, 22 months ago) (diff)|
- Newtype wrappers
Email thread here.
Suppose we have
newtype Age = MkAge Int
Then if n :: Int, we can convert n to an Age thus: MkAge n :: Age. Moreover, this conversion is a type conversion only, and involves no runtime instructions whatsoever. This cost model -- that newtypes are free -- is important to Haskell programmers, and encourages them to use newtypes freely to express type distinctions without introducing runtime overhead.
Alas, the newtype cost model breaks down when we involve other data structures. Suppose we have these declarations
data T a = TLeaf a | TNode (Tree a) (Tree a) data S m a = SLeaf (m a) | SNode (S m a) (S m a)
and we have these variables in scope
x1 :: [Int] x2 :: Char -> Int x3 :: T Int x4 :: S IO Int
Can we convert these into the corresponding forms where the Int is replaced by Age? Alas, not easily, and certainly not without overhead.
- For x1 we can write map MkAge x1 :: [Age]. But this does not follow the newtype cost model: there will be runtime overhead from executing the map at runtime, and sharing will be lost too. Could GHC optimise the map somehow? This is hard; apart from anything else, how would GHC know that map was special? And it it gets worse.
- For x2 we'd have to eta-expand: (\y -> MkAge (x2 y)) :: Char -> Age. But this isn't good either, because eta exapansion isn't semantically valid (if x2 was bottom, seq could distinguish the two). See #7542 for a real life example.
- For x3, we'd have to map over T, thus mapT MkAge x3. But what if mapT didn't exist? We'd have to make it. And not all data types have maps. S is a harder one: you could only map over S-values if m was a functor. There's a lot of discussion about this on #2110.
In summary: The programmer expects zero-cost conversions between a newtype N and the type T it is based on. We want to allow the programmer to have zero-cost conversions between C N and C T. Requirements:
- It should be sound, i.e. have an implementation in Core with existing coercions, without new coercions or unsafeCoerce.
- It should respect module boundaries, i.e. not allow the user to create a function which he could not write using plain Haskell (and non-zero cost).
- (desirable:) It should be exportable and composable, i.e. the module of N should be able to specify that a conversion between N and T is allowed (or not), and the module of C should be able to specify if a coercion can be lifted (or not), and if both allow it, the user should be able to combine that even if the modules of N and T are independent.
- (desirable:) It should be possible to use such a coercion in one module in internal code, but not export it.
- (desirable:) It should be possible to add REWRITE rules that turn, for example, map N into the zero-cost conversion from C T to C N.
To clarify these requirements, here some benchmarks; feel free to expand if you have further needs.
- Allow the user to define a function of type C N -> C T with a zero-cost model.
- Allow the author of Data.Set prevent a coercion Set N -> Set T (as it would break internal invariants).
- Allow the author of a escaping-ensuring newtype HTML = HTML String to use the coercion [String] -> [HTML] internally, but prevent the user of the library from doing that.
Core already had provided all the necessary feature; the question was just how to offer it on the Haskell level. The implementation comes in form of a coerce :: Coercible a b -> a -> b and a type class Coercible that relates two types if they have the same representation, i.e. can be related by a coercion of role Representational (see Roles). See the haddock documentation for coercible for user-level documentation and Note [Coercible Instances] for information on the implementation.
The implementation fulfills the first goal, the second partly (C N -> C T is allowed even without C's data constructors in scope; if C should be abstract the role of its argument needs to be Nominal). Due to the ad-hoc nature of the Coercible instances, the second and third goal are not achieve. No work towards the fifths goal has been done.
Discussion of alternatives
Top level declarations of coercions as functions (e.g. newtype wrap w1 :: [Int] -> [Age] and newtype wrap w2 :: (Char -> Int) -> (Char -> Age)).
This is close to the actual implementation, but requires giving names to the coercion and introduces new syntax.
(This is basically the current implementation, advocated by Roman Cheplyaka.)
Given a module providing
data NT a b -- abstract coerce :: NT a b -> a -> b refl :: NT a a sym :: NT a b -> NT b a trans :: NT a b -> NT b c -> NT a c
and additional code to derive stuff like
deriving nt :: NT N T deriving listNT :: NT a b -> NT [a] [b]
One problem with this approach is that if the user can use arbitrary Haskell to mange the NT values, he can create bottoms. Also, additional syntax is required. It was argued that the benefits over the type-class apporoach (Approach 2) do not warrant the extra syntactical complexity.