Newtypes and type families combine to produce inconsistent FC(X) axiom sets

changed weight to 5

changed title from System FC{Newtypes,TypeFamilies} is unsound to Newtypes and type families combine to produce inconsistent FC(X) axiom sets

changed the description

Trac metadata

Trac field	Value
Component	Compiler → Compiler (Type checker)

added typechecker label

An unsound typesystem is a very bad thing.

Trac metadata

Trac field	Value
CC	→ samb

There are really two issues here mixed up. One one hand we have an infelicity in combining the FC equalities introduced by newtypes and type families. That's a bad thing from a theoretical point of view, but I don't know of an exploit to break type safety with that. In fact, I think it is merely a problem of the type-preserving translation to FC in GHC and not with the source type system (ie, all accepted programs are fine, even if they lead to fishy FC).

On the other hand, the example code demonstrates that newtype deriving in its current generality is unsound and leads GHC to accept source programs that it should not accept. Here is an alternative example that exploits the same problem using GADTs instead of type families (its not quite as dramatic as it doesn't segv, but it refutes an irrefutable pattern it really shouldn't):

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

class IsInt t where
    isInt :: c Int -> c t
instance IsInt Int where isInt = id

newtype Moo = Moo Int deriving(IsInt,Show)

data Z a where
  ZI :: Double -> Z Int
  ZM :: (Int, Int) -> Z Moo

foo :: Z Moo -> Moo
foo ~(ZM (i, _)) = Moo i

main = print $ foo (isInt (ZI 4.0))

As I see it, the fundamental issue here is that the treatment of newtypes is inconsistent between Haskell and Core. In Haskell, newtypes are just cheap data, and they can only be added or removed at the top level of a type. In FC(X), newtypes are rather more like type level lambdas, and can be added or removed anywhere in a type expression as long as the points where β-conversion occurs are clearly marked with cast.

The introduction of generalized newtype deriving complicated this. Because you can use the newtype function on an arbitrary dictionary, you can construct any value, including Leibniz equality witnesses, thus allowing FC(X)'s newtype semantics to escape into the Haskell world.

Many other language features, such as GADTs, associated types, and (I strongly suspect, but have not been able to prove due to the difficulty inherent in comprehending them) functional dependencies, were implemented so as to deeply assume Haskell-98 newtype semantics, and break when exposed to FC(X) lambda semantics.

So, there are two options.

Make Haskell look like FC(X)

We would need to treat newtypes as fully transparent for the purposes of overlap checks in associated-type and fundep checking, and for GADT overlap/inaccessibility checking.

Make FC(X) look like Haskell

We could add a new kind constructor NCO. (Using the concrete syntax of the Calculus of Constructions, with *{0} the sort of values and *{1} the sort of types, eg Bool : *{0} : *{1}):

NCO : [kind : *{1}] [lty : kind] [rty : kind] *{0}

Then, we can generalize TRANS and SYM to work on NCO: (in practice we'd use ad-hoc polymorphism)

TRANS_n : [kind : *{1}] [t1 : kind] [t2 : kind] [t3 : kind] [lco : NCO kind t1 t2] [rco : NCO kind t2 t3] NCO kind t1 t3 SYM_n : [kind : *{1}] [t1 : kind] [t2 : kind] [co : NCO kind t1 t2] NCO kind t2 t1

A subtyping relation exists:

SUB : [kind : *{1}] [t1 : kind] [t2 : kind] [co : CO kind t1 t2] NCO kind t1 t2

and REFL, CAST use NCO; while (this is very important) COMP, LEFT, and RIGHT continue to use full CO's. Thus a NCO represents a coercion that is only valid at the top level because it cannot be composed, and this restriction is enforced by Core Lint. (Perhaps it would be good to make other kinds of type function use NCO?)

For completenes sake, please not that this bug is related to #1251 (closed), as was pointed out on #haskell.

Trac metadata

Trac field	Value
CC	chak, ganesh.sittampalam@credit-suisse.com, samb → chak@cse.unsw.edu.au, ganesh.sittampalam@credit-suisse.com, samb

Also note the type of the newtype-derived

isInt :: c Int -> c Moo

Parametricity laws for higher-kinded variables are tricky, but if I am not mistaken, this type should not be inhabited by any terminating functions - that is, on the level of the source type system. After all, the newtype declaration promises to introduce a new type dinstinct from all others.

As SimonPJ notes in Bug #1251 (closed), newtype deriving lifts the newtype coercion !CoMoo to the source type level and that is IMHO the cause of the problem, as it violates the intention of newtype.

Newtype deriving the way it's done in ghc is scary. I think a newtype deriving should only be allowed if you (i.e., the compiler) can actually construct the instance declaration. This might be less general, but it would make me much more confident that it's correct.

Yeah, I also think that GHC should insist on being able to find an instance declaration -- though I don't care what it does with said declaration afterwards ;-).

changed weight to 7

assigned to @simonpj

Trac metadata

Trac field	Value
Priority	normal → high

There are two issues here.

The newtype coercions are not sound. They are unsafe and only to be used in disciplined ways.

The type Moo is different from the type Int. Hence, we shouldn't mix up values of both types. With the newtype coercion Moo ~ Int, there exists the danger that we would: a malicious or erroneous Core transformation may do so without being exposed by ill-typing. Hence, Core is less strongly typed than Haskell source. At present, there does not seem to be a satisfactory way to encode newtypes in Core that both preserves the strong typing and the optimizations.

Simon points out that other unsafe coercions are also created, e.g. the expression case error @ Int "bla" of _ -> 'z' is transformed into error @ Int "blah"cast(co :: Int ~ Char) where Int~Char is clearly unsound.

The newtype deriving construct is an undisciplined use of newtype coercions that exposes their unsafety.

The unsound assumption made is that every higher-order type that is applied to the newtype is an algebraic functor (not explicitly required to be an instance of the typeclass Functor).

For the isInt newtype deriving this could be written as:

instance IsInt Moo where isInt x = fmap (cast(co :: Int ~ Moo)) . (isInt :: c Int -> c Int)

Using the 'fmap cast' lifting law: fmap (cast (co :: Int ~ Moo)) == (cast(c co :: c Int ~ c Moo)) we get: instance IsInt Moo where isInt x = (cast(c co :: c Int ~ c Moo)) . (isInt :: c Int -> c Int) which is what the newtype deriving actually generates straight away, without checking whether c is a functor, whether fmap and the lifting law are at all applicable.

It seems that vanilla algebraic datatypes that are polymorphic, are implict functors. And it's this those implicit instances that are used by the newtype deriving.

However, GADTs are definitely not functors. Some shapes of values can only be used for certain types, regardless of whether any values of that type are involved at all, e.g. the ZI constructor can only be used for Int and not for Moo. Data families are essentially open GADTs.

We can attempt to fix the situation by imposing conditions on the types involved in

a newtype deriving:

generate the fmap code, and hence demand explicit Functor instances.

We should be able to expect the compiler to eliminate any runtime overhead.
conservatively check that only well-behaved higher-order types are applied to the

newtype: this rules out GADTs, data & type families, type variables (which can be

instantiated with ill-behaved types), but also ordinary ADTs that hide ill-behaved

types, like data T a = MkT (GADT a).

The second option seems to be too complicated to be reasonable and the first option perhaps too clumsy to be useful. The least I would do is to deprecate the use of newtype deriving.

Trac metadata

Trac field	Value
CC	chak@cse.unsw.edu.au, ganesh.sittampalam@credit-suisse.com, lennart.augustsson@credit-suisse.com, samb → chak@cse.unsw.edu.au, ganesh.sittampalam@credit-suisse.com, lennart.augustsson@credit-suisse.com, samb, tom.schrijvers@cs.kuleuven.be

Trac metadata

Trac field	Value
CC	chak@cse.unsw.edu.au, ganesh.sittampalam@credit-suisse.com, lennart.augustsson@credit-suisse.com, samb, tom.schrijvers@cs.kuleuven.be → chak@cse.unsw.edu.au, ganesh.sittampalam@credit-suisse.com, kfrdbs@gmail.com, lennart.augustsson@credit-suisse.com, samb, tom.schrijvers@cs.kuleuven.be

Replying to [ticket:1496#comment:16987 toms]:

However, GADTs are definitely not functors.

..

We can attempt to fix the situation by imposing conditions on the types involved in a newtype deriving:

generate the fmap code, and hence demand explicit Functor instances.

We should be able to expect the compiler to eliminate any runtime overhead.

From Frisby we have a data type that is a Functor and a GADT, but not a Functor in the way you and GHC expect:

data PE a where
-- leaving out most of the constructors, but note especially PMap
    Char :: IntSet.IntSet -> PE Char
    Failure :: PE a
    Not :: PE a -> PE ()
    Then :: PE a -> PE b -> PE (a,b)
    PMap :: (a -> b) -> PE a -> PE b

instance Functor PE where
    fmap = PMap

Applying the fmap-option would make interesting, type-correct code be generated that could not be optimized away (so the dictionaries couldn't be shared).

Newtypes and type families combine to produce inconsistent FC(X) axiom sets

Child items ...

Activity