Opened 11 months ago

Last modified 6 months ago

#12680 new feature request

Permit type equality instances in signatures

Reported by: ezyang Owned by:
Priority: low Milestone:
Component: Compiler (Type checker) Version: 8.1
Keywords: backpack Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: #13262 Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

David Christiansen requested this, and I want to record it so I don't forget. What he wants to be able to do is say things like this in signatures:

signature A where

data T
data S
type family F a

instance T ~ S
instance F Int ~ S

This is currently rejected by GHC because you are not allowed to define instances of type equality. But really all this is saying is that the type equality must hold when we eventually implement the module; easy to check.

Change History (21)

comment:1 Changed 11 months ago by goldfire

This idea seems reasonable... but your example does not, I'm afraid. If T and S are datatypes, then how could they be equal? And your F Int ~ S seems like it could just be an instance of the type family. On the other hand

data T
data S

type family F a
type family G a

instance F T ~ G S

seems more realistic.

comment:2 Changed 11 months ago by ezyang

If T and S are datatypes, then how could they be equal?

data S can be implemented with type S = T, thus making them equal. Admittedly the use of data X to represent an abstract data type is awful, arguably we should get the syntax type X working which is a bit clearer.

And your F Int ~ S seems like it could just be an instance of the type family.

Yes, I think you're right. The important thing is that if you write a type family instance in a signature, it does not indicate that there is *exactly* this reduction rule in the type family, just that it is derivable (though, what about injective type families? Ick.) David was specifically interested in playing around with alternative axiomatizations of type families by specifying equalities manually.

Last edited 11 months ago by ezyang (previous) (diff)

comment:3 Changed 11 months ago by goldfire

I've not followed the Backpack blow-by-blow, but using data S to denote an abstract type of any stripe looks bad to me. type S is much better, because users can't make any assumptions about a type declaration that they might for a data declaration.

Separately, you could get yourself into deep water here (not that that's necessary bad):

type Nat

type family (a :: Nat) + (b :: Nat)
instance a + b ~ b + a

Good luck implementing this signature! :)

It would be super awesome if the user could prove the equality manually.

comment:4 Changed 11 months ago by ezyang

type S is much better, because users can't make any assumptions about a type declaration that they might for a data declaration.

As far as I could tell, GHC actually doesn't make any assumptions when data S denotes an abstract type; so it's just a syntactic wart. It's a bit bothersome to fix because hs-boot syntax only supports data.

It would be super awesome if the user could prove the equality manually.

Do you have an example of a case where you could do this?

comment:5 in reply to:  4 Changed 11 months ago by goldfire

Replying to ezyang:

type S is much better, because users can't make any assumptions about a type declaration that they might for a data declaration.

As far as I could tell, GHC actually doesn't make any assumptions when data S denotes an abstract type; so it's just a syntactic wart. It's a bit bothersome to fix because hs-boot syntax only supports data.

I don't agree. In GHC 8:

-- A.hs-boot
module A where

data T a
-- B.hs
{-# LANGUAGE TypeFamilies #-}

module B where

import {-# SOURCE #-} A

foo :: (T a ~ T b) => a -> b
foo x = x

These compile fine. GHC must be assuming that T is injective. So we must proceed cautiously.

It would be super awesome if the user could prove the equality manually.

Do you have an example of a case where you could do this?

Nope! :)

Hence the level of awesomeness if we could do this. Admittedly well beyond the scope of Backpack.

comment:6 Changed 11 months ago by ezyang

OK, that looks like a straight up soundness bug. I'll fix it.

Nope! :)

Well, I guess what I meant was, what is a situation where you would like this?

comment:7 in reply to:  6 Changed 11 months ago by goldfire

Replying to ezyang:

OK, that looks like a straight up soundness bug. I'll fix it.

Fix what? I like the current behavior of the hs-boot file, because all data-types really are injective. Or do you mean fix what's used in a signature?

Well, I guess what I meant was, what is a situation where you would like this?

The Nat one (comment:3) is a great starting point.

comment:8 Changed 11 months ago by ezyang

Fix what? I like the current behavior of the hs-boot file, because all data-types really are injective. Or do you mean fix what's used in a signature?

What we use in an hsig file. I guess, type synonyms should not be allowed to implement abstract data; and then add a new type construct which doesn't assume injectivity.

The Nat one (comment:3) is a great starting point.

So, assuming that you had this equality floating around, would it actually do something useful? (Or could you end up with nonterminating type checking?)

Last edited 11 months ago by ezyang (previous) (diff)

comment:9 Changed 11 months ago by ezyang

Note: another reason you need to distinguish between type and data is because type must have its type arguments saturated (since it may be implemented with a type synonym).

comment:10 Changed 11 months ago by ezyang

This comment is total nonsense, ignore it. (The examples are good though!)

For the most part, I think we want an "abstract type synonym" to behave like an open type family with no (known) equations. So wherever we see type S :: * -> *, it should be as if we see type family S :: * -> *. We must relax the signature matching rules to allow a type family to be implemented using a type synonym. I think it's sound to behave this way, because type S = Foo can always be interpreted as type instance S = Foo.

There is two things to watch out for, however.

First, consider the following program:

signature H where
  type S
  f :: Int
module M where
  type S = Int
  f :: S

We would like M to match H, but if S is an open type family, the match will fail: S is not definitionally equal to Int, and matching is done strictly with definitional equality. I don't think we should change this for now: with more complex matching we will have to generate coercions to keep Core well-typed. So, instead, we just want to replace the open type family tycon with a type synonym tycon (in the same way an abstract algebraic data type gets replaced with the full data type). This is only sound if, when a module typechecks, it continues to typecheck if more definitional equalities hold.

This is actually not true; instance Show T and instance Show S will overlap if T is definitionally equal to S! But we knew that already: we can implement an abstract algebraic data type with a reexport (which will cause more definitional equalities to hold), and fortunately, Scott showed that this *is* sound in Haskell without instances, so it will do something vaguely reasonable.

So, that was a really long way of justifying why we can't just search replace type S with type family S and type S = Int with type family S = Int.

Second, consider the following program:

signature H where
  type F a
  instance Show a => Show (F a)

We'd really like this to work. But if F is implemented with type family (or a type synonym to a type family), this instance is illegal and can't be implemented. So what we want to be able to say is that type F a can be implemented with a type synonym, but *only* a type synonym with no type families in it. There doesn't seem to be a good way to make a claim like this in the language. But given that type family F a is still a fair way to specify a type family, perhaps the correct thing to do in this case is to just say that an abstract type can only be implemented by a type synonym with no type families, and check that upon matching. So the abstract type synonym really is a beast of its own: it is like an open type synonym family (in that it is not reducible, but could become reducible in the future), BUT it should still be accepted in instances, because once we *do* know how it's reducible, there will be no more type families left.

By the way, this discussion informed me of another problem:

{-# LANGUAGE TypeFamilies #-}
unit bar where
    signature H1 where
        data T
    signature H2 where
        data T
    module M where
        import qualified H1
        import qualified H2
        f :: (H1.T ~ H2.T) => a -> b
        f x = x

This typechecks! Today, the solver concludes that H1.T ~ H2.T is insoluble and accepts the rest of the function. But if H1 and H2 are instantiated in the same way, this equality could hold! So, for abstract data types from signatures (not boots) we better treat this differently.

So let us summarize:

  • Abstract data types data T a can be implemented by a data type or newtype declaration. We can assume that they are injective on all type parameters and can be used in instances.
  • Abstract type synonyms type T a can be implemented by a type synonym (with no type families), data type or newtype declaration. They can be used in instances. Arguably it should be possible to specify their injectivity (in the absence of type families, they are either injective or not used at all). I guess it should also be possible to specify their role.
  • Open type families type family T a can be implemented by a type family, type synonym, data type or newtype. Injectivity can be specified in the usual way. Roles are irrelevant.

Abstract type synonyms should probably be implemented as abstract data types that are not assumed to be injective; furthermore, both abstract data types and abstract type synonyms need to account for the fact that they might eventually become definitionally equal to something else. I am not so sure how this should happen; if we model off of open type families, the correct thing to do is flatten it with a fresh skolem variable, but this will thwart instance declarations, which is undesirable. So we will need to do something else.

Last edited 10 months ago by ezyang (previous) (diff)

comment:11 Changed 10 months ago by ezyang

OK, after a bit of hacking, I have a much better, gloriously simple solution:

  1. data T in hs-boot is treated as a "nominally distinct abstract type", whereas data T in hsig is treated as "totally abstract type"; the difference being that T may become definitionally equal to another type at some later point.
  1. When unifying TyCons in the type inference, when we would hard failure because a totally abstract TyCon doesn't unify with something else, we *instead* treat the constraint as irreducible and continue on. This prevents f :: (H1.T ~ H2.T) => a -> b from being treated as inaccessible. There may be some other cases I need to handle, but this is the big one.
  1. We continue to assume that abstract data is injective; e.g., (T a ~ T b) => a -> b should hold when T is totally abstract. When accepting a type synonym implementation of data, we need to ensure that it is injective. This can be done easily by (1) excluding any type synonyms which contain type families, and (2) ensuring that all type parameters of the type synonym are used (since "phantom" type parameters are not injective).
  1. If someone DOESN'T want us to assume that T is injective, they should declare an open type family instead ala type family T a instead of data T a. I suppose we should permit such a declaration to be *implemented* using a type synonym, but this doesn't seem very urgent since you can always write type instance T a = Blah a instead.

No more nonsense with open type families. Phabricator incoming.

comment:12 Changed 10 months ago by goldfire

This all makes me a bit worried.

First off, I think the best way to think about all of this is in terms of generativity and injectivity. As I read comment:10 and comment:11, I worry that these two concepts are being mixed, but they're really quite separable. (Below and throughout, we talk only of nominal equality.)

Generativity: If t1 and t2 are generative, then t1 a ~ t2 b implies t1 ~ t2.

It has been pointed out to me that generativity really describes a set of types, any two of which have the property above. That viewpoint may be enlightening. In particular, it says that generativity is not really an inherent property of a type, but instead exists only by relationship to other types.

Injectivity: If t is injective, then t a ~ t b implies that a ~ b.

Now, to respond to specific points in comment:10 and comment:11:

So, that was a really long way of justifying why we can't just search replace type S with type family S and type S = Int with type family S = Int.

So what can we say? You say that we can't use type families because type family LHSs are not definitionally equal (in FC) to their RHSs. (These are definitionally equal in Haskell, though.) I'm just a little lost as the upshot of the first section of comment:10.

a type synonym with no type families in it

I think what you mean here that you want a type synonym that is generative and injective. Only things that are generative and injective can appear as type patterns (that is, type/data family LHSs and instance heads). I call "generative and injective" matchable. The reason that only matchable types can appear in type patterns is that matching on anything that's not matchable is ill-defined. (Try it.)

One stumbling block here is that, of course, type synonyms are not generative -- a synonym can't be generative, roughly by definition! So what you really mean is that the expansion of the type synonym, after all vanilla synonyms are expanded, is matchable. And that no variables are lost en route. This is, I imagine, what is done to implement TypeSynonymInstances. Let's call type synonyms with this property "pre-matchable".

Now we can say: type T a in a signature declares a pre-matchable type synonym. Any implementing module will need to supply a pre-matchable type synonym definition for these pre-matchable type synonyms.

But I agree that this is a new beast, hitherto unknown.

But if H1 and H2 are instantiated in the same way, this equality could hold!

How can this happen? When I see data, I think you're declaring a matchable type. And matchable types are generative. Do I take that this equality between H1 and H2 can happen when mixing modules (as in "mix-in")?

If H1 and H2 really can equal, then they would appear to be pre-matchable. In this case, how does a data definition differ from a type definition in a signature?

Open type families type family T a can be implemented by a type family, type synonym, data type or newtype.

So my signature can have type family T a and my module data T a = MkT? That feels very fishy. It seems you are using type family as a proxy for "type that is not necessarily generative nor injective". Perhaps generativity and injectivity properties should just be specified directly.

When unifying TyCons in the type inference,

This should already happen, if you set the results for isGenerativeTyCon and isInjectiveTyCon correctly. (And if the canonicalizer is implemented correctly.) You want only a mismatch between generative TyCons do have a hard error, and that should be what's implemented.


In the past, whenever I've been dwelling on these kinds of issues, I have found that always reducing the problems to generativity and injectivity to be helpful. It might be nice to have a concrete statement of what definitions in signatures mean in terms of these properties.

comment:13 Changed 10 months ago by ezyang

goldfire and I had a chat, and here were the conclusions:

  1. comment:10 is totally nonsense, ignore it.
  1. I'm happy to give up on abstract type synonyms. Maybe someone will find them useful for something, but they're not necessary for the Backpack use-cases I have in mind.
  1. We agreed that these two examples should fundamentally work the same way:
unit p where
  signature H1 where
    data T
  signature H2 where
    data T
  module M where
    import qualified H1
    import qualified H2
    f :: H1.T -> H2.T
    f x = x -- ill typed; the Ts are NOT obviously equal
    g :: (H1.T ~ H2.T) => a -> b
    g x = x -- ill typed (H1.T ~ H2.T should not make this inaccessible)

unit q where
  signature H1 where
    data T1
  signature H2 where
    data T2
  module M where
    import H1
    import H2
    f :: T1 -> T2
    f x = x -- ill typed
    g :: (T1 ~ T2) => a -> b
    g x = x -- ill typed (T1 ~ T2 should not make this inaccessible)
  1. Here's what you do: totally abstract data types (data T in an hsig file) are generative AND injective. They can only be implemented by data (except, see below). The only difference is that we do not eagerly fail when we come up with an insoluble constraint in the givens involving a totally abstract type. As the OutsideIn(X) paper states, it is sound to not eagerly fail when simplifying givens, and removing a failure case should not impact guess-freeness of the solver. While it is true that when the solver encounters a wanted T1 ~ T2 (where T1/T2 are totally abstract) it could kick this constraint out to the use-site, but this is silly; we really do want to report a type error here.
  1. Under certain restricted circumstances, we can implement totally abstract data with a type synonym. But there are quite a few conditions that need to be upheld: it must be generative, injective, and partially applicable. A sufficient condition for the type synonym is for it to have NO type parameters and for it to have no type family applications.

comment:14 Changed 10 months ago by ezyang

One way to help disentangle the difference between generativity and nominal distinctness is to look at the meaning of abstract types in signatures in an elaboration to a simpler language like Haskell!

When I write a unit:

unit p where
  signature A where
    data T
    data S
  module M where
    import A
    f :: T ~ S => a -> b
    f x = x

I am effectively defining a Haskell program along the lines (with a little cheating to have structural records):

p :: forall t s. { f :: forall a b. t ~ s => a -> b }
p = { f = \x -> x } -- ill-typed

When I am typechecking the body of p, t ~ s clearly does not hold, as t and s are both skolem variables and aren't available for unification. But that doesn't mean that we can treat the body of f as inaccessible: obviously if I invoke p as p @Int @Int, Int ~ Int is provable.

What Backpack takes advantage of is the outrageous coincidence between skolem variables and abstract types. This should not at all be surprising, since abstract types are existential types, and (∃x. τ) -> τ' is equivalent to ∀x. (τ -> τ'). We want to treat abstract types like skolem variables, and I conjecture that the rules for OutsideIn(X) are compatible with this treatment, except for a few simplification rules (like eager failure when a given does not canonicalize). This would be good to pose formally and check with the paper.

I think this also explains the confusing tension between "abstract data is generative" but "type synonyms are not generative, and thus should not be allowed to implement abstract data". Skolem variables are extremely well behaved: they can only be instantiated by types, so if we have s :: * -> * no one is going to set s ~ S, where type S a = .... S is just not a valid type. And of course, all types are "generative" in the sense described earlier. (Instances (and the need for abstract types to be implemented without type families) are a giant fly in the ointment, but we are punting on soundness in this case *anyway*, so I am not inclined to look to closely at the matter.)

The claim that T does not unify with S, EVER (as is the case in an hs-boot file) is a very unnatural claim that would be difficult to translate into Haskell via this elaboration. I would argue this is more of an artifact of how we have implemented hs-boot files over anything else (if we had shaping, this "nominal distinctness" wouldn't even hold there.)

tl;dr: Abstract types in signatures are skolem variables spelled with capital letters; the rules for inference with abstract types are the same as the rules for inference with skolem variables (modulo early failure when givens fail to canonicalize). At the eventual call site, you can instantiate the skolem variables with whatever you want.

Last edited 10 months ago by ezyang (previous) (diff)

comment:15 Changed 10 months ago by goldfire

Very interesting observations. I agree! Note that skolem variables even have the "don't fail eagerly in a given" behavior that you want. I really like this way of thinking about it all.

comment:16 Changed 10 months ago by ezyang

I guess I should also add, the OutsideIn(X) paper has no discussion of higher kinded types (a :: * -> *), so I guess any such comparison will have to be limited to the code actually in GHC.

comment:17 Changed 10 months ago by ezyang

Richard, do you know where in the typechecker is the handling for skolem variables to "not fail eagerly in a given"?

comment:18 Changed 10 months ago by goldfire

There isn't a specific spot. An equality between skolems would end up in TcCanonical.canEqTyVarTyVar which just builds the equality. Because tyvar equalities are never absolutely insoluble, the skolem case isn't separated out.

comment:19 Changed 10 months ago by Edward Z. Yang <ezyang@…>

In 518f2895/ghc:

New story for abstract data types in hsig files.

Summary:
In the old implementation of hsig files, we directly
reused the implementation of abstract data types from
hs-boot files.  However, this was WRONG.  Consider the
following program (an abridged version of bkpfail24):

    {-# LANGUAGE GADTs #-}
    unit p where
        signature H1 where
            data T
        signature H2 where
            data T
        module M where
            import qualified H1
            import qualified H2

            f :: H1.T ~ H2.T => a -> b
            f x = x

Prior to this patch, M was accepted, because the type
inference engine concluded that H1.T ~ H2.T does not
hold (indeed, *presently*, it does not).  However, if
we subsequently instantiate p with the same module for
H1 and H2, H1.T ~ H2.T does hold!  Unsound.

The key is that abstract types from signatures need to
be treated like *skolem variables*, since you can interpret
a Backpack unit as a record which is universally quantified
over all of its abstract types, as such (with some fake
syntax for structural records):

    p :: forall t1 t2. { f :: t1 ~ t2 => a -> b }
    p = { f = \x -> x } -- ill-typed

Clearly t1 ~ t2 is not solvable inside p, and also clearly
it could be true at some point in the future, so we better
not treat the lambda expression after f as inaccessible.

The fix seems to be simple: do NOT eagerly fail when trying
to simplify the given constraints.  Instead, treat H1.T ~ H2.T
as an irreducible constraint (rather than an insoluble
one); this causes GHC to treat f as accessible--now we will
typecheck the rest of the function (and correctly fail).
Per the OutsideIn(X) paper, it's always sound to fail less
when simplifying givens.

We do NOT apply this fix to hs-boot files, where abstract
data is also guaranteed to be nominally distinct (since
it can't be implemented via a reexport or a type synonym.)
This is a somewhat unnatural state of affairs (there's
no way to really interpret this in Haskell land) but
no reason to change behavior.

I deleted "representationally distinct abstract data",
which is never used anywhere in GHC.

In the process of constructing this fix, I also realized
our implementation of type synonym matching against abstract
data was not sufficiently restrictive.  In order for
a type synonym T to be well-formed type, it must be a
nullary synonym (i.e., type T :: * -> *, not type T a = ...).
Furthermore, since we use abstract data when defining
instances, they must not have any type family applications.

More details in #12680.  This probably deserves some sort
of short paper report.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

Test Plan: validate

Reviewers: goldfire, simonpj, austin, bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2594

comment:20 Changed 6 months ago by ezyang

Maybe this proposal cannot be made to work: GHC doesn't accept type families in instance heads (for fairly good reasons). If that's the case, is there really anything interesting one can do with this extension?

comment:21 Changed 6 months ago by ezyang

Blocked By: 13262 added

This ticket is blocked on #13262.

Note: See TracTickets for help on using tickets.