Opened 9 months ago

Last modified 4 months ago

## #12680 new feature request

# Permit type equality instances in signatures

Reported by: | ezyang | Owned by: | |
---|---|---|---|

Priority: | low | Milestone: | |

Component: | Compiler (Type checker) | Version: | 8.1 |

Keywords: | backpack | Cc: | |

Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |

Type of failure: | None/Unknown | Test Case: | |

Blocked By: | #13262 | Blocking: | |

Related Tickets: | Differential Rev(s): | ||

Wiki Page: |

### Description

David Christiansen requested this, and I want to record it so I don't forget. What he wants to be able to do is say things like this in signatures:

signature A where data T data S type family F a instance T ~ S instance F Int ~ S

This is currently rejected by GHC because you are not allowed to define instances of type equality. But really all this is saying is that the type equality must hold when we eventually implement the module; easy to check.

### Change History (21)

### comment:1 Changed 9 months ago by

### comment:2 Changed 9 months ago by

If T and S are datatypes, then how could they be equal?

`data S`

can be implemented with `type S = T`

, thus making them equal. Admittedly the use of `data X`

to represent an abstract data type is awful, arguably we should get the syntax `type X`

working which is a bit clearer.

And your

`F Int ~ S`

seems like it could just be an instance of the type family.

Yes, I think you're right. The important thing is that if you write a type family instance in a signature, it does not indicate that there is *exactly* this reduction rule in the type family, just that it is derivable (though, what about injective type families? Ick.) David was specifically interested in playing around with alternative axiomatizations of type families by specifying equalities manually.

### comment:3 Changed 9 months ago by

I've not followed the Backpack blow-by-blow, but using `data S`

to denote an abstract type of any stripe looks bad to me. `type S`

is much better, because users can't make any assumptions about a `type`

declaration that they might for a `data`

declaration.

Separately, you could get yourself into deep water here (not that that's necessary bad):

type Nat type family (a :: Nat) + (b :: Nat) instance a + b ~ b + a

Good luck implementing this signature! :)

It would be super awesome if the user could prove the equality manually.

### comment:4 follow-up: 5 Changed 9 months ago by

type S is much better, because users can't make any assumptions about a type declaration that they might for a data declaration.

As far as I could tell, GHC actually doesn't make any assumptions when `data S`

denotes an abstract type; so it's just a syntactic wart. It's a bit bothersome to fix because hs-boot syntax only supports data.

It would be super awesome if the user could prove the equality manually.

Do you have an example of a case where you could do this?

### comment:5 Changed 9 months ago by

Replying to ezyang:

type S is much better, because users can't make any assumptions about a type declaration that they might for a data declaration.

As far as I could tell, GHC actually doesn't make any assumptions when

`data S`

denotes an abstract type; so it's just a syntactic wart. It's a bit bothersome to fix because hs-boot syntax only supports data.

I don't agree. In GHC 8:

-- A.hs-boot module A where data T a

-- B.hs {-# LANGUAGE TypeFamilies #-} module B where import {-# SOURCE #-} A foo :: (T a ~ T b) => a -> b foo x = x

These compile fine. GHC must be assuming that `T`

is injective. So we must proceed cautiously.

It would be super awesome if the user could prove the equality manually.

Do you have an example of a case where you could do this?

Nope! :)

Hence the level of awesomeness if we could do this. Admittedly well beyond the scope of Backpack.

### comment:6 follow-up: 7 Changed 9 months ago by

OK, that looks like a straight up soundness bug. I'll fix it.

Nope! :)

Well, I guess what I meant was, what is a situation where you would like this?

### comment:7 Changed 9 months ago by

Replying to ezyang:

OK, that looks like a straight up soundness bug. I'll fix it.

Fix what? I like the current behavior of the hs-boot file, because all `data`

-types really are injective. Or do you mean fix what's used in a signature?

Well, I guess what I meant was, what is a situation where you would like this?

The `Nat`

one (comment:3) is a great starting point.

### comment:8 Changed 9 months ago by

Fix what? I like the current behavior of the hs-boot file, because all data-types really are injective. Or do you mean fix what's used in a signature?

What we use in an hsig file. I guess, type synonyms should not be allowed to implement abstract data; and then add a new type construct which doesn't assume injectivity.

The Nat one (comment:3) is a great starting point.

So, assuming that you had this equality floating around, would it actually do something useful? (Or could you end up with nonterminating type checking?)

### comment:9 Changed 9 months ago by

Note: another reason you need to distinguish between `type`

and `data`

is because `type`

must have its type arguments saturated (since it may be implemented with a type synonym).

### comment:10 Changed 9 months ago by

**This comment is total nonsense, ignore it. (The examples are good though!)**

For the most part, I think we want an "abstract type synonym" to behave like an open type family with no (known) equations. So wherever we see `type S :: * -> *`

, it should be as if we see `type family S :: * -> *`

. We must relax the signature matching rules to allow a type family to be implemented using a type synonym. I think it's sound to behave this way, because `type S = Foo`

can always be interpreted as `type instance S = Foo`

.

There is two things to watch out for, however.

First, consider the following program:

signature H where type S f :: Int module M where type S = Int f :: S

We would like M to match H, but if S is an open type family, the match will fail: S is not definitionally equal to Int, and matching is done strictly with definitional equality. I don't think we should change this for now: with more complex matching we will have to generate coercions to keep Core well-typed. So, instead, we just want to replace the open type family tycon with a type synonym tycon (in the same way an abstract algebraic data type gets replaced with the full data type). This is only sound if, when a module typechecks, it continues to typecheck if more definitional equalities hold.

This is actually not true; `instance Show T`

and `instance Show S`

will overlap if T is definitionally equal to S! But we knew that already: we can implement an abstract algebraic data type with a reexport (which will cause more definitional equalities to hold), and fortunately, Scott showed that this *is* sound in Haskell without instances, so it will do something vaguely reasonable.

So, that was a really long way of justifying why we can't just search replace `type S`

with `type family S`

and `type S = Int`

with `type family S = Int`

.

Second, consider the following program:

signature H where type F a instance Show a => Show (F a)

We'd really like this to work. But if F is implemented with type family (or a type synonym to a type family), this instance is illegal and can't be implemented. So what we want to be able to say is that `type F a`

can be implemented with a type synonym, but *only* a type synonym with no type families in it. There doesn't seem to be a good way to make a claim like this in the language. But given that `type family F a`

```
is still a fair way to specify a type family, perhaps the correct thing to do in this case is to just say that an abstract type can only be implemented by a type synonym with no type families, and check that upon matching. So the abstract type synonym really is a beast of its own: it is like an open type synonym family (in that it is not reducible, but could become reducible in the future), BUT it should still be accepted in instances, because once we *do* know how it's reducible, there will be no more type families left.
```

By the way, this discussion informed me of another problem:

{-# LANGUAGE TypeFamilies #-} unit bar where signature H1 where data T signature H2 where data T module M where import qualified H1 import qualified H2 f :: (H1.T ~ H2.T) => a -> b f x = x

This typechecks! Today, the solver concludes that `H1.T ~ H2.T`

```
is insoluble and accepts the rest of the function. But if H1 and H2 are instantiated in the same way, this equality could hold! So, for abstract data types from signatures (not boots) we better treat this differently.
```

So let us summarize:

- Abstract data types
`data T a`

can be implemented by a data type or newtype declaration. We can assume that they are injective on all type parameters and can be used in instances.

- Abstract type synonyms
`type T a`

can be implemented by a type synonym (with no type families), data type or newtype declaration. They can be used in instances. Arguably it should be possible to specify their injectivity (in the absence of type families, they are either injective or not used at all). I guess it should also be possible to specify their role.

- Open type families
`type family T a`

can be implemented by a type family, type synonym, data type or newtype. Injectivity can be specified in the usual way. Roles are irrelevant.

Abstract type synonyms should probably be implemented as abstract data types that are not assumed to be injective; furthermore, both abstract data types and abstract type synonyms need to account for the fact that they might eventually become definitionally equal to something else. I am not so sure how this should happen; if we model off of open type families, the correct thing to do is flatten it with a fresh skolem variable, but this will thwart instance declarations, which is undesirable. So we will need to do something else.

### comment:11 Changed 8 months ago by

OK, after a bit of hacking, I have a much better, gloriously simple solution:

`data T`

in hs-boot is treated as a "nominally distinct abstract type", whereas`data T`

in hsig is treated as "totally abstract type"; the difference being that T may become definitionally equal to another type at some later point.

- When unifying TyCons in the type inference, when we would hard failure because a totally abstract TyCon doesn't unify with something else, we *instead* treat the constraint as irreducible and continue on. This prevents
`f :: (H1.T ~ H2.T) => a -> b`

from being treated as inaccessible. There may be some other cases I need to handle, but this is the big one.

- We continue to assume that abstract data is injective; e.g.,
`(T a ~ T b) => a -> b`

should hold when T is totally abstract. When accepting a type synonym implementation of data, we need to ensure that it is injective. This can be done easily by (1) excluding any type synonyms which contain type families, and (2) ensuring that all type parameters of the type synonym are used (since "phantom" type parameters are not injective).

- If someone DOESN'T want us to assume that
`T`

is injective, they should declare an open type family instead ala`type family T a`

instead of`data T a`

. I suppose we should permit such a declaration to be *implemented* using a type synonym, but this doesn't seem very urgent since you can always write`type instance T a = Blah a`

instead.

No more nonsense with open type families. Phabricator incoming.

### comment:12 Changed 8 months ago by

This all makes me a bit worried.

First off, I think the best way to think about all of this is in terms of generativity and injectivity. As I read comment:10 and comment:11, I worry that these two concepts are being mixed, but they're really quite separable. (Below and throughout, we talk only of nominal equality.)

**Generativity:** If `t1`

and `t2`

are *generative*, then `t1 a ~ t2 b`

implies `t1 ~ t2`

.

It has been pointed out to me that generativity really describes a set of types, any two of which have the property above. That viewpoint may be enlightening. In particular, it says that generativity is not really an inherent property of a type, but instead exists only by relationship to other types.

**Injectivity:** If `t`

is injective, then `t a ~ t b`

implies that `a ~ b`

.

Now, to respond to specific points in comment:10 and comment:11:

So, that was a really long way of justifying why we can't just search replace

`type S`

with`type family S`

and`type S = Int`

with`type family S = Int`

.

So what *can* we say? You say that we can't use type families because type family LHSs are not definitionally equal (in FC) to their RHSs. (These *are* definitionally equal in *Haskell*, though.) I'm just a little lost as the upshot of the first section of comment:10.

a type synonym with no type families in it

I think what you mean here that you want a type synonym that is generative and injective. Only things that are generative and injective can appear as type patterns (that is, type/data family LHSs and instance heads). I call "generative and injective" *matchable*. The reason that only matchable types can appear in type patterns is that matching on anything that's not matchable is ill-defined. (Try it.)

One stumbling block here is that, of course, type synonyms are not generative -- a *synonym* can't be generative, roughly by definition! So what you really mean is that the expansion of the type synonym, after all vanilla synonyms are expanded, is matchable. And that no variables are lost en route. This is, I imagine, what is done to implement `TypeSynonymInstances`

. Let's call type synonyms with this property "pre-matchable".

Now we can say: `type T a`

in a signature declares a pre-matchable type synonym. Any implementing module will need to supply a pre-matchable type synonym definition for these pre-matchable type synonyms.

But I agree that this is a new beast, hitherto unknown.

But if H1 and H2 are instantiated in the same way, this equality could hold!

How can this happen? When I see `data`

, I think you're declaring a matchable type. And matchable types are generative. Do I take that this equality between `H1`

and `H2`

can happen when mixing modules (as in "mix-in")?

If `H1`

and `H2`

really *can* equal, then they would appear to be pre-matchable. In this case, how does a `data`

definition differ from a `type`

definition in a signature?

Open type families

`type family T a`

can be implemented by a type family, type synonym, data type or newtype.

So my signature can have `type family T a`

and my module `data T a = MkT`

? That feels very fishy. It seems you are using `type family`

as a proxy for "type that is not necessarily generative nor injective". Perhaps generativity and injectivity properties should just be specified directly.

When unifying TyCons in the type inference,

This should already happen, if you set the results for `isGenerativeTyCon`

and `isInjectiveTyCon`

correctly. (And if the canonicalizer is implemented correctly.) You want only a mismatch between generative TyCons do have a hard error, and that should be what's implemented.

In the past, whenever I've been dwelling on these kinds of issues, I have found that always reducing the problems to generativity and injectivity to be helpful. It might be nice to have a concrete statement of what definitions in signatures mean in terms of these properties.

### comment:13 Changed 8 months ago by

goldfire and I had a chat, and here were the conclusions:

- comment:10 is totally nonsense, ignore it.

- I'm happy to give up on abstract type synonyms. Maybe someone will find them useful for something, but they're not necessary for the Backpack use-cases I have in mind.

- We agreed that these two examples should fundamentally work the same way:

unit p where signature H1 where data T signature H2 where data T module M where import qualified H1 import qualified H2 f :: H1.T -> H2.T f x = x -- ill typed; the Ts are NOT obviously equal g :: (H1.T ~ H2.T) => a -> b g x = x -- ill typed (H1.T ~ H2.T should not make this inaccessible) unit q where signature H1 where data T1 signature H2 where data T2 module M where import H1 import H2 f :: T1 -> T2 f x = x -- ill typed g :: (T1 ~ T2) => a -> b g x = x -- ill typed (T1 ~ T2 should not make this inaccessible)

- Here's what you do: totally abstract data types (
`data T`

in an hsig file) are generative AND injective. They can only be implemented by`data`

(except, see below). The only difference is that we do not eagerly fail when we come up with an insoluble constraint in the givens involving a totally abstract type. As the OutsideIn(X) paper states, it is sound to not eagerly fail when simplifying givens, and removing a failure case should not impact guess-freeness of the solver. While it is true that when the solver encounters a wanted`T1 ~ T2`

(where T1/T2 are totally abstract) it could kick this constraint out to the use-site, but this is silly; we really do want to report a type error here.

- Under certain restricted circumstances, we can implement totally abstract data with a type synonym. But there are quite a few conditions that need to be upheld: it must be generative, injective, and partially applicable. A sufficient condition for the type synonym is for it to have NO type parameters and for it to have no type family applications.

### comment:14 Changed 8 months ago by

One way to help disentangle the difference between generativity and nominal distinctness is to look at the meaning of abstract types in signatures in an elaboration to a simpler language like Haskell!

When I write a unit:

unit p where signature A where data T data S module M where import A f :: T ~ S => a -> b f x = x

I am effectively defining a Haskell program along the lines (with a little cheating to have structural records):

p :: forall t s. { f :: forall a b. t ~ s => a -> b } p = { f = \x -> x } -- ill-typed

When I am typechecking the body of p, `t ~ s`

clearly does not hold, as `t`

and `s`

are both skolem variables and aren't available for unification. But that doesn't mean that we can treat the body of `f`

as inaccessible: obviously if I invoke p as `p @Int @Int`

, `Int ~ Int`

is provable.

What Backpack takes advantage of is the outrageous coincidence between skolem variables and abstract types. This should not at all be surprising, since abstract types are existential types, and (∃x. τ) -> τ' is equivalent to ∀x. (τ -> τ'). We want to treat abstract types like skolem variables, and I conjecture that the rules for OutsideIn(X) are compatible with this treatment, except for a few simplification rules (like eager failure when a given does not canonicalize). This would be good to pose formally and check with the paper.

I think this also explains the confusing tension between "abstract data is generative" but "type synonyms are not generative, and thus should not be allowed to implement abstract data". Skolem variables are extremely well behaved: they can only be instantiated by types, so if we have `s :: * -> *`

no one is going to set `s ~ S`

, where `type S a = ...`

. `S`

is just not a valid type. And of course, all types are "generative" in the sense described earlier. (Instances (and the need for abstract types to be implemented without type families) are a giant fly in the ointment, but we are punting on soundness in this case *anyway*, so I am not inclined to look to closely at the matter.)

The claim that T does not unify with S, EVER (as is the case in an hs-boot file) is a very unnatural claim that would be difficult to translate into Haskell via this elaboration. I would argue this is more of an artifact of how we have implemented hs-boot files over anything else (if we had shaping, this "nominal distinctness" wouldn't even hold there.)

tl;dr: Abstract types in signatures are skolem variables spelled with capital letters; the rules for inference with abstract types are the same as the rules for inference with skolem variables (modulo early failure when givens fail to canonicalize). At the eventual call site, you can instantiate the skolem variables with whatever you want.

### comment:15 Changed 8 months ago by

Very interesting observations. I agree! Note that skolem variables even have the "don't fail eagerly in a given" behavior that you want. I really like this way of thinking about it all.

### comment:16 Changed 8 months ago by

I guess I should also add, the OutsideIn(X) paper has no discussion of higher kinded types (`a :: * -> *`

), so I guess any such comparison will have to be limited to the code actually in GHC.

### comment:17 Changed 8 months ago by

Richard, do you know where in the typechecker is the handling for skolem variables to "not fail eagerly in a given"?

### comment:18 Changed 8 months ago by

There isn't a specific spot. An equality between skolems would end up in `TcCanonical.canEqTyVarTyVar`

which just builds the equality. Because tyvar equalities are never absolutely insoluble, the skolem case isn't separated out.

### comment:20 Changed 5 months ago by

Maybe this proposal cannot be made to work: GHC doesn't accept type families in instance heads (for fairly good reasons). If that's the case, is there really anything interesting one can do with this extension?

**Note:**See TracTickets for help on using tickets.

This idea seems reasonable... but your example does not, I'm afraid. If

`T`

and`S`

are datatypes, then how could they be equal? And your`F Int ~ S`

seems like it could just be an instance of the type family. On the other handseems more realistic.