#11732 closed bug (fixed)

Deriving Generic1 interacts poorly with TypeInType

Reported by: goldfire Owned by:
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 8.1
Keywords: TypeInType, Generics Cc: andres@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #5939 Differential Rev(s): Phab:D2061
Wiki Page:

Description

From @RyanGlScott, comment:9:ticket:11357:

Vanilla datatypes and data family instances are still inconsistent w.r.t. which type variables are considered "instantiated" in a Generic1 instance. For instance, this is rejected:

λ> data Proxy k (a :: k) = ProxyCon deriving Generic1
<interactive>:32:43: error:
    • Can't make a derived instance of ‘Generic1 (Proxy *)’:
        Proxy must not be instantiated; try deriving `Proxy k a' instead
    • In the data declaration for ‘Proxy’

And rightfully so, since the visible kind binder k is instantiated to *. But now it's possible to have an equivalent instance for a data family that squeaks past this check!

λ> data family ProxyFam (a :: y) (b :: z)
λ> data instance ProxyFam k (a :: k) = ProxyFamCon deriving Generic1

==================== Derived instances ====================
Derived instances:
  instance GHC.Generics.Generic1 (Ghci13.ProxyFam *) where
    ...

[Ryan needs] to investigate further to see why this is the case.

Change History (22)

comment:1 Changed 16 months ago by RyanGlScott

Replying to goldfire:

I'm frankly unsure (without more research) as to what the desired behavior should be here.

The reason for this restriction dates back to #5939, when someone wanted to try to standalone derive this:

data T a = T a
deriving instance Generic (T Bool)

This seems intuitive, but Pedro couldn't find a way to make it work, since the generated Rep type only uses information from T's TyCon, so there was no good way to relate the TyVar a with Bool. He then declared that such instances should be rejected altogether, hence this instantiation check.

Whether the instantiation check is a good idea (and ways to get around it) are a side issue, I feel like. The real issue is that data family instances classify their types as visible/invisible in a completely different manner than vanilla datatypes. I did a bit of digging into the ProxyFam example above. With this:

data Proxy k (a :: k) = ProxyCon deriving Generic1

partitionInvisibles user_tc id tc_args yields ([],[*]), like you'd expect (where k has been instantiated to * due to Generic1 being of kind (* -> *) -> Constraint). But with this:

data family ProxyFam (a :: y) (b :: z)
data instance ProxyFam k (a :: k) = ProxyFamCon deriving Generic1

partitionInvisibles user_tc id tc_args yields ([*],[])! Somehow, k is being treated as invisible when it clearly shouldn't. It's beyond my pay grade to speculate on why that is, but it's definitely not the only -XTypeInType-related data family discrepancy.

Last edited 16 months ago by RyanGlScott (previous) (diff)

comment:2 Changed 16 months ago by goldfire

I claim that

data Proxy k (a :: k) = ProxyCon deriving Generic1

should fail outright. I don't think GHC should be in the business of inferring values for visible parameters like k, even when it could. Instead, it should be this:

deriving instance Generic1 (Proxy *)

Note that this applies to Functor as much as it does to Generic1.

As for the partitionInvisibles piece: yes, that is all suspicious. I conjecture that we should never call partitionInvisibles on a data instance tycon, but only on the data family tycon. Perhaps we need to teach partitionInvisibles how to work on data instances. But not today, I'm afraid.

comment:3 in reply to:  2 Changed 16 months ago by RyanGlScott

Replying to goldfire:

I claim that

data Proxy k (a :: k) = ProxyCon deriving Generic1

should fail outright. I don't think GHC should be in the business of inferring values for visible parameters like k, even when it could.

Agreed.

Instead, it should be this:

deriving instance Generic1 (Proxy *)

I'm OK with this idea, but we'd need to be able to teach gen_Generic_binds about substituting type variables for user-supplied types somehow, since they are reflected in the generated Rep/Rep1 associated type instance. I have no idea how challenging that would be to implement.

Note that this applies to Functor as much as it does to Generic1.

So you'd propose applying an instantiation check to every class in a deriving clause, then?

As for the partitionInvisibles piece: yes, that is all suspicious. I conjecture that we should never call partitionInvisibles on a data instance tycon, but only on the data family tycon.

That's what we're doing at the moment, no? We're calling filterOutInvisibleTypes in canDoGenerics here (which indirectly invokes partitionInvisibles), and your earlier patch changed the TyCon being passed to filterOutInvisibleTypes to the data family tycon.

comment:4 Changed 16 months ago by goldfire

No instantiation check. But what GHC does right now is, for every class in a datatype's deriving clause, drop as many type parameters as there are arguments to the class being derived. For example

data Foo a deriving (Show, Functor)

GHC knows that we mean instance Show (Foo a) and instance Functor Foo.

But

data Proxy k (a :: k) deriving Functor

would try instance Functor (Proxy k), which is ill-kinded. GHC is surely clever enough to figure out that it should really do instance Functor (Proxy *), but I think it should refrain from doing this.

More to the point of this ticket: I have a bad feeling my fix for #11357 is utterly broken, in that it uses the tc_args from the instance tycon despite using visibilities from the parent tycon. Of course, the tc_args don't match up. I suppose a mkFamilyTyConApp would work nicely here.

But you seem to know much more about generics in GHC than I do. Do you think you could make this fix? I really do think we just need to use mkFamilyTyConApp in the right spot here.

comment:5 in reply to:  4 Changed 16 months ago by RyanGlScott

Replying to goldfire:

But

data Proxy k (a :: k) deriving Functor

would try instance Functor (Proxy k), which is ill-kinded. GHC is surely clever enough to figure out that it should really do instance Functor (Proxy *), but I think it should refrain from doing this.

Right, that's what I mean by "instantiation check". Currently, GHC unifies the kind of the typeclass with the kind of the datatype. If, in this process, it ends up instantiating a visible TyBinder in the datatype with something other than a type variable, it should be rejected.

More to the point of this ticket: I have a bad feeling my fix for #11357 is utterly broken, in that it uses the tc_args from the instance tycon despite using visibilities from the parent tycon. Of course, the tc_args don't match up. I suppose a mkFamilyTyConApp would work nicely here.

But you seem to know much more about generics in GHC than I do. Do you think you could make this fix? I really do think we just need to use mkFamilyTyConApp in the right spot here.

I'm not sure what help mkFamilyTyConApp would provide, to be honest. Are you saying to do something like this:

let (tc', tc_args') = splitTyConApp (mkFamilyTyConApp tc tc_args)

If so, wouldn't you'd end up with the data family tycon and arguments? I tried this, and while it seemed to fix the above issue, it comically broke something else:

data family URec p a
data instance URec Char a = UChar

Since it mistakenly believes that URec Char a is too instantiated (it shouldn't even be considering Char at all, which is normally the case when you get your type arguments from tcLookupDataFamInst).

comment:6 Changed 16 months ago by goldfire

Yech. I'm beginning to believe that there is something wonky about the instantiation check. As it stands, the check forbids (or tries to) only visible parameters from being instantiated. But in the TypeInType world, there's not a rigid difference between visible parameters and invisible ones. It used to be that any parameter upon which another parameter depends (that is, a kind variable) must be invisible. That's not the case now. So doing a check like this based on visibility seems wrong.

We could, potentially, do it based on dependency -- it's not so hard figure out which parameters are dependent. And indeed this information is readily available by looking at a tyConBinders: the Named ones are dependent, and the Anon ones are not. This should be as true for a data instance tycon as any other. So perhaps that's the way forward, and it explains why everything up to now has been just wrong: we were doing the wrong check. We were checking for visibility when we meant to check for dependency.

So perhaps the check looks something like

if (all isTyVarTy [ arg | (arg, binder) <- tc_args `zip` tyConBinders tc
                        , isAnonBinder binder ]) ...

No more fussing with filterOutInvisibleTypes and no more bothering with mkFamilyTyConApp.

Before blindly trying this, though, I'd love someone (Ryan, it seems) who understands Generics to consider what this instantiation check is really trying to do, and why it (previously) avoided looking at kinds. And why it makes sense to look at dependency, as I've proposed here. I've no real clue what's going on, and when I deleted isKind (which was what was used previously) I just reached for the closest thing, which was visibility.

comment:7 in reply to:  6 Changed 16 months ago by RyanGlScott

Replying to goldfire:

Before blindly trying this, though, I'd love someone (Ryan, it seems) who understands Generics to consider what this instantiation check is really trying to do, and why it (previously) avoided looking at kinds. And why it makes sense to look at dependency, as I've proposed here. I've no real clue what's going on, and when I deleted isKind (which was what was used previously) I just reached for the closest thing, which was visibility.

As I noted here, the reason why gen_Generic_binds seems to need the instantiation check at the moment is a matter of implementation practicality. Deriving Generic(1) is different from any other derived classes in that in generates a type family instance (Rep/Rep1), so it needs to be able to faithfully reproduce the the same type as what the user provides. GHC's current mechanism for retrieving the type is by examining the TyCon (see here for the code).

As a result, GHC only has knowledge of the tyConTyVars on the LHS and on the RHS of the derived Rep(1) instance. If any of those type variables were to be instantiated to a concrete type, it would result in an apparent mismatch between the instance head and the generated Rep(1) instance (the subject of ticket #5939), e.g.,

data T a b c = T a b c
deriving instance Generic (T a Bool c)

==>

instance Generic (T a Bool c) where
  type Rep (T a b c) = a :*: b :*: c -- wrong?

But interestingly, it's possible to have this kind of mismatch with clever use of -XTypeInType. There's the example in the ticket description, plus this:

λ> :set -XDeriveGeneric -XTypeInType -ddump-deriv
λ> import Data.Proxy
λ> data P (a :: k) = P (Proxy k) deriving Generic1

==================== Derived instances ====================
Derived instances:
  instance GHC.Generics.Generic1 Ghci3.P where
    GHC.Generics.from1 (Ghci3.P g1_a2b8)
      = GHC.Generics.M1
          (GHC.Generics.M1 (GHC.Generics.M1 (GHC.Generics.K1 g1_a2b8)))
    GHC.Generics.to1
      (GHC.Generics.M1 (GHC.Generics.M1 (GHC.Generics.M1 g1_a2b9)))
      = Ghci3.P (GHC.Generics.unK1 g1_a2b9)


GHC.Generics representation types:
  type GHC.Generics.Rep1 Ghci3.P = GHC.Generics.D1
                                     ('GHC.Generics.MetaData
                                        "P" "Ghci3" "interactive" 'GHC.Types.False)
                                     (GHC.Generics.C1
                                        ('GHC.Generics.MetaCons
                                           "P" 'GHC.Generics.PrefixI 'GHC.Types.False)
                                        (GHC.Generics.S1
                                           ('GHC.Generics.MetaSel
                                              'GHC.Base.Nothing
                                              'GHC.Generics.NoSourceUnpackedness
                                              'GHC.Generics.NoSourceStrictness
                                              'GHC.Generics.DecidedLazy)
                                           (GHC.Generics.Rec0 (Data.Proxy.Proxy k_a2b7))))


λ> :kind! Rep1 P
Rep1 P :: * -> *
= D1
    ('MetaData "P" "Ghci3" "interactive" 'False)
    (C1
       ('MetaCons "P" 'PrefixI 'False)
       (S1
          ('MetaSel
             'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy)
          (Rec0 (Proxy *))))

Notice how the generated Rep1 P instance mentions kind k_a2b7 when it should be *. This probably shouldn't typecheck, but it does...

So we really have two issues here:

  1. gen_Generic_binds doesn't do any type variable substitution in the generated Rep1 instance.
  2. A deriving Generic1 clause might instantiate more type variables than what would be desirable in a generated Generic1 instance.

Fixing (1) might be doable with some Type-related subtitution sorcery.

Fixing (2) requires figuring out when a substitution is "undesirable".

  • In #5939, Pedro decided to make an instantiation of a type variable in a Generic(1) instance was bad, even in the presence of -XStandaloneDeriving. Kinds were excluded from this discussion since, at the time, you couldn't put kinds at the type level.
  • Alternatively, we could be ultra-permissive and allow any type variables to be instantiated when deriving Generic(1), both standalone and using a deriving clause. This should be sound (I think) if (1) gets fixed.
  • You proposed a scheme wherein instantiating type/kind variables in a Generic(1) instance is OK with -XStandaloneDeriving, but not with just a deriving clause.

I'd be fine with any approach. But if we picked the first or last option, now that types and kinds are merged, we'd have to figure out a better distinction between what type variables are and aren't allowed to be instantiated in such an instance.

Your suggestion of only checking dependent type variables would only be feasible with the third option, I believe, because combining that check with the first option would cause this program to be erroneously accepted.

comment:8 Changed 16 months ago by goldfire

Cc: andres@… added

I can offer only an uninformed opinion on the design question, but that opinion is to be permissive (option 2). It sounds like you're saying it's possible and sound. It also seems that issue (1) needs to be fixed regardless. So why not be permissive?

I'm also include Andres on this, as I believe he cares about generics, too.

comment:9 Changed 16 months ago by RyanGlScott

Interesting, you'd be alright with the second option? I like that idea since it requires zero instantiation checks, but you did object to it earlier:

I claim that

data Proxy k (a :: k) = ProxyCon deriving Generic1

should fail outright. I don't think GHC should be in the business of inferring values for visible parameters like k, even when it could. Instead, it should be this:

deriving instance Generic1 (Proxy *)

Note that this applies to Functor as much as it does to Generic1.

...

But

data Proxy k (a :: k) deriving Functor

would try instance Functor (Proxy k), which is ill-kinded. GHC is surely clever enough to figure out that it should really do instance Functor (Proxy *), but I think it should refrain from doing this.

comment:10 Changed 16 months ago by goldfire

Ah, yes, good point.

Perhaps I take back that point, then. :)

Or, even better, we should ask others for advice. I'll do that shortly.

comment:11 Changed 16 months ago by simonpj

Implementation simplicity can also lead to programmer simplicity. The reverse is certainly often true: a torturous implementation can lead to a specification that is hard to explain to users.

I have not examined the details here, but I like th idea of "zero instantiation checks"!

comment:12 Changed 16 months ago by simonpj

Richard's consultation is this email thread.

Just to be clear to everyone else, we are discussing

   data P1   (a :: k) = MkP1 deriving Functor
   data P2 k (a :: k) = MkP2 deriving Functor

Here P2 has an explicit kind arg, which must appear in any use of P2; thus

   f :: P2 * Int -> Bool

The question before the house is whether to reject either or both 'deriving' clauses, on the grounds that both instantiate 'k'; and ask for a stand-alone deriving declaration instead. Or we could accept either or both, getting

  instance Functor (P1 (a :: *))
  instance Functor (P2 * (a ::*))

In principle we could say Yes/Yes, Yes/No, or No/No to the two cases.

As Richard points out, a 'deriving' clause attached to a 'data' decl infers some instance context. That context must be written explicitly in a standalone deriving declaration. For example:

  data Maybe a = Nothing | Just a deriving( Eq )

we get the derived instance

  instance Eq a => Eq (Maybe a ) where
    (==) x y = ...blah...

The Eq a context in this instance declaration is magically inferred from the form of the data type declaration. This inference process gets pretty tricky for Functor and Traversable. To use the instance declarations you have to understand what the inferred instance context is; GHC should really provide a way to tell you.

Richard points out (later in the thread) that "instantiating k" is like adding a constraint k ~ * to the instance, thus

  instance (k ~ *) => Functor (P1 (a :: k))

That's not quite true, because this instance will match for any k, and hence overlaps with putative instances for k's other than *; whereas

  instance Functor P1 (a :: *)

matches only for the * case. And that is a subtle distinction indeed!

Hmm. I am rather persuaded by Richard's argument. Proposal: just regard the kind constraints as extra inferred constraints, and hence generate

  instance (k ~ *) => Functor (P1 (a :: k))

Now the derived instance always has type variables in the head; but those type variables may be constrained by the context. I like that.

It's not quite what happens now, so there would be a little implementation work to do. It might quite possibly actually be simpler.

I'm going to dump this email into the ticket.

Simon

comment:13 Changed 16 months ago by goldfire

There is a small problem here:

instance (k ~ *) => Functor (P1 (a :: k))

doesn't type-check. The problem is that this requires "immediate" use of the kind equality. In other words, we would need to give a name to the kind equality in order to cast by it later in the same type. But, last spring, Simon and I decided it would be simpler not to allow binding of kind equalities in types -- we can now bind them only in terms. (See also the first bullet here.)

Ignoring this issue, let's consider

  1. instance (k ~ *) => Functor (P2 k)
  2. instance Functor (P2 *)

I claim that these (if the first were to type-check) instances are entirely equivalent. Normally, inlining an equality constraint changes the meaning of an instance, but not here. The first instance matches Functor (P2 k) for any value of k. What's interesting is that the second instance, exactly as written, does too. That's because the only possible value for k in Functor (P2 k) is *. Anything else is surely ill-typed. Accordingly, we should be comfortable just using the second instance. And I believe that anytime this sort of instantiation would happen has this property (that anything else would be ill-typed).

This all means we can just do nothing about this particular issue. Which is nice.

comment:14 in reply to:  13 Changed 16 months ago by RyanGlScott

Replying to goldfire:

This all means we can just do nothing about this particular issue. Which is nice.

That is nice!

What then, should we do about this program? (from here)

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeInType #-}
module Cat where

import Data.Kind

class Cat k (cat :: k -> k -> *) where
  catId   :: cat a a
  catComp :: cat b c -> cat a b -> cat a c

instance Cat * (->) where
  catId   = id
  catComp = (.)

newtype Fun a b = Fun (a -> b) deriving (Cat k)

Currently, this generates an instance of the form instance Cat * Fun (i.e., the same thing as if you had written deriving (Cat *). Would this be acceptable?

comment:15 Changed 16 months ago by goldfire

As I just wrote in the email thread: we can argue that GHC is inferring a k ~ * constraint on the Cat instance. With that understanding, we're not so poorly off here.

Just a reminder to all: we've strayed quite far from the original ticket, which concerns a bespoke instantiation check for Generic and Generic1. So my "do nothing" comment does not mean we can just close the ticket.

comment:16 Changed 16 months ago by RyanGlScott

OK, it sounds like we're converging on a consensus of being ultra-permissive in allowing GHC to unify visible type parameters when deriving instances.

With that in mind, here's what we still need to do to fix this ticket:

  1. When generating a Rep/Rep1 instance here, do the appropriate type variable substitutions depending on the types the user provides.
  2. Remove the broken instantiation check here.

comment:17 Changed 16 months ago by simonpj

Good points in comment:13, Richard.

I'm happy with what Ryan suggests in comment:16. But it would be good to have a substantial Note to explain the thinking, with a link to this ticket.

Simon

comment:18 Changed 16 months ago by RyanGlScott

Differential Rev(s): Phab:D2061
Status: newpatch

comment:19 Changed 16 months ago by bgamari

Status: patchmerge

comment:20 Changed 16 months ago by bgamari

Milestone: 8.0.1

comment:21 Changed 16 months ago by Ben Gamari <ben@…>

In 7443e5c8/ghc:

Remove the instantiation check when deriving Generic(1)

Previously, deriving `Generic(1)` bailed out when attempting to
instantiate visible type parameters (#5939), but this instantiation
check was quite fragile and doesn't interact well with `-XTypeInType`.
It has been decided that `Generic(1)` shouldn't be subjected to this
check anyway, so it has been removed, and `gen_Generic_binds`'s
machinery has been updated to substitute the type variables in a
generated `Rep`/`Rep1` instance with the user-supplied type arguments.

In addition, this also refactors `Condition` in `TcDeriv` a bit. Namely,
since we no longer need `tc_args` to check any conditions, the `[Type]`
component of `Condition` has been removed.

Fixes #11732.

Test Plan: ./validate

Reviewers: goldfire, kosmikus, simonpj, bgamari, austin

Reviewed By: simonpj, bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2061

GHC Trac Issues: #5939, #11732

comment:22 Changed 16 months ago by bgamari

Resolution: fixed
Status: mergeclosed
Note: See TracTickets for help on using tickets.