Opened 14 months ago

Last modified 4 weeks ago

#12088 new bug

Type/data family instances in kind checking

Reported by: alexvieth Owned by:
Priority: high Milestone: 8.4.1
Component: Compiler (Type checker) Version: 8.1
Keywords: TypeInType Cc: goldfire, int-index, RyanGlScott
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: GHC rejects valid program Test Case:
Blocked By: Blocking:
Related Tickets: #11348, #12239, #12643, #13790 Differential Rev(s): Phab:D2272
Wiki Page:

Description

In TcEnv.hs there is a note AFamDataCon: not promoting data family constructors. It states that we can't use a promoted data family instance constructor because we would have to interleave the checking of instances and data types. But with the fix of #11348, we now do exactly this. In the example from the note

data family T a
data instance T Int = MkT
data Proxy (a :: k)
data S = MkS (Proxy 'MkT)

-ddump-rn-trace shows these groups

rnTycl dependency analysis made groups
 [[data family T a_apG]
 []
 [data instance T Int = MkT],
 [data Proxy (a_apF :: k_apE)]
 []
 [],
 [data S = MkS (Proxy MkT)]
 []
 []]

That's to say, the instance T Int will in fact be checked before S. So let's remove this restriction.

Change History (41)

comment:1 Changed 14 months ago by goldfire

Keywords: TypeInType added

Looks reasonable to me.

comment:2 Changed 14 months ago by simonpj

And me. Maybe someone can offer a patch?

comment:3 Changed 14 months ago by alexvieth

Maybe someone can offer a patch?

Here's my attempt.

comment:4 Changed 14 months ago by goldfire

Differential Rev(s): Phab:D2272

comment:5 Changed 13 months ago by simonpj

The more I look at the draft patch Phab:D2272 the more I think it's not quite right yet.

At the root of it, we currently have

data TyClGroup name  -- See Note [TyClGroups and dependency analysis]
  = TyClGroup { group_tyclds :: [LTyClDecl name]
              , group_roles  :: [LRoleAnnotDecl name]
              , group_instds :: [ [LInstDecl name] ]  }

and we typecheck groups one by one. But actually, as commentary on Phab shows, type declarations can depend (via promoted data constructors) on data instance declarations, and data instance declarations can depend on each other. The [[LInstdecl name]] is clearly a bit of a hack, becuase it is itself effectively a sequence of (non-recursive) SCCs.

Let's flatten it out: I think we want

data TyClGroup name  -- See Note [TyClGroups and dependency analysis]
  = TyClGroup { group_tyclds :: [LTyClDecl name]
              , group_roles  :: [LRoleAnnotDecl name] }
  | InstGroup (LInstDecl name)

Now a list of TyClGroup tells us the order. Instances can't be mutually recursive, so singletons are what we need.

How to construct the order? Alex suggests adding a potentially large number of extra edges, and thene doing standard SCC. But that seems a bit artifical, and I don't think it's easy to construct precisely the right extra edges. How about this?

  • A "node" of the dependency graph contains either a single TyClDecl or a single InstDecl.
  • Each node needs a "key"; which is fine for TyClDecls (use the Name) but not for InstDecls. Solution; simply number off the nodes 1,2,3.
  • Build a [TCINode], with Int keys, using Node comes from Digraph:
    type Node key payload = (payload, key, [key])  -- In Digraph
    
    type TCIKey  = Int
    type TCINode = Node TCIKey (Either TyClDecl InstDecl)
    
    To do this we'll need an auxiliary mapping from TyCon/DataCon names to TCIKey.
  • Do SCC analysis in the usual way, using stronglyConnCompFromEdgedVerticesR

So far it's all as usual. But we want to make sure that the instance declarations occur as early as possible in the sequence, consistent with the dependencies. So:

  • Partition the InstDecl nodes from the TyClDecl nodes; there should be no InstDecl in a CyclicSCC. Or at least if there are we should reject the program. So we can cleanly separate the two.
  • Run down the list of TyClDecl SCCS. After each one, add all the InstDecl nodes that depend only on keys that occur earlier in the sequence. We may need to iterate this process. The key function might look like
    addInstDecls :: Set TCIKey                    -- TCIKeys that occur earlier
                 -> [SCC (Node TCIKey TyClDecl)]  -- TyClDecl SCCs
                 -> [Node TCIKey InstDecl]        -- InstDecls to add
                 -> [TyClGroup]                   -- Final groups in order
    
    addInstDecls _ [] ids
      = [InstDeclGroup id | (_, id, _) <- ids]
    
    addInstDecls so_far sccs ids
      | (dump_ids, keep_ids) <- pickInstDecls so_far ids
      , not (null dump_ids)  -- Drop some instance declarations here
      = dump_ids ++
        addInstDecls (so-far `add` keysOf sump_ids) sccs keep_ids
    
    addInstDecls so_far (scc : sccs) ids
      = scc : addInstDecls (so_far `add` keysOf scc) sccs ids
    
    The first case is easy. The second dumps any InstDecls that depend only on earlier declarations The third dumps the TyClDecl.

And that's about it. There is a bit of potential inefficienty in the repeated traversals by pickInstDecls but we don't expect to see a lot of instance declarations anyway, and if that becomes a problem we can think about a more efficient data structure than a plain list.

Does that sound reasonable? I like that it's very comprehensible!

Minor alternative. Instead of the existing TyClGroup (constructed by the renamer, consumed by the type checker), use SCC TyClNode, where

data TyClNode name
  = TyClDeclNode (LTyClDecl name)
                 (Maybe (LRoleAnnots name))
  | InstDeclNode (LInstDecl name)

Now we can use TyClNode instead of Either TyClDecl InstDecl in the above, which is nice.

comment:6 Changed 13 months ago by alexvieth

Alex suggests adding a potentially large number of extra edges, and thene doing standard SCC. But that seems a bit artifical, and I don't think it's easy to construct precisely the right extra edges.

Yes, there could be a lot of extra edges: at most the number of InstDecls multiplied by the number of TyClDecls. Computing them also requires a dfs for every TyClDecl. Compiler performance isn't so great even without this change, so we should be careful.

I'm fairly certain that these are precisely the right extra edges.

  • If a > b in the original graph, then a > b in the augmented graph, because edges are never removed.
  • The SCCs of the original are the same as in the augmented graph: an edge is added from a to b only if there's no path from b to a (on the graph including the other artificial edges).

Instances can't be mutually recursive, so singletons are what we need.

Do you mean that if an instance appears in a cyclic group, the program will definitely be rejected? If this is true, then your approach is viable, but we need to get the right error message. Perhaps the simplest way to do that is to continue using the existing mechanism brought about by tcTyClGroup (which my patch does), but this would demand that we give the InstDecls in their cyclic groups should they arise.

comment:7 Changed 13 months ago by simonpj

Yes, I did not fully understand the algorithm for adding extra edges. I think the specification is this:

  • We need an edge that makes a TyClDecl D depend on every InstDecl that does not depend (transitively) on D.

It took me a few mins to write this down, and it does not look convenient to compute.

If you are happy with the approach I described, can we try that?

Do you mean that if an instance appears in a cyclic group, the program will definitely be rejected?

Yes; a data instance can only depend on itself via promoted constructors, which we rule out for data type decls and should do for data instances too:

data instance T Int = MkT (forall (a::Proxy 'MkT). ...blah...)

As you say, it need a decent error message.

comment:8 Changed 12 months ago by alexvieth

I've spent some more time studying this and now I'm even more convinced that we should stay with my solution. Take a look at these modules:

-- A.hs
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE UndecidableInstances #-}

module A where

import Data.Kind

type family Closed (t :: Type) :: Type where
  Closed t = Open t

type family Open (t :: Type) :: Type
-- B.hs
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}

module B where

import Data.Kind
import A

data Q

data family F (t :: Type) :: Closed t -> Type

type instance Open Q = Bool

data instance F Q r where
  F0 :: F Q 'True

The point is that type instance Open Q must be checked before data instance F Q r, even though we can't see this dependency, as it's hidden in A.hs.

With Simon's suggestion, dependency analysis of B gives [data Q, data family F, type instance Open Q, data instance F Q r] and then it's reordered to get [data Q, type instance Open Q, data family F, data instance F Q r] (insert type instance Open Q as early as possible, then do the same for data instance F Q r). It's all good! But if we change the order of the definitions in the source file, we get a different result!

-- B.hs
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}

module B where

import Data.Kind
import A

data family F (t :: Type) :: Closed t -> Type

data Q

type instance Open Q = Bool

data instance F Q r where
  F0 :: F Q 'True

Here we get [data family F t, data Q, type instance Open Q, data instance F Q r], and then we reorder it to [data family F t, data Q, type instance F Q r, type instance Open Q] (first put type instance Open Q as soon as possible, then put data instance F Q r as soon as possible, which is right after data Q). Checking type instance F Q r causes a failure because we don't have type instance Open Q yet.

The issue: this algorithm doesn't discover that type instance Open Q can go before data family F t. Whether it fails as above depends upon the order in which we choose to merge instance declarations into the TyClDecl list, but in any case there will be a way to make it fail based on the order of declarations.

Taking some TyClDecl order and then inserting the InstDecls without ever re-ordering the TyClDecls will not work. A more complicated algorithm is needed. The one I've implemented doesn't have this problem, thanks to the artificial dependency of data family F t on type instance Open Q.

I should also add that I don't think TyClGroup should be changed to have only singleton InstDecl groups. I'd say it's good that this type represents the actual dependency structure, regardless of whether it's valid according to the type checker. We'll let the type checker programs determine that after the groups are made, but having the renamer do this would be overextending the responsibilities of the renamer.

comment:9 Changed 12 months ago by simonpj

Cc: goldfire added

Sorry to be slow... I'm on holiday.

You give a great example. The key point is that to kind-check the data instance F Q r declaration, we need to have completely processed the type instance Open Q declaration. So even instance declarations can have dependencies!

I'd like Richard E's opinion on all this; hence cc'ing goldfire.

To me your example raises the question of whether it's even possible to figure out the dependencies. You probably believe that your "add extra edges" does so. But I still don't know exactly what those edges are. Maybe you could take a comment in this ticket to explain your algorithm precisely?

Meanwhile, here's a new hypothesis. The only instances we must have in hand to do kind-checking are type instances, because they give rise to implicit nominal type/kind equalities (e.g. F Int ~ Bool) that the kind checker must solve.

But data instances are different: they do not give rise to implicit nominal type/kind equalities. So in that way they behave more like data type declarations. Indeed, I think that the only way that type/class declaration T can depend on a data instance declaration D is if T mentions one of D's promoted data constructors. This will be sorted out by the ordinary SCC analysis.

So perhaps this small modification to my algorithm will work: in the step "Partition the InstDecl nodes from the TyClDecl nodes", instead partition the type instance decls from the TyClDecl and data instance decls. Then your example will work just fine, because the type instance Open Q will be dealt with as soon as data Q is done.

The worry in my mind is whether you could have two type instance declarations, D1 and D2, where it was essential to process D1 before D2, or vice versa. I can't come up with an example, but neither can I see how to prove that there is no such example.

comment:10 Changed 12 months ago by alexvieth

I'd like Richard E's opinion on all this

Me too. We also need to think about how a solution for this ticket would hold up under an implementation of #11962.

To me your example raises the question of whether it's even possible to figure out the dependencies. You probably believe that your "add extra edges" does so. But I still don't know exactly what those edges are. Maybe you could take a comment in this ticket to explain your algorithm precisely?

After a bit of thought I came up with a simple change to my last example:

-- A.hs
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeInType #-}

module A where

import Data.Kind

type family Open (t :: Type) :: Type

data family F (t :: Type) :: Open t -> Type
-- B.hs
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}

module B where

import Data.Kind
import A

data Q

type instance Open Q = Bool

data instance F Q r where
  F0 :: F Q 'True

The difference is that the data family is defined in A.hs and the family Closed is no longer necessary, as the data family is enough to obscure the Open dependency. B.hs as given above will not pass with my approach, but swap the two instance declarations and it's fine.

So perhaps this small modification to my algorithm will work ...

The modification sounds good, but I share your worry and I think we're actually in for a bit more thinking.

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeInType #-}
{-# LANGUAGE UndecidableInstances #-}

module A2 where

import Data.Kind

type family Open1 (t :: Type) :: Type

type family Open2 (t :: Type) (q :: Open1 t) :: Type
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}

module B where

import Data.Kind
import A2

type instance Open1 () = Bool
type instance Open2 () 'True = Char

The instance Open1 () needs to come before Open2 () 'True to get that Open1 () ~ Bool, but how are we to know this? Maybe we always check smaller instances (fewer parameters) before bigger ones?

Sorry to be slow... I'm on holiday.

No problem, enjoy it!

comment:11 Changed 12 months ago by simonpj

Excellent example!

It seems clear that there is going to be no reasonable "always-right" solution to this.

Moreover, here is another horrible example

data T1 = MkT1 (...something requiring IA...)
data T2 = MkT2 (...something requiring IB...)
type instance F1 Int = ...T2...    -- IA
type instance F2 Int = ...T1...    -- IB

Here F1 and F2 are supposed to be like Open1 and Open2 in your example; they must be processed in the right order.

But T1 and T2 appear to be entirely independent of each other, so SCC analysis will put them in some arbitray order; and that order in turn determines which of IA and IB can be processed because IA mentions T2 and IB mentions T1.

So only one ordering of the apparently un-ordered T1, T2 will work.

Can you turn that sketch into a concrete example? It's even more horrible than yours because the invisible ordering of instances seems to force an equally-invisible ordering of the data declarations too. Sigh.

I'm not sure what to do.

The only simple, predictable thing I can think of is this:

  • Perform SCC analysis on each group of data/type/class/data-instance decls that lie between successive type instance decls.
  • Otherwise process the file in the order writtten, from top to bottom.

That is, divide the file into segments, delimited by type instance decls. Process the file top-to-bottom, one segment at a time, processing the type instance decls in the order they are written.

That's simple, easy to specify, and easy to implement. Where there are no type instance decls, we get full SCC analysis across the entire module. The big disadvantage is that where you have a forward reference, kind checking will just fail:

data T = MkT S  -- Forward reference
type instance F Int = Bool
data S = MkS Int

That is pretty bad. Can you think of an alternative rule? The goal is NOT to be as clever as possible. THe goal is to be as stupid, straightforward, and predictable as possible, even if that might mean a little manual labour on the part of the programmer.

comment:12 Changed 12 months ago by simonpj

Some thoughts, after discussion with my son Michael.

  • This entire problem arises only because of open type families; I think closed type families are fine. We could simply ban the use of open type families in the kind signature of an open type family. Or possibly all type families, I'm not sure.
  • Another idea: when we have a group of type instance decls, suppose we kind check them, but do not yet solve the kind equalities that arise; but add them to the instance environment anyway. Now solve all the kind equalities. In our example, we'd add both instances for Open1 and Open2 before trying to solve their kind equalities. Seems a bit scary somehow.
  • Question: how do dependently typed languages deal with this? For example, what if typechecking the defintion of a function f requires the type checker to evaluate calls to f? In Haskell-land you could imagine
    type family Id (a :: *) :: Id *
    
    We reject this at the moment, but the idea is to get into a situation where, before having typechecked f we need to evaluate f. That's rather like the situation with Open1/2.

comment:13 Changed 12 months ago by goldfire

I have a very different way of thinking about all of this that may shed some light on the matter.

(Before we get any further, if you'd like my input, do email directly. It's only dumb luck that I happened to catch up on this ticket so soon after requesting my feedback.)

Most declarations declare actually two things: a type and a definition. For data T = MkT Int, we have the type T :: Type and the definition, consisting of MkT. For type family F a where F Bool = Int, we get the type F :: Type -> Type and the definition F Bool = Int.

What's odd about open families is that the type and definition are separated, quite clearly in the code. But when we think about the fact that closed definitions are declaring two things at once, perhaps it's the closed definitions that are odd and the open ones are more natural. (Indeed, at the term level, we make the type and body separate top-level declarations.)

So, perhaps that's the answer: treat closed declarations as two separate pieces, each with its own node in the dependency graph. Now, any declaration (where "declaration" means either a type declaration or a definition declaration, but not both wrapped together) can depend on any other (also separated) declarations.

(Let's hold off a moment on worrying about how to implement this. I'll get to that later. The goal here is simply to describe what programs are accepted.)

Let's review the examples from above:


From the original description:

data family T a
data instance T Int = MkT
data Proxy (a :: k)
data S = MkS (Proxy 'MkT)

Here are the dependencies, where I list a declaration and the things it depends on (not using transitivity):

  • T (type only): nothing
  • T Int (defn only): type of T
  • type of Proxy: nothing
  • defn of Proxy: type of Proxy
  • type of S: nothing
  • defn of S: type of MkT (= defn of T Int)

In the last bullet, we see that the type of a data constructor is considered in the same declaration as the definition of the parent tycon. The definition of a constructor is also in this same declaration. (This suggests further opportunities for splitting, though. Could one constructor mention another constructor, promoted, from the same type? I think so.)

These bullets also show that all definitions automatically depend on their own types. No surprise there.


From comment:10:

-- module A

type family Open (t :: Type) :: Type

data family F (t :: Type) :: Open t -> Type
-- module B
import A

data Q

type instance Open Q = Bool

data instance F Q r where
  F0 :: F Q 'True
  • Open (type only): nothing
  • F (type only): type of Open
  • type of Q: nothing
  • defn of Q: type of Q
  • Open Q (defn only): type of Open, type of Q
  • F Q r (defn only): type of F, type of Q, defn of Open

The last bullet above reveals something new: when the type of A depends on the type of B, then the definition of A depends on the definition of B. In this case, the type of F depends on the type of Open, and thus the definition of F (that is, instance F Q r) depends on the definition of Open.

We now have a small conundrum: what on earth is the definition of Open, an open type family? The only possible answer: all instances that are in scope. These instances are considered as an inseparable clump. Any attempt to break them up by bizarre mutual dependencies is met with failure. I do think there is room to be cleverer here, but I think we'd be too clever by half if we tried.

In this example, it means that we check the Open Q instance before the F Q r instance.


Also from comment:10:

-- module A2

type family Open1 (t :: Type) :: Type

type family Open2 (t :: Type) (q :: Open1 t) :: Type
-- module B2
import A2

type instance Open1 () = Bool
type instance Open2 () 'True = Char
  • Open1 (type only): nothing
  • Open2 (type only): type of Open1
  • Open1 () (defn only): type of Open1
  • Open2 () (defn only): type of Open2, defn of Open1

It's now clear that we must check the instance for Open1 before that of Open2. No difficulty here.


From comment:11:

data T1 = MkT1 (...something requiring IA...)
data T2 = MkT2 (...something requiring IB...)
type instance F1 Int = ...T2...    -- IA
type instance F2 Int = ...T1...    -- IB
  • type of T1: nothing
  • defn of T1: defn of F1
  • type of T2: nothing
  • defn of T2: defn of F2
  • F1 Int (defn only): type of F1, type of T2
  • F2 Int (defn only): type of F2, type of T1

No problem here either. Go in this order: type of T1, type of T2, type of F1, type of F2, F1 Int, F2 Int, defn of T1, defn of T2.

The reason there is no problem is that we have separated types from definitions.


Reactions to comment:12:

  • It's hard to bad open type families from appearing somewhere, because we'd have to make the ban transitive. We'd thus have two classes of closed type families: those that mention open families somewhere (transitively) and those that don't. This would be awful.
  • Your "another idea" might be something like what I've proposed here.
  • In answer to your "question": I think only Agda handles this. It's induction recursion. And I think it's rather like what I've sketched here.

Implementation notes

Checking types and definitions separately is a bit of a bother for datatypes, because the datacons and the tycons mutually depend on each other. But I think it's possible -- just a little knotty. Let's assume that the dependency analysis succeeds and that there is a dependency ordering with no (perhaps transitive) self-dependency.

  1. At the beginning of the "type-checking" (ahem, desugaring) pass in TcTyClsDecls, go through the declarations in reverse order and create a knot for each definition (with the list of datacons knot-tied) and each type (with the tycon knot-tied).
  1. Then check the declarations in order.
  1. When a type declaration is processed, close out the tycon's knot. You will now have a tycon usable in every way except to access its datacons.
  1. When a definition declaration is processed, close out its knot. The datacons will now be untied and usable.

Proposition. If the dependency analysis is correct, the above algorithm accesses nothing that is knot-tied.

I propose naming this algorithm the "Celestial Dance of the Black Holes". It's as dangerous as it sounds. But I believe it's implementable. Perhaps not by humans. :)

The alternative to the Celestial Dance is to create intermediate structures with mutable references and such and then zonk them to get the desired outcome. The zonker will have to be creative about knot-tying, but that's much simpler. Indeed this may be the better approach, perhaps leaning more heavily on the rather new TcTyCon than we have been.

comment:14 Changed 12 months ago by simonpj

I don't really understand your suggestion Richard.

Firstly, we often infer the kind of a data type, so it's hard to separate dependence on its kind from dependendence on its definition. I din't undersand the precise dependency analysis that you informally propose.

Secondly, I think that open type families are indeed the odd one out, as you acknowledge in the "odd conundrum". I don't see what's gained by separating types from defns even if we could.

Thirdly, you suggest kind-checking all equations for a type family together as a single group. But you can't do that if they mention data types that have not yet been defined. Yet we can't delay too long ... getting those instances into play early is the whole point.

Fourth, the dependence may be indirect. In the example, Open2 mentions Open1 directly. But it could instead mention F directly, where F is an open family that has an equation like type instance F [t] = Open1 t. Now the dependence on Open1 is hidden.

comment:15 in reply to:  14 Changed 12 months ago by goldfire

Replying to simonpj:

I don't really understand your suggestion Richard.

Firstly, we often infer the kind of a data type, so it's hard to separate dependence on its kind from dependendence on its definition. I din't undersand the precise dependency analysis that you informally propose.

Yes, of course. But I think this is easily dispatched: if we are inferring the kind, then use this rule:

  • If a datatype's definition depends on the definition of X, then its type depends on the type of X.

I haven't dwelt on this very long, but that seems to capture the nature of types that can be inferred.

Here is something toward a specification of my idea:

  1. In a datatype definition for T:
    1. Let X be a tycon mentioned in T's kind signature or in kind signatures of type variables to T. Then T's type depends on X's type and T's definition depends on X's definition.
    2. Let K be a datacon mentioned in T's kind signature or in kind signatures of type variables to T. Let S be the tycon containing K. (S might be a data family instance tycon, too.) Then T's type depends on the definition of S.
    3. Let X be a tycon mentioned in the type signature of a datacon in T. Then the definition of T depends on the type of X. In addition, if T does not have CUSK, then the type of T depends on the type of X.
    4. Let K be a datacon (belonging to tycon S) mentioned in the type signature of a datacon in T. Then the definition of T depends on the definition of S. In addition, if T does not have a CUSK, then the type of T depends on the type of S. (If S is a data family instance tycon, then this last sentence refers to the family for S, not the instance tycon, which has no type declaration to depend on.)
  1. In a closed type family definition for C:
    1. Let X be a tycon mentioned in any kind signature in the head of C. Then C's type depends on X's type and C's definition depends on X's definition.
    2. Let K be a datacon (belonging to tycon S) mentioned in any kind signature in the head of C. Then C's type depends on X's definition.
    3. Let X be a tycon mentioned in any equation for C. Then C's definition depends on X's type. In addition, if C does not have a CUSK, then C's type depends on Xs type.
    4. Let K be a datacon (belonging to tycon S) mentioned in any equation for C. Then C's definition depends on S's definition. In addition, if C does not have a CUSK, then C's type depends on S's type. (Or S's family's type, as above.)
  1. In the declaration of an open type family F:
    1. Let X be a tycon mentioned in any kind signature in the declaration for F. Then F's type depends on X's type and F's instance block depends on X's definition.
    2. Let K be a datacon (belonging to tycon S) mentioned in the declaration for F. then F's type depends on S's type and F's instance block depends on S's definition.
  1. In a type family instance for F:
    1. Let X by a tycon mentioned in the instance. Then F's instance block depends on X's type.
    2. Let K be a datacon (belonging to tycon S) mentioned in the instance. Then F's instance block depends on S's definition.
  1. In the declaration of an open data family D:
    1. Let X be a tycon mentioned in any kind signature in the declaration for D. Then D's type depends on X's type.
    2. Let K be a datacon (belonging to tycon S) mentioned in the declaration for D. then D's type depends on S's type.
  1. In the declaration of a data instance: as for a regular datatype, where the kind signatures are inherited from the data family declaration.

Left for another time: classes.

Question raised: Data families always have a CUSK because a data family is always open. But might a data family instance not have a CUSK? What happens then? Perhaps we need to have CUSK analysis on data family instances, too.

Secondly, I think that open type families are indeed the odd one out, as you acknowledge in the "odd conundrum". I don't see what's gained by separating types from defns even if we could.

We gain induction recursion, #11962. Even before this whole discussion, I had the general idea of this separation. See this wiki page, written some time ago, on this idea.

Thirdly, you suggest kind-checking all equations for a type family together as a single group. But you can't do that if they mention data types that have not yet been defined. Yet we can't delay too long ... getting those instances into play early is the whole point.

It's true that my approach will sometimes fail when a finer-grained approach would succeed. But I don't think we're trying to get every use case. We're trying to get a predictable approach that works just about all the time. Here is an example that would fail:

type family F a = r | r -> a
data T :: F a -> Type where
  MkT :: T False
type instance F Int = Bool
type instance F Char = Proxy MkT

The definition of T depends on the first instance while the second instance depends on the definition of T. I'm not bothered rejecting this.

Fourth, the dependence may be indirect. In the example, Open2 mentions Open1 directly. But it could instead mention F directly, where F is an open family that has an equation like type instance F [t] = Open1 t. Now the dependence on Open1 is hidden.

Isn't this just transitivity? Depending on the "definition" of a type family means dependency on the block of instances -- all of them. So anything an instance depends on is transitively reachable.

comment:16 Changed 12 months ago by alexvieth

I'm still trying to grok the latest comments but in the meantime I want to offer an idea similar to this one:

Another idea: when we have a group of type instance decls, suppose we kind check them, but do not yet solve the kind equalities that arise; but add them to the instance environment anyway. Now solve all the kind equalities. In our example, we'd add both instances for Open1 and Open2 before trying to solve their kind equalities. Seems a bit scary somehow.

This doesn't seem scary to me. What if we don't even kind check them before adding their equalities to the environment? Just assume they're well-kinded while checking the other declarations. This way, we don't have to make sure the open type family instances come as early as possible in the order.

  1. Do SCC analysis on all declarations except for open type family instances, to give ordered TyClGroups and the set of those instances (in source file order).
  2. Before checking the TyClGroups, put all equalities arising from the open type family instances into the environment (assume for now that they're well-kinded).
  3. Check the TyClGroups in order as usual.
  4. Check the open type family instances (verifying or falsifying the assumption in 2).

Maybe this is nonsense. Would it be possible to make and use a coercion axiom without kind checking the lhs and rhs types?

comment:17 Changed 11 months ago by int-index

Milestone: 8.2.1
Priority: normalhigh
Summary: Promote data family instance constructorsPromote type/data family instance constructors
Type: feature requestbug

So it turns out that https://ghc.haskell.org/trac/ghc/ticket/12239 is a manifestation of the same issue. Copying the example from it for completeness:

{-# LANGUAGE TypeInType, GADTs, TypeFamilies #-}

import Data.Kind (Type)

data N = Z | S N

data Fin :: N -> Type where
  FZ :: Fin (S n)
  FS :: Fin n -> Fin (S n)

type family FieldCount (t :: Type) :: N

type family FieldType (t :: Type) (i :: Fin (FieldCount t)) :: Type

data T

type instance FieldCount T = S (S (S Z))

type instance FieldType T FZ = Int
type instance FieldType T (FS FZ) = Bool
type instance FieldType T (FS (FS FZ)) = String

Reordering the declarations slightly makes it compile:

{-# LANGUAGE TypeInType, GADTs, TypeFamilies #-}

import Data.Kind (Type)

data N = Z | S N

data Fin :: N -> Type where
  FZ :: Fin (S n)
  FS :: Fin n -> Fin (S n)

type family FieldType (t :: Type) (i :: Fin (FieldCount t)) :: Type

data T

-- Note that 'FieldCount' is now declared after 'FieldType' and 'T'.
type family FieldCount (t :: Type) :: N

type instance FieldCount T = S (S (S Z))

type instance FieldType T FZ = Int
type instance FieldType T (FS FZ) = Bool
type instance FieldType T (FS (FS FZ)) = String

Hope it helps.

comment:18 Changed 11 months ago by int-index

Cc: int-index added
Summary: Promote type/data family instance constructorsType/data family instances in kind checking

comment:19 Changed 11 months ago by int-index

Another example, where a newtype complicates things ever further. I tried adding TH sections to enforce an order in which type family instances are added but it didn't help:

{-# LANGUAGE TemplateHaskell, TypeInType, GADTs, TypeFamilies #-}
{-# OPTIONS -Wno-unticked-promoted-constructors #-}

import Data.Kind (Type)

data N = Z | S N

data Fin :: N -> Type where
  FZ :: Fin (S n)
  FS :: Fin n -> Fin (S n)

type family FieldCount (t :: Type) :: N

type family FieldType (t :: Type) (i :: Fin (FieldCount t)) :: Type

newtype Field (t :: Type) (i :: Fin (FieldCount t)) = Field (FieldType t i)

return [] -- split TH section

data L

type instance FieldCount L = S Z

return [] -- split TH section

type instance FieldType L FZ = ()

return [] -- split TH section

-- newtype not involved, compiles.
l_fz :: FieldType L FZ
l_fz = ()

-- newtype involved, doesn't compile.
field_l_fz  :: Field L FZ
field_l_fz = Field l_fz

The error I'm getting is:

field.hs:36:20: error:
    • Couldn't match type ‘FieldType L 'FZ’ with ‘()’
    • In the first argument of ‘Field’, namely ‘l_fz’
      In the expression: Field l_fz
      In an equation for ‘field_l_fz’: field_l_fz = Field l_fz
Last edited 11 months ago by int-index (previous) (diff)

comment:20 in reply to:  16 Changed 11 months ago by goldfire

Replying to alexvieth:

  1. Do SCC analysis on all declarations except for open type family instances, to give ordered TyClGroups and the set of those instances (in source file order).
  2. Before checking the TyClGroups, put all equalities arising from the open type family instances into the environment (assume for now that they're well-kinded).
  3. Check the TyClGroups in order as usual.
  4. Check the open type family instances (verifying or falsifying the assumption in 2).

Maybe this is nonsense. Would it be possible to make and use a coercion axiom without kind checking the lhs and rhs types?

I'm not sure I understand. Currently, kind-checking an instance and desugaring it are all done at the same time, so step (2) is hard to implement. Let's say we get around that barrier, though. This would then put ill-kinded equalities into the context. I think it would be awfully hard to avoid getting GHC to loop or do other silly things with a bad context. Perhaps you could squeeze this through, but I'm very skeptical.

Or am I misunderstanding something?

comment:21 Changed 11 months ago by goldfire

I'm not convinced that comment:17 is the same bug here. That, to me, looks like a bug in the resolution to #11348. Doing the major rewrite I propose in comment:13 and comment:15 may likely help, but there's a good chance comment:17 could be fixed with much less effort.

Alex, you're the expert here. What do you think?

comment:22 Changed 11 months ago by mpickering

I really think this exactly the same problem. The crux of this ticket is

(type/data) instances can depend on other (type/data) instances so we have to get the order right.

#11348 goes some of the way to get the right order but doesn't account for dependencies between (type/data) instances.

I am trying to write up these discussions on a wiki page as it is very messy currently.

comment:23 Changed 11 months ago by goldfire

But comment:17 has no data instances. There are only type instances there, hence my confusion.

comment:24 Changed 11 months ago by mpickering

@Richard, it seems to me the crux of the problem is that type instances introduce equalities so they must be interleaved with the other declarations. Data instances are just one place where these extra equalities can be used, another being kind checking for type equalities as in comment:17.

Here is the wiki page I started. It is not complete but I am too tired to carry on tonight. Perhaps you could look Alex and fill in some of the details?

https://ghc.haskell.org/trac/ghc/wiki/InterleavingTypeInstanceChecking

comment:25 Changed 11 months ago by alexvieth

Sorry for the delay, new contract is keeping my occupied. comment:17 certainly shows a shortcoming in the patch for #11348. No sense reopening it though, this ticket can supersede it (its summary could use a revision for scope). Thanks for the wiki writeup mpickering. It's accurate and I have nothing to add right now. That section called "The Solution" is a tall order!

Replying to goldfire:

This would then put ill-kinded equalities into the context. I think it would be awfully hard to avoid getting GHC to loop or do other silly things with a bad context. Perhaps you could squeeze this through, but I'm very skeptical.

Or am I misunderstanding something?

You're probably not misunderstanding. I'm just sounding off ideas and I'm no expert on GHC's type checker. Putting ill-kinded equalities into the context sounds irresponsible yes, but they'll all be of the form F k ~ l for some open type family F (or similar for higher arities). Does this fact help at all? Could you come up with a case where an ill-kinded equality of this form would wreak havoc?

How about a variation on that suggestion of mine? A kind of lazy evaluation for open type families in kinds. While type checking other declarations, rather than assuming the (possibly ill-kinded) equalities arising from open type families to be true, defer them until after all declarations are checked. Then we end up with a set of equalities featuring open type family constructors which must be solved using all open type family instances. So in the FieldCount example from comment:17, repeated here for convenience:

{-# LANGUAGE TypeInType, GADTs, TypeFamilies #-}

import Data.Kind (Type)

data N = Z | S N

data Fin :: N -> Type where
  FZ :: Fin (S n)
  FS :: Fin n -> Fin (S n)

type family FieldCount (t :: Type) :: N

type family FieldType (t :: Type) (i :: Fin (FieldCount t)) :: Type

data T

type instance FieldCount T = S (S (S Z))

type instance FieldType T FZ = Int
type instance FieldType T (FS FZ) = Bool
type instance FieldType T (FS (FS FZ)) = String

The only declarations which give rise to a deferred kind equality are the FieldType instances.

  • type instance FieldType T FZ = Int yields S n0 ~ FieldCount T
  • type instance FieldType T (FS FZ) = Bool yields S (S n1) ~ FieldCount T
  • type instance FieldType T (FS (FS FZ)) = String yields S (S (S n2)) ~ FieldCount T

These don't stop type checking dead, they're just tucked away for later and type instance FieldCount T = S (S (S Z)) will eventually be checked (either before or after, we don't care) and available when those three equalities are solved. The program will therefore pass: n0 := (S (S Z)), n1 := S Z, n2 := Z.

comment:26 Changed 11 months ago by goldfire

I still am in favor of my comment:13 and comment:15. The other approaches seem a bit ad-hoc and perhaps are approximations of my proposal.

comment:27 Changed 11 months ago by goldfire

Perhaps that last comment was a bit strongly worded, upon re-reading. I do think comment:25 may be onto something -- waiting until later to solve equality constraints is something GHC does well -- but I don't have a clear enough specification of comment:25 in my head to really analyze.

comment:28 Changed 10 months ago by simonpj

I had a good conversation with Richard and Stepanie about this ticket.

Our conclusions.

  • Alex's cunning examples demonstrate that it's very difficult (perhaps impossible) to come up with a dependency analysis that always does the Right Thing. Even if it's possible we should not do it because (a) it'd be hard to implement and (b) it'd be extremely hard to explain exactly which programs should be OK.
  • Bottom line. Instead of getting in deeper, pull back. Do somthing that is simple to explain, predicatable, and simple to implement, even if it' a little less convenient.
  • Richard's long remarks in comment:15 actually relate to somethiug rather different called (I think) "induction recursion". It is cool but it should have a separate ticket. Richard can you do that?

Proposal

Get the programmer to do "phasing" explicitly if it is necessary. Specifically

  • Introduce a top-level binding separator, thus
    f x = x + 1
    
    ============
    
    g x = f x
    
    The ========== is the binding separator.
  • Concrete syntax to be decided. But we have this very thing already, thus
    f x = x+1
    
    $([d| |])   -- Empty Template Haskell top level splice
    
    g x = f x
    
    New concrete syntax is just to make it less wierd.
  • Semantics. Everything before the separator is renamed and typechecked before anything after.
  • When typechecking data/class/instance decls, we revert to something close to the situation before Alex's patch:
    1. class, type, data, and data instance declarations are SCCd and typechecked in order
    2. When that is complete typecheck class instance and type instance declarations. Note that data instance decls come in the first wave (addressing the example in the Description), but the slippery type instance declarations happen only afterwards
  • I think we could perhaps include closed type families in step (1); c.f. comment:12. Not certain.

In Alex's example in comment:8 the type instance F would happen last, which would fail (consistently so). To fix it, use a separator:

module B where

import Data.Kind
import A

data Q

data family F (t :: Type) :: Closed t -> Type

type instance Open Q = Bool

============
   -- The separator ensures that data instance F Q r is
   -- typechecked after the type instance Open Q

data instance F Q r where
  F0 :: F Q 'True

comment:29 Changed 10 months ago by int-index

What about the example in comment:19? Adding binding separators didn't fix the issue.

comment:30 Changed 10 months ago by simonpj

Here's another example of the phasing problem, this one posted by Dave Menendez:

{-# LANGUAGE TemplateHaskell, TypeInType, TypeFamilies #-}

module DH where

import Data.Kind (Type)

type family K t :: Type
type family T t :: K t -> Type

data List

type instance K List = Type

type instance T List = []   -- Error on this line
-- Error is:
--   *   Expected kind K List -> Type, but [] has kind `* -> *` 
--         In the type `[]`
--       In the type instance declaration for `T`

The reasson is the we need teh K List instance to typecheck the T List instance.

Adding a separator just before the instance T List declaration makes it work

comment:31 Changed 10 months ago by alexvieth

Replying to simonpj:

It is cool but it should have a separate ticket. Richard can you do that?

It's here: #11962

Simon, what do you think of my idea from comment 25?

Bottom line. Instead of getting in deeper, pull back. Do somthing that is simple to explain, predicatable, and simple to implement, even if it' a little less convenient.

The proposed phasing annotations might be convenient for ghc developers but they'd be inconvenient and surprising for ghc users. I appreciate that, at the term level, I don't need to worry about the order of my definitions. I'd like the same to be true at type level if possible, and I think it is possible. I'm just not sure if it will be simple to implement.

comment:32 Changed 10 months ago by simonpj

RE comment:25:

  • I have no idea how to actually implement this idea. Type constructors, axioms, etc have no late-bound holes in them and it would be architeturally difficult to change that, becuase they are all tied together so they point to each other in a graph. Difficult to update.
  • More seriously, I do not know whother the resulting system would be sound. The kinding of one instance depends on executing another instance; and vice versat. At at minimum I'd want to see a specification in langauge that a type-theory person could understand, and a sketch proof of soundness.
  • Stephanie Weirich and Ricahrd Eisenberg are world experts of this stuff. We sat down and ICFP and made zero progress. That doesn't mean that no progress is possible; but it does mean that it's premature to implement anything until we undersatnd the theory.
  • It's important to bre able to explain to programmers exactly what will kind-check and what will not. The fact that the implementation is already jolly tricky, and yet does not do the job, is a worry. Making it much more complicated still feels wrong to me.

Better to withdraw to something that is (a) simple to explain and (b) simple to implement. That leaves a clear research challenge for some research student to tackle.

I appreciate that, at the term level, I don't need to worry about the order of my definitions.

I agree. But I don't know anything better. If Ricahrd gets his way, thing will happen at the term level, when typing f requires executing g anf vice versa. Except it's worse at the type level becaues of open type families.

I specualte that you'll never need a phasing annotation if you only need closed type families. Maybe that's why this issue doesn't show up for Coq, Agda etc?

comment:33 Changed 10 months ago by goldfire

I'm still unaware of a concrete example that my comment:13 and comment:15 do not handle.

During the conversation with Simon and Stephanie at ICFP, we indeed agreed on the "separator" solution. Easy to implement, explain, and understand. Perhaps slightly annoying, though. However, this conversation took place without an ability to get on Trac due to local connectivity challenges, and I took Simon's word about the existence of such an example. But now I can't find one.

Perhaps comment:13 and comment:15 are too baroque to implement (I claim: it's no worse than lots of other stuff!) and to explain to/understand by users (this bit may be an effective argument against my idea). But I'd like us to understand why we don't do this idea before we give up and just choose the simple (but perhaps annoying) solution.

Note that comment:13/comment:15 handle the example in comment:19. Because FieldType's type depends on FieldCount's, then FieldType's equations depend on FieldCount's equations, too. Thus the dependency would be caught and the compiler can process the instances in the correct order.

comment:34 Changed 7 months ago by bgamari

Milestone: 8.2.18.4.1

This sounds like something that will not be happening for 8.2.

comment:35 Changed 6 months ago by simonpj

Smaller example

type family Open a

type instance Open Int = Bool


type family F a :: Open a

type instance F Int = True

comment:36 Changed 6 months ago by goldfire

The nub of my idea in comment:13 and comment:15 is this:

  • Track dependency on declarations independently from definitions.
  • When T's declaration depends on S's declaration, then T's definition depends on S's definition.
  • Dependency tracking is, as usual, transitive.

So in the smaller example in comment:35, F's declaration depends on Open's. So F's definition -- the instances -- depends on Open's. This means we type check Open's instances before F's. There's lots more detail (comment:15) but this is the basic idea.

Here is a case (due to Iavor) that would be rejected:

type family Open a
type instance Open Int = Bool
type instance Open Char = F FLoat
type instance Open Float = Type

type family F a :: Open a
type instance F Int = True
type instance F Char = [True]
type instance F Float = [Bool]

The instances for a given type family are checked as a clump, and these instances can be accepted only by interleaving.

comment:37 Changed 6 months ago by simonpj

Ah, now that makes more sense. To flesh it out slightly:

  • The declaration F:sig of a type family F is its kind signature, e.g.
    type family F a :: * -> *
    
  • The definition F:def of a type family F is the collection of all its type instances (in this module). e.g.
    type instance F Int = Maybe
    type instance F Bool = []
    
    For the purposes of this conversation we treat the instances for a single type family F as a single "clump"; they are always treated together.
  • Each vertex of the dependency graph is either
    • the declaration F:sig (i.e. kind signature) of a type family F, or
    • the definition F:def (i.e. type instances) of the type family.
  • If there is an edge in the graph from A to B, then A depends on B, and B must be type-checked before A.
  • The edges are as follows
    • An edge from F:def to F:sig
    • An edge from F:sig to G:sig, and from F:def to G:def, for any tycon G syntactically mentioned in F's declaration
    • An edge from F:def to G:def for any tycon G syntactically mentioned in G's definition.

Now do strongly connected component analsis and process in order.

Richard's magic is in the bold part of the second bullet.

We could do with a proof of some kind.

comment:38 Changed 5 months ago by simonpj

When fixed, look at #12643

comment:39 Changed 5 months ago by goldfire

Agreed with comment:37.

comment:40 Changed 7 weeks ago by simonpj

See also #13790

comment:41 Changed 4 weeks ago by RyanGlScott

Cc: RyanGlScott added
Note: See TracTickets for help on using tickets.