Opened 21 months ago
Last modified 6 months ago
#12088 new bug
Type/data family instances in kind checking
Reported by: | alexvieth | Owned by: | |
---|---|---|---|
Priority: | high | Milestone: | 8.4.1 |
Component: | Compiler (Type checker) | Version: | 8.1 |
Keywords: | TypeInType | Cc: | goldfire, int-index, RyanGlScott |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | GHC rejects valid program | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | #11348, #12239, #12643, #13790 | Differential Rev(s): | Phab:D2272 |
Wiki Page: |
Description
In TcEnv.hs
there is a note AFamDataCon: not promoting data family constructors. It states that we can't use a promoted data family instance constructor because we would have to interleave the checking of instances and data types. But with the fix of #11348, we now do exactly this. In the example from the note
data family T a data instance T Int = MkT data Proxy (a :: k) data S = MkS (Proxy 'MkT)
-ddump-rn-trace shows these groups
rnTycl dependency analysis made groups [[data family T a_apG] [] [data instance T Int = MkT], [data Proxy (a_apF :: k_apE)] [] [], [data S = MkS (Proxy MkT)] [] []]
That's to say, the instance T Int
will in fact be checked before S
. So let's remove this restriction.
Change History (42)
comment:1 Changed 21 months ago by
Keywords: | TypeInType added |
---|
comment:4 Changed 21 months ago by
Differential Rev(s): | → Phab:D2272 |
---|
comment:5 Changed 21 months ago by
The more I look at the draft patch Phab:D2272 the more I think it's not quite right yet.
At the root of it, we currently have
data TyClGroup name -- See Note [TyClGroups and dependency analysis] = TyClGroup { group_tyclds :: [LTyClDecl name] , group_roles :: [LRoleAnnotDecl name] , group_instds :: [ [LInstDecl name] ] }
and we typecheck groups one by one. But actually, as commentary on Phab shows, type declarations can depend (via promoted data constructors) on data instance
declarations, and data instance
declarations can depend on each other. The [[LInstdecl name]]
is clearly a bit of a hack, becuase it is itself effectively a sequence of (non-recursive) SCCs.
Let's flatten it out: I think we want
data TyClGroup name -- See Note [TyClGroups and dependency analysis] = TyClGroup { group_tyclds :: [LTyClDecl name] , group_roles :: [LRoleAnnotDecl name] } | InstGroup (LInstDecl name)
Now a list of TyClGroup
tells us the order. Instances can't be mutually recursive, so singletons are what we need.
How to construct the order? Alex suggests adding a potentially large number of extra edges, and thene doing standard SCC. But that seems a bit artifical, and I don't think it's easy to construct precisely the right extra edges. How about this?
- A "node" of the dependency graph contains either a single
TyClDecl
or a singleInstDecl
.
- Each node needs a "key"; which is fine for
TyClDecls
(use theName
) but not forInstDecl
s. Solution; simply number off the nodes 1,2,3.
- Build a
[TCINode]
, withInt
keys, usingNode
comes fromDigraph
:type Node key payload = (payload, key, [key]) -- In Digraph type TCIKey = Int type TCINode = Node TCIKey (Either TyClDecl InstDecl)
To do this we'll need an auxiliary mapping fromTyCon
/DataCon
names toTCIKey
.
- Do SCC analysis in the usual way, using
stronglyConnCompFromEdgedVerticesR
So far it's all as usual. But we want to make sure that the instance declarations occur as early as possible in the sequence, consistent with the dependencies. So:
- Partition the
InstDecl
nodes from theTyClDecl
nodes; there should be noInstDecl
in aCyclicSCC
. Or at least if there are we should reject the program. So we can cleanly separate the two.
- Run down the list of
TyClDecl
SCCS. After each one, add all theInstDecl
nodes that depend only on keys that occur earlier in the sequence. We may need to iterate this process. The key function might look likeaddInstDecls :: Set TCIKey -- TCIKeys that occur earlier -> [SCC (Node TCIKey TyClDecl)] -- TyClDecl SCCs -> [Node TCIKey InstDecl] -- InstDecls to add -> [TyClGroup] -- Final groups in order addInstDecls _ [] ids = [InstDeclGroup id | (_, id, _) <- ids] addInstDecls so_far sccs ids | (dump_ids, keep_ids) <- pickInstDecls so_far ids , not (null dump_ids) -- Drop some instance declarations here = dump_ids ++ addInstDecls (so-far `add` keysOf sump_ids) sccs keep_ids addInstDecls so_far (scc : sccs) ids = scc : addInstDecls (so_far `add` keysOf scc) sccs ids
The first case is easy. The second dumps anyInstDecls
that depend only on earlier declarations The third dumps theTyClDecl
.
And that's about it. There is a bit of potential inefficienty in the repeated traversals
by pickInstDecls
but we don't expect to see a lot of instance declarations anyway, and if that
becomes a problem we can think about a more efficient data structure than a plain list.
Does that sound reasonable? I like that it's very comprehensible!
Minor alternative. Instead of the existing TyClGroup
(constructed by the renamer,
consumed by the type checker), use SCC TyClNode
,
where
data TyClNode name = TyClDeclNode (LTyClDecl name) (Maybe (LRoleAnnots name)) | InstDeclNode (LInstDecl name)
Now we can use TyClNode
instead of Either TyClDecl InstDecl
in the above, which is nice.
comment:6 Changed 20 months ago by
Alex suggests adding a potentially large number of extra edges, and thene doing standard SCC. But that seems a bit artifical, and I don't think it's easy to construct precisely the right extra edges.
Yes, there could be a lot of extra edges: at most the number of InstDecl
s multiplied by the number of TyClDecl
s. Computing them also requires a dfs for every TyClDecl
. Compiler performance isn't so great even without this change, so we should be careful.
I'm fairly certain that these are precisely the right extra edges.
- If
a > b
in the original graph, thena > b
in the augmented graph, because edges are never removed. - The SCCs of the original are the same as in the augmented graph: an edge is added from
a
tob
only if there's no path fromb
toa
(on the graph including the other artificial edges).
Instances can't be mutually recursive, so singletons are what we need.
Do you mean that if an instance appears in a cyclic group, the program will definitely be rejected? If this is true, then your approach is viable, but we need to get the right error message. Perhaps the simplest way to do that is to continue using the existing mechanism brought about by tcTyClGroup
(which my patch does), but this would demand that we give the InstDecl
s in their cyclic groups should they arise.
comment:7 Changed 20 months ago by
Yes, I did not fully understand the algorithm for adding extra edges. I think the specification is this:
- We need an edge that makes a
TyClDecl
D depend on everyInstDecl
that does not depend (transitively) on D.
It took me a few mins to write this down, and it does not look convenient to compute.
If you are happy with the approach I described, can we try that?
Do you mean that if an instance appears in a cyclic group, the program will definitely be rejected?
Yes; a data instance
can only depend on itself via promoted constructors, which we rule out for data type decls and should do for data instances
too:
data instance T Int = MkT (forall (a::Proxy 'MkT). ...blah...)
As you say, it need a decent error message.
comment:8 Changed 20 months ago by
I've spent some more time studying this and now I'm even more convinced that we should stay with my solution. Take a look at these modules:
-- A.hs {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE KindSignatures #-} {-# LANGUAGE UndecidableInstances #-} module A where import Data.Kind type family Closed (t :: Type) :: Type where Closed t = Open t type family Open (t :: Type) :: Type
-- B.hs {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeInType #-} {-# LANGUAGE DataKinds #-} {-# LANGUAGE KindSignatures #-} {-# LANGUAGE GADTs #-} module B where import Data.Kind import A data Q data family F (t :: Type) :: Closed t -> Type type instance Open Q = Bool data instance F Q r where F0 :: F Q 'True
The point is that type instance Open Q
must be checked before data instance F Q r
, even though we can't see this dependency, as it's hidden in A.hs
.
With Simon's suggestion, dependency analysis of B gives [data Q, data family F, type instance Open Q, data instance F Q r]
and then it's reordered to get [data Q, type instance Open Q, data family F, data instance F Q r]
(insert type instance Open Q
as early as possible, then do the same for data instance F Q r
). It's all good! But if we change the order of the definitions in the source file, we get a different result!
-- B.hs {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeInType #-} {-# LANGUAGE DataKinds #-} {-# LANGUAGE KindSignatures #-} {-# LANGUAGE GADTs #-} module B where import Data.Kind import A data family F (t :: Type) :: Closed t -> Type data Q type instance Open Q = Bool data instance F Q r where F0 :: F Q 'True
Here we get [data family F t, data Q, type instance Open Q, data instance F Q r]
, and then we reorder it to [data family F t, data Q, type instance F Q r, type instance Open Q]
(first put type instance Open Q
as soon as possible, then put data instance F Q r
as soon as possible, which is right after data Q
). Checking type instance F Q r
causes a failure because we don't have type instance Open Q
yet.
The issue: this algorithm doesn't discover that type instance Open Q
can go before data family F t
. Whether it fails as above depends upon the order in which we choose to merge instance declarations into the TyClDecl
list, but in any case there will be a way to make it fail based on the order of declarations.
Taking some TyClDecl
order and then inserting the InstDecl
s without ever re-ordering the TyClDecl
s will not work. A more complicated algorithm is needed. The one I've implemented doesn't have this problem, thanks to the artificial dependency of data family F t
on type instance Open Q
.
I should also add that I don't think TyClGroup
should be changed to have only singleton InstDecl
groups. I'd say it's good that this type represents the actual dependency structure, regardless of whether it's valid according to the type checker. We'll let the type checker programs determine that after the groups are made, but having the renamer do this would be overextending the responsibilities of the renamer.
comment:9 Changed 19 months ago by
Cc: | goldfire added |
---|
Sorry to be slow... I'm on holiday.
You give a great example. The key point is that to kind-check the data instance F Q r
declaration, we need to have completely processed the type instance Open Q
declaration. So even instance declarations can have dependencies!
I'd like Richard E's opinion on all this; hence cc'ing goldfire.
To me your example raises the question of whether it's even possible to figure out the dependencies. You probably believe that your "add extra edges" does so. But I still don't know exactly what those edges are. Maybe you could take a comment in this ticket to explain your algorithm precisely?
Meanwhile, here's a new hypothesis. The only instances we must have in hand to do kind-checking are type instance
s, because they give rise to implicit nominal type/kind equalities (e.g. F Int ~ Bool
) that the kind checker must solve.
But data instance
s are different: they do not give rise to implicit nominal type/kind equalities. So in that way they behave more like data type declarations. Indeed, I think that the only way that type/class declaration T can depend on a data instance
declaration D is if T mentions one of D's promoted data constructors. This will be sorted out by the ordinary SCC analysis.
So perhaps this small modification to my algorithm will work: in the step "Partition the InstDecl
nodes from the TyClDecl
nodes", instead partition the type instance
decls from the TyClDecl
and data instance
decls. Then your example will work just fine, because the type instance Open Q
will be dealt with as soon as data Q
is done.
The worry in my mind is whether you could have two type instance
declarations, D1 and D2, where it was essential to process D1 before D2, or vice versa. I can't come up with an example, but neither can I see how to prove that there is no such example.
comment:10 Changed 19 months ago by
I'd like Richard E's opinion on all this
Me too. We also need to think about how a solution for this ticket would hold up under an implementation of #11962.
To me your example raises the question of whether it's even possible to figure out the dependencies. You probably believe that your "add extra edges" does so. But I still don't know exactly what those edges are. Maybe you could take a comment in this ticket to explain your algorithm precisely?
After a bit of thought I came up with a simple change to my last example:
-- A.hs {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE PolyKinds #-} {-# LANGUAGE TypeInType #-} module A where import Data.Kind type family Open (t :: Type) :: Type data family F (t :: Type) :: Open t -> Type
-- B.hs {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeInType #-} {-# LANGUAGE DataKinds #-} {-# LANGUAGE KindSignatures #-} {-# LANGUAGE GADTs #-} module B where import Data.Kind import A data Q type instance Open Q = Bool data instance F Q r where F0 :: F Q 'True
The difference is that the data family is defined in A.hs
and the family Closed
is no longer necessary, as the data family is enough to obscure the Open
dependency. B.hs
as given above will not pass with my approach, but swap the two instance declarations and it's fine.
So perhaps this small modification to my algorithm will work ...
The modification sounds good, but I share your worry and I think we're actually in for a bit more thinking.
{-# LANGUAGE TypeFamilies #-} {-# LANGUAGE PolyKinds #-} {-# LANGUAGE TypeInType #-} {-# LANGUAGE UndecidableInstances #-} module A2 where import Data.Kind type family Open1 (t :: Type) :: Type type family Open2 (t :: Type) (q :: Open1 t) :: Type
{-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeInType #-} {-# LANGUAGE DataKinds #-} {-# LANGUAGE KindSignatures #-} {-# LANGUAGE GADTs #-} module B where import Data.Kind import A2 type instance Open1 () = Bool type instance Open2 () 'True = Char
The instance Open1 ()
needs to come before Open2 () 'True
to get that Open1 () ~ Bool
, but how are we to know this? Maybe we always check smaller instances (fewer parameters) before bigger ones?
Sorry to be slow... I'm on holiday.
No problem, enjoy it!
comment:11 Changed 19 months ago by
Excellent example!
It seems clear that there is going to be no reasonable "always-right" solution to this.
Moreover, here is another horrible example
data T1 = MkT1 (...something requiring IA...) data T2 = MkT2 (...something requiring IB...) type instance F1 Int = ...T2... -- IA type instance F2 Int = ...T1... -- IB
Here F1 and F2 are supposed to be like Open1 and Open2 in your example; they must be processed in the right order.
But T1 and T2 appear to be entirely independent of each other, so SCC analysis will put them in some arbitray order; and that order in turn determines which of IA and IB can be processed because IA mentions T2 and IB mentions T1.
So only one ordering of the apparently un-ordered T1, T2 will work.
Can you turn that sketch into a concrete example? It's even more horrible than yours because the invisible ordering of instances seems to force an equally-invisible ordering of the data declarations too. Sigh.
I'm not sure what to do.
The only simple, predictable thing I can think of is this:
- Perform SCC analysis on each group of data/type/class/data-instance decls that lie between successive
type instance
decls. - Otherwise process the file in the order writtten, from top to bottom.
That is, divide the file into segments, delimited by type instance
decls. Process the file top-to-bottom, one segment at a time, processing the type instance
decls in the order they are written.
That's simple, easy to specify, and easy to implement. Where there are no type instance
decls, we get full SCC analysis across the entire module. The big disadvantage is that where you have a forward reference, kind checking will just fail:
data T = MkT S -- Forward reference type instance F Int = Bool data S = MkS Int
That is pretty bad. Can you think of an alternative rule? The goal is NOT to be as clever as possible. THe goal is to be as stupid, straightforward, and predictable as possible, even if that might mean a little manual labour on the part of the programmer.
comment:12 Changed 19 months ago by
Some thoughts, after discussion with my son Michael.
- This entire problem arises only because of open type families; I think closed type families are fine. We could simply ban the use of open type families in the kind signature of an open type family. Or possibly all type families, I'm not sure.
- Another idea: when we have a group of
type instance
decls, suppose we kind check them, but do not yet solve the kind equalities that arise; but add them to the instance environment anyway. Now solve all the kind equalities. In our example, we'd add both instances forOpen1
andOpen2
before trying to solve their kind equalities. Seems a bit scary somehow.
- Question: how do dependently typed languages deal with this? For example, what if typechecking the defintion of a function
f
requires the type checker to evaluate calls tof
? In Haskell-land you could imaginetype family Id (a :: *) :: Id *
We reject this at the moment, but the idea is to get into a situation where, before having typecheckedf
we need to evaluatef
. That's rather like the situation withOpen1/2
.
comment:13 Changed 19 months ago by
I have a very different way of thinking about all of this that may shed some light on the matter.
(Before we get any further, if you'd like my input, do email directly. It's only dumb luck that I happened to catch up on this ticket so soon after requesting my feedback.)
Most declarations declare actually two things: a type and a definition. For data T = MkT Int
, we have the type T :: Type
and the definition, consisting of MkT
. For type family F a where F Bool = Int
, we get the type F :: Type -> Type
and the definition F Bool = Int
.
What's odd about open families is that the type and definition are separated, quite clearly in the code. But when we think about the fact that closed definitions are declaring two things at once, perhaps it's the closed definitions that are odd and the open ones are more natural. (Indeed, at the term level, we make the type and body separate top-level declarations.)
So, perhaps that's the answer: treat closed declarations as two separate pieces, each with its own node in the dependency graph. Now, any declaration (where "declaration" means either a type declaration or a definition declaration, but not both wrapped together) can depend on any other (also separated) declarations.
(Let's hold off a moment on worrying about how to implement this. I'll get to that later. The goal here is simply to describe what programs are accepted.)
Let's review the examples from above:
From the original description:
data family T a data instance T Int = MkT data Proxy (a :: k) data S = MkS (Proxy 'MkT)
Here are the dependencies, where I list a declaration and the things it depends on (not using transitivity):
T
(type only): nothingT Int
(defn only): type ofT
- type of
Proxy
: nothing - defn of
Proxy
: type ofProxy
- type of
S
: nothing - defn of
S
: type ofMkT
(= defn ofT Int
)
In the last bullet, we see that the type of a data constructor is considered in the same declaration as the definition of the parent tycon. The definition of a constructor is also in this same declaration. (This suggests further opportunities for splitting, though. Could one constructor mention another constructor, promoted, from the same type? I think so.)
These bullets also show that all definitions automatically depend on their own types. No surprise there.
From comment:10:
-- module A type family Open (t :: Type) :: Type data family F (t :: Type) :: Open t -> Type
-- module B import A data Q type instance Open Q = Bool data instance F Q r where F0 :: F Q 'True
Open
(type only): nothingF
(type only): type ofOpen
- type of
Q
: nothing - defn of
Q
: type ofQ
Open Q
(defn only): type ofOpen
, type ofQ
F Q r
(defn only): type ofF
, type ofQ
, defn ofOpen
The last bullet above reveals something new: when the type of A
depends on the type of B
, then the definition of A
depends on the definition of B
. In this case, the type of F
depends on the type of Open
, and thus the definition of F
(that is, instance F Q r
) depends on the definition of Open
.
We now have a small conundrum: what on earth is the definition of Open
, an open type family? The only possible answer: all instances that are in scope. These instances are considered as an inseparable clump. Any attempt to break them up by bizarre mutual dependencies is met with failure. I do think there is room to be cleverer here, but I think we'd be too clever by half if we tried.
In this example, it means that we check the Open Q
instance before the F Q r
instance.
Also from comment:10:
-- module A2 type family Open1 (t :: Type) :: Type type family Open2 (t :: Type) (q :: Open1 t) :: Type
-- module B2 import A2 type instance Open1 () = Bool type instance Open2 () 'True = Char
Open1
(type only): nothingOpen2
(type only): type ofOpen1
Open1 ()
(defn only): type ofOpen1
Open2 ()
(defn only): type ofOpen2
, defn ofOpen1
It's now clear that we must check the instance for Open1
before that of Open2
. No difficulty here.
From comment:11:
data T1 = MkT1 (...something requiring IA...) data T2 = MkT2 (...something requiring IB...) type instance F1 Int = ...T2... -- IA type instance F2 Int = ...T1... -- IB
- type of
T1
: nothing - defn of
T1
: defn ofF1
- type of
T2
: nothing - defn of
T2
: defn ofF2
F1 Int
(defn only): type ofF1
, type ofT2
F2 Int
(defn only): type ofF2
, type ofT1
No problem here either. Go in this order: type of T1
, type of T2
, type of F1
, type of F2
, F1 Int
, F2 Int
, defn of T1
, defn of T2
.
The reason there is no problem is that we have separated types from definitions.
Reactions to comment:12:
- It's hard to bad open type families from appearing somewhere, because we'd have to make the ban transitive. We'd thus have two classes of closed type families: those that mention open families somewhere (transitively) and those that don't. This would be awful.
- Your "another idea" might be something like what I've proposed here.
- In answer to your "question": I think only Agda handles this. It's induction recursion. And I think it's rather like what I've sketched here.
Implementation notes
Checking types and definitions separately is a bit of a bother for datatypes, because the datacons and the tycons mutually depend on each other. But I think it's possible -- just a little knotty. Let's assume that the dependency analysis succeeds and that there is a dependency ordering with no (perhaps transitive) self-dependency.
- At the beginning of the "type-checking" (ahem, desugaring) pass in TcTyClsDecls, go through the declarations in reverse order and create a knot for each definition (with the list of datacons knot-tied) and each type (with the tycon knot-tied).
- Then check the declarations in order.
- When a type declaration is processed, close out the tycon's knot. You will now have a tycon usable in every way except to access its datacons.
- When a definition declaration is processed, close out its knot. The datacons will now be untied and usable.
Proposition. If the dependency analysis is correct, the above algorithm accesses nothing that is knot-tied.
I propose naming this algorithm the "Celestial Dance of the Black Holes". It's as dangerous as it sounds. But I believe it's implementable. Perhaps not by humans. :)
The alternative to the Celestial Dance is to create intermediate structures with mutable references and such and then zonk them to get the desired outcome. The zonker will have to be creative about knot-tying, but that's much simpler. Indeed this may be the better approach, perhaps leaning more heavily on the rather new TcTyCon
than we have been.
comment:14 follow-up: 15 Changed 19 months ago by
I don't really understand your suggestion Richard.
Firstly, we often infer the kind of a data type, so it's hard to separate dependence on its kind from dependendence on its definition. I din't undersand the precise dependency analysis that you informally propose.
Secondly, I think that open type families are indeed the odd one out, as you acknowledge in the "odd conundrum". I don't see what's gained by separating types from defns even if we could.
Thirdly, you suggest kind-checking all equations for a type family together as a single group. But you can't do that if they mention data types that have not yet been defined. Yet we can't delay too long ... getting those instances into play early is the whole point.
Fourth, the dependence may be indirect. In the example, Open2 mentions Open1 directly. But it could instead mention F directly, where F is an open family that has an equation like type instance F [t] = Open1 t
. Now the dependence on Open1 is hidden.
comment:15 Changed 19 months ago by
Replying to simonpj:
I don't really understand your suggestion Richard.
Firstly, we often infer the kind of a data type, so it's hard to separate dependence on its kind from dependendence on its definition. I din't undersand the precise dependency analysis that you informally propose.
Yes, of course. But I think this is easily dispatched: if we are inferring the kind, then use this rule:
- If a datatype's definition depends on the definition of
X
, then its type depends on the type ofX
.
I haven't dwelt on this very long, but that seems to capture the nature of types that can be inferred.
Here is something toward a specification of my idea:
- In a datatype definition for
T
:- Let
X
be a tycon mentioned inT
's kind signature or in kind signatures of type variables toT
. ThenT
's type depends onX
's type andT
's definition depends onX
's definition. - Let
K
be a datacon mentioned inT
's kind signature or in kind signatures of type variables toT
. LetS
be the tycon containingK
. (S
might be a data family instance tycon, too.) ThenT
's type depends on the definition ofS
. - Let
X
be a tycon mentioned in the type signature of a datacon inT
. Then the definition ofT
depends on the type ofX
. In addition, ifT
does not have CUSK, then the type ofT
depends on the type ofX
. - Let
K
be a datacon (belonging to tyconS
) mentioned in the type signature of a datacon inT
. Then the definition ofT
depends on the definition ofS
. In addition, ifT
does not have a CUSK, then the type ofT
depends on the type ofS
. (IfS
is a data family instance tycon, then this last sentence refers to the family forS
, not the instance tycon, which has no type declaration to depend on.)
- Let
- In a closed type family definition for
C
:- Let
X
be a tycon mentioned in any kind signature in the head ofC
. ThenC
's type depends onX
's type andC
's definition depends onX
's definition. - Let
K
be a datacon (belonging to tyconS
) mentioned in any kind signature in the head ofC
. ThenC
's type depends onX
's definition. - Let
X
be a tycon mentioned in any equation forC
. ThenC
's definition depends onX
's type. In addition, ifC
does not have a CUSK, thenC
's type depends onX
s type. - Let
K
be a datacon (belonging to tyconS
) mentioned in any equation forC
. ThenC
's definition depends onS
's definition. In addition, ifC
does not have a CUSK, thenC
's type depends onS
's type. (OrS
's family's type, as above.)
- Let
- In the declaration of an open type family
F
:- Let
X
be a tycon mentioned in any kind signature in the declaration forF
. ThenF
's type depends onX
's type andF
's instance block depends onX
's definition. - Let
K
be a datacon (belonging to tyconS
) mentioned in the declaration forF
. thenF
's type depends onS
's type andF
's instance block depends onS
's definition.
- Let
- In a type family instance for
F
:- Let
X
by a tycon mentioned in the instance. ThenF
's instance block depends onX
's type. - Let
K
be a datacon (belonging to tyconS
) mentioned in the instance. ThenF
's instance block depends onS
's definition.
- Let
- In the declaration of an open data family
D
:- Let
X
be a tycon mentioned in any kind signature in the declaration forD
. ThenD
's type depends onX
's type. - Let
K
be a datacon (belonging to tyconS
) mentioned in the declaration forD
. thenD
's type depends onS
's type.
- Let
- In the declaration of a data instance: as for a regular datatype, where the kind signatures are inherited from the data family declaration.
Left for another time: classes.
Question raised: Data families always have a CUSK because a data family is always open. But might a data family instance not have a CUSK? What happens then? Perhaps we need to have CUSK analysis on data family instances, too.
Secondly, I think that open type families are indeed the odd one out, as you acknowledge in the "odd conundrum". I don't see what's gained by separating types from defns even if we could.
We gain induction recursion, #11962. Even before this whole discussion, I had the general idea of this separation. See this wiki page, written some time ago, on this idea.
Thirdly, you suggest kind-checking all equations for a type family together as a single group. But you can't do that if they mention data types that have not yet been defined. Yet we can't delay too long ... getting those instances into play early is the whole point.
It's true that my approach will sometimes fail when a finer-grained approach would succeed. But I don't think we're trying to get every use case. We're trying to get a predictable approach that works just about all the time. Here is an example that would fail:
type family F a = r | r -> a data T :: F a -> Type where MkT :: T False type instance F Int = Bool type instance F Char = Proxy MkT
The definition of T
depends on the first instance while the second instance depends on the definition of T
. I'm not bothered rejecting this.
Fourth, the dependence may be indirect. In the example, Open2 mentions Open1 directly. But it could instead mention F directly, where F is an open family that has an equation like
type instance F [t] = Open1 t
. Now the dependence on Open1 is hidden.
Isn't this just transitivity? Depending on the "definition" of a type family means dependency on the block of instances -- all of them. So anything an instance depends on is transitively reachable.
comment:16 follow-up: 20 Changed 19 months ago by
I'm still trying to grok the latest comments but in the meantime I want to offer an idea similar to this one:
Another idea: when we have a group of type instance decls, suppose we kind check them, but do not yet solve the kind equalities that arise; but add them to the instance environment anyway. Now solve all the kind equalities. In our example, we'd add both instances for Open1 and Open2 before trying to solve their kind equalities. Seems a bit scary somehow.
This doesn't seem scary to me. What if we don't even kind check them before adding their equalities to the environment? Just assume they're well-kinded while checking the other declarations. This way, we don't have to make sure the open type family instances come as early as possible in the order.
- Do SCC analysis on all declarations except for open type family instances, to give ordered
TyClGroup
s and the set of those instances (in source file order). - Before checking the
TyClGroup
s, put all equalities arising from the open type family instances into the environment (assume for now that they're well-kinded). - Check the
TyClGroup
s in order as usual. - Check the open type family instances (verifying or falsifying the assumption in 2).
Maybe this is nonsense. Would it be possible to make and use a coercion axiom without kind checking the lhs and rhs types?
comment:17 Changed 18 months ago by
Milestone: | → 8.2.1 |
---|---|
Priority: | normal → high |
Summary: | Promote data family instance constructors → Promote type/data family instance constructors |
Type: | feature request → bug |
So it turns out that https://ghc.haskell.org/trac/ghc/ticket/12239 is a manifestation of the same issue. Copying the example from it for completeness:
{-# LANGUAGE TypeInType, GADTs, TypeFamilies #-} import Data.Kind (Type) data N = Z | S N data Fin :: N -> Type where FZ :: Fin (S n) FS :: Fin n -> Fin (S n) type family FieldCount (t :: Type) :: N type family FieldType (t :: Type) (i :: Fin (FieldCount t)) :: Type data T type instance FieldCount T = S (S (S Z)) type instance FieldType T FZ = Int type instance FieldType T (FS FZ) = Bool type instance FieldType T (FS (FS FZ)) = String
Reordering the declarations slightly makes it compile:
{-# LANGUAGE TypeInType, GADTs, TypeFamilies #-} import Data.Kind (Type) data N = Z | S N data Fin :: N -> Type where FZ :: Fin (S n) FS :: Fin n -> Fin (S n) type family FieldType (t :: Type) (i :: Fin (FieldCount t)) :: Type data T -- Note that 'FieldCount' is now declared after 'FieldType' and 'T'. type family FieldCount (t :: Type) :: N type instance FieldCount T = S (S (S Z)) type instance FieldType T FZ = Int type instance FieldType T (FS FZ) = Bool type instance FieldType T (FS (FS FZ)) = String
Hope it helps.
comment:18 Changed 18 months ago by
Cc: | int-index added |
---|---|
Related Tickets: | #11348 → #11348, #12239 |
Summary: | Promote type/data family instance constructors → Type/data family instances in kind checking |
comment:19 Changed 18 months ago by
Another example, where a newtype complicates things ever further. I tried adding TH sections to enforce an order in which type family instances are added but it didn't help:
{-# LANGUAGE TemplateHaskell, TypeInType, GADTs, TypeFamilies #-} {-# OPTIONS -Wno-unticked-promoted-constructors #-} import Data.Kind (Type) data N = Z | S N data Fin :: N -> Type where FZ :: Fin (S n) FS :: Fin n -> Fin (S n) type family FieldCount (t :: Type) :: N type family FieldType (t :: Type) (i :: Fin (FieldCount t)) :: Type newtype Field (t :: Type) (i :: Fin (FieldCount t)) = Field (FieldType t i) return [] -- split TH section data L type instance FieldCount L = S Z return [] -- split TH section type instance FieldType L FZ = () return [] -- split TH section -- newtype not involved, compiles. l_fz :: FieldType L FZ l_fz = () -- newtype involved, doesn't compile. field_l_fz :: Field L FZ field_l_fz = Field l_fz
The error I'm getting is:
field.hs:36:20: error: • Couldn't match type ‘FieldType L 'FZ’ with ‘()’ • In the first argument of ‘Field’, namely ‘l_fz’ In the expression: Field l_fz In an equation for ‘field_l_fz’: field_l_fz = Field l_fz
comment:20 Changed 18 months ago by
Replying to alexvieth:
- Do SCC analysis on all declarations except for open type family instances, to give ordered
TyClGroup
s and the set of those instances (in source file order).- Before checking the
TyClGroup
s, put all equalities arising from the open type family instances into the environment (assume for now that they're well-kinded).- Check the
TyClGroup
s in order as usual.- Check the open type family instances (verifying or falsifying the assumption in 2).
Maybe this is nonsense. Would it be possible to make and use a coercion axiom without kind checking the lhs and rhs types?
I'm not sure I understand. Currently, kind-checking an instance and desugaring it are all done at the same time, so step (2) is hard to implement. Let's say we get around that barrier, though. This would then put ill-kinded equalities into the context. I think it would be awfully hard to avoid getting GHC to loop or do other silly things with a bad context. Perhaps you could squeeze this through, but I'm very skeptical.
Or am I misunderstanding something?
comment:21 Changed 18 months ago by
I'm not convinced that comment:17 is the same bug here. That, to me, looks like a bug in the resolution to #11348. Doing the major rewrite I propose in comment:13 and comment:15 may likely help, but there's a good chance comment:17 could be fixed with much less effort.
Alex, you're the expert here. What do you think?
comment:22 Changed 18 months ago by
I really think this exactly the same problem. The crux of this ticket is
(type/data) instances can depend on other (type/data) instances so we have to get the order right.
#11348 goes some of the way to get the right order but doesn't account for dependencies between (type/data) instances.
I am trying to write up these discussions on a wiki page as it is very messy currently.
comment:23 Changed 18 months ago by
But comment:17 has no data instances. There are only type instances there, hence my confusion.
comment:24 Changed 18 months ago by
@Richard, it seems to me the crux of the problem is that type instances introduce equalities so they must be interleaved with the other declarations. Data instances are just one place where these extra equalities can be used, another being kind checking for type equalities as in comment:17.
Here is the wiki page I started. It is not complete but I am too tired to carry on tonight. Perhaps you could look Alex and fill in some of the details?
https://ghc.haskell.org/trac/ghc/wiki/InterleavingTypeInstanceChecking
comment:25 Changed 18 months ago by
Sorry for the delay, new contract is keeping my occupied. comment:17 certainly shows a shortcoming in the patch for #11348. No sense reopening it though, this ticket can supersede it (its summary could use a revision for scope). Thanks for the wiki writeup mpickering. It's accurate and I have nothing to add right now. That section called "The Solution" is a tall order!
Replying to goldfire:
This would then put ill-kinded equalities into the context. I think it would be awfully hard to avoid getting GHC to loop or do other silly things with a bad context. Perhaps you could squeeze this through, but I'm very skeptical.
Or am I misunderstanding something?
You're probably not misunderstanding. I'm just sounding off ideas and I'm no expert on GHC's type checker. Putting ill-kinded equalities into the context sounds irresponsible yes, but they'll all be of the form F k ~ l
for some open type family F
(or similar for higher arities). Does this fact help at all? Could you come up with a case where an ill-kinded equality of this form would wreak havoc?
How about a variation on that suggestion of mine? A kind of lazy evaluation for open type families in kinds. While type checking other declarations, rather than assuming the (possibly ill-kinded) equalities arising from open type families to be true, defer them until after all declarations are checked. Then we end up with a set of equalities featuring open type family constructors which must be solved using all open type family instances. So in the FieldCount
example from comment:17, repeated here for convenience:
{-# LANGUAGE TypeInType, GADTs, TypeFamilies #-} import Data.Kind (Type) data N = Z | S N data Fin :: N -> Type where FZ :: Fin (S n) FS :: Fin n -> Fin (S n) type family FieldCount (t :: Type) :: N type family FieldType (t :: Type) (i :: Fin (FieldCount t)) :: Type data T type instance FieldCount T = S (S (S Z)) type instance FieldType T FZ = Int type instance FieldType T (FS FZ) = Bool type instance FieldType T (FS (FS FZ)) = String
The only declarations which give rise to a deferred kind equality are the FieldType
instances.
type instance FieldType T FZ = Int
yieldsS n0 ~ FieldCount T
type instance FieldType T (FS FZ) = Bool
yieldsS (S n1) ~ FieldCount T
type instance FieldType T (FS (FS FZ)) = String
yieldsS (S (S n2)) ~ FieldCount T
These don't stop type checking dead, they're just tucked away for later and type instance FieldCount T = S (S (S Z))
will eventually be checked (either before or after, we don't care) and available when those three equalities are solved. The program will therefore pass: n0 := (S (S Z))
, n1 := S Z
, n2 := Z
.
comment:26 Changed 18 months ago by
I still am in favor of my comment:13 and comment:15. The other approaches seem a bit ad-hoc and perhaps are approximations of my proposal.
comment:27 Changed 18 months ago by
Perhaps that last comment was a bit strongly worded, upon re-reading. I do think comment:25 may be onto something -- waiting until later to solve equality constraints is something GHC does well -- but I don't have a clear enough specification of comment:25 in my head to really analyze.
comment:28 Changed 17 months ago by
I had a good conversation with Richard and Stepanie about this ticket.
Our conclusions.
- Alex's cunning examples demonstrate that it's very difficult (perhaps impossible) to come up with a dependency analysis that always does the Right Thing. Even if it's possible we should not do it because (a) it'd be hard to implement and (b) it'd be extremely hard to explain exactly which programs should be OK.
- Bottom line. Instead of getting in deeper, pull back. Do somthing that is simple to explain, predicatable, and simple to implement, even if it' a little less convenient.
- Richard's long remarks in comment:15 actually relate to somethiug rather different called (I think) "induction recursion". It is cool but it should have a separate ticket. Richard can you do that?
Proposal
Get the programmer to do "phasing" explicitly if it is necessary. Specifically
- Introduce a top-level binding separator, thus
f x = x + 1 ============ g x = f x
The==========
is the binding separator.
- Concrete syntax to be decided. But we have this very thing already, thus
f x = x+1 $([d| |]) -- Empty Template Haskell top level splice g x = f x
New concrete syntax is just to make it less wierd.
- Semantics. Everything before the separator is renamed and typechecked before anything after.
- When typechecking data/class/instance decls, we revert to something close to the situation before Alex's patch:
class
,type
,data
, anddata instance
declarations are SCCd and typechecked in order- When that is complete typecheck class
instance
andtype instance
declarations. Note thatdata instance
decls come in the first wave (addressing the example in the Description), but the slipperytype instance
declarations happen only afterwards
- I think we could perhaps include closed type families in step (1); c.f. comment:12. Not certain.
In Alex's example in comment:8 the type instance F
would happen last, which would fail (consistently so). To fix it, use a separator:
module B where import Data.Kind import A data Q data family F (t :: Type) :: Closed t -> Type type instance Open Q = Bool ============ -- The separator ensures that data instance F Q r is -- typechecked after the type instance Open Q data instance F Q r where F0 :: F Q 'True
comment:29 Changed 17 months ago by
What about the example in comment:19? Adding binding separators didn't fix the issue.
comment:30 Changed 17 months ago by
Here's another example of the phasing problem, this one posted by Dave Menendez:
{-# LANGUAGE TemplateHaskell, TypeInType, TypeFamilies #-} module DH where import Data.Kind (Type) type family K t :: Type type family T t :: K t -> Type data List type instance K List = Type type instance T List = [] -- Error on this line -- Error is: -- * Expected kind K List -> Type, but [] has kind `* -> *` -- In the type `[]` -- In the type instance declaration for `T`
The reasson is the we need teh K List
instance to typecheck the T List
instance.
Adding a separator just before the instance T List
declaration makes it work
comment:31 Changed 17 months ago by
Replying to simonpj:
It is cool but it should have a separate ticket. Richard can you do that?
It's here: #11962
Simon, what do you think of my idea from comment 25?
Bottom line. Instead of getting in deeper, pull back. Do somthing that is simple to explain, predicatable, and simple to implement, even if it' a little less convenient.
The proposed phasing annotations might be convenient for ghc developers but they'd be inconvenient and surprising for ghc users. I appreciate that, at the term level, I don't need to worry about the order of my definitions. I'd like the same to be true at type level if possible, and I think it is possible. I'm just not sure if it will be simple to implement.
comment:32 Changed 17 months ago by
RE comment:25:
- I have no idea how to actually implement this idea. Type constructors, axioms, etc have no late-bound holes in them and it would be architeturally difficult to change that, becuase they are all tied together so they point to each other in a graph. Difficult to update.
- More seriously, I do not know whother the resulting system would be sound. The kinding of one instance depends on executing another instance; and vice versat. At at minimum I'd want to see a specification in langauge that a type-theory person could understand, and a sketch proof of soundness.
- Stephanie Weirich and Ricahrd Eisenberg are world experts of this stuff. We sat down and ICFP and made zero progress. That doesn't mean that no progress is possible; but it does mean that it's premature to implement anything until we undersatnd the theory.
- It's important to bre able to explain to programmers exactly what will kind-check and what will not. The fact that the implementation is already jolly tricky, and yet does not do the job, is a worry. Making it much more complicated still feels wrong to me.
Better to withdraw to something that is (a) simple to explain and (b) simple to implement. That leaves a clear research challenge for some research student to tackle.
I appreciate that, at the term level, I don't need to worry about the order of my definitions.
I agree. But I don't know anything better. If Ricahrd gets his way, thing will happen at the term level, when typing f
requires executing g
anf vice versa. Except it's worse at the type level becaues of open type families.
I specualte that you'll never need a phasing annotation if you only need closed type families. Maybe that's why this issue doesn't show up for Coq, Agda etc?
comment:33 Changed 17 months ago by
I'm still unaware of a concrete example that my comment:13 and comment:15 do not handle.
During the conversation with Simon and Stephanie at ICFP, we indeed agreed on the "separator" solution. Easy to implement, explain, and understand. Perhaps slightly annoying, though. However, this conversation took place without an ability to get on Trac due to local connectivity challenges, and I took Simon's word about the existence of such an example. But now I can't find one.
Perhaps comment:13 and comment:15 are too baroque to implement (I claim: it's no worse than lots of other stuff!) and to explain to/understand by users (this bit may be an effective argument against my idea). But I'd like us to understand why we don't do this idea before we give up and just choose the simple (but perhaps annoying) solution.
Note that comment:13/comment:15 handle the example in comment:19. Because FieldType
's type depends on FieldCount
's, then FieldType
's equations depend on FieldCount
's equations, too. Thus the dependency would be caught and the compiler can process the instances in the correct order.
comment:34 Changed 14 months ago by
Milestone: | 8.2.1 → 8.4.1 |
---|
This sounds like something that will not be happening for 8.2.
comment:35 Changed 13 months ago by
Smaller example
type family Open a type instance Open Int = Bool type family F a :: Open a type instance F Int = True
comment:36 Changed 13 months ago by
The nub of my idea in comment:13 and comment:15 is this:
- Track dependency on declarations independently from definitions.
- When
T
's declaration depends onS
's declaration, thenT
's definition depends onS
's definition.
- Dependency tracking is, as usual, transitive.
So in the smaller example in comment:35, F
's declaration depends on Open
's. So F
's definition -- the instances -- depends on Open
's. This means we type check Open
's instances before F
's. There's lots more detail (comment:15) but this is the basic idea.
Here is a case (due to Iavor) that would be rejected:
type family Open a type instance Open Int = Bool type instance Open Char = F FLoat type instance Open Float = Type type family F a :: Open a type instance F Int = True type instance F Char = [True] type instance F Float = [Bool]
The instances for a given type family are checked as a clump, and these instances can be accepted only by interleaving.
comment:37 Changed 13 months ago by
Ah, now that makes more sense. To flesh it out slightly:
- The declaration
F:sig
of a type familyF
is its kind signature, e.g.type family F a :: * -> *
- The definition
F:def
of a type familyF
is the collection of all its type instances (in this module). e.g.type instance F Int = Maybe type instance F Bool = []
For the purposes of this conversation we treat the instances for a single type familyF
as a single "clump"; they are always treated together.
- Each vertex of the dependency graph is either
- the declaration
F:sig
(i.e. kind signature) of a type familyF
, or - the definition
F:def
(i.e. type instances) of the type family.
- the declaration
- If there is an edge in the graph from A to B, then A depends on B, and B must be type-checked before A.
- The edges are as follows
- An edge from
F:def
toF:sig
- An edge from
F:sig
toG:sig
, and fromF:def
toG:def
, for any tyconG
syntactically mentioned inF
's declaration - An edge from
F:def
toG:def
for any tyconG
syntactically mentioned inG
's definition.
- An edge from
Now do strongly connected component analsis and process in order.
Richard's magic is in the bold part of the second bullet.
We could do with a proof of some kind.
comment:38 Changed 13 months ago by
Related Tickets: | #11348, #12239 → #11348, #12239, #12643 |
---|
When fixed, look at #12643
comment:40 Changed 9 months ago by
Related Tickets: | #11348, #12239, #12643 → #11348, #12239, #12643, #13790 |
---|
See also #13790
comment:41 Changed 8 months ago by
Cc: | RyanGlScott added |
---|
Looks reasonable to me.