Opened 3 years ago

Closed 3 years ago

#11028 closed bug (fixed)

Refactor ConDecl

Reported by: simonpj Owned by: alanz
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.10.2
Keywords: Cc: alanz, mpickering, jstolarek
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #11155 Differential Rev(s): Phab:D1558
Wiki Page:

Description (last modified by simonpj)

The ConDecl type in HsDecls is an uneasy compromise. For the most part, HsSyn directly reflects the syntax written by the programmer; and that gives just the right "pegs" on which to hang Alan's API annotations. But ConDecl doesn't properly reflect the syntax of Haskell-98 and GADT-style data type declarations.

To be concrete, here's a draft new data type

data ConDecl name
  | ConDeclGADT
      { con_names   :: [Located name]
      , con_type    :: LHsSigType name  -- The type after the ‘::’
      , con_doc     :: Maybe LHsDocString }

  | ConDeclH98
      { con_name    :: Located name

      , con_qvars     :: Maybe (LHsQTyVars name)
        -- User-written forall (if any), and its implicit
        -- kind variables 
        -- Non-Nothing needs -XExistentialQuantification

      , con_cxt       :: Maybe (LHsContext name)
        -- ^ User-written context (if any)

      , con_details   :: HsConDeclDetails name
          -- ^ Arguments

      , con_doc       :: Maybe LHsDocString
          -- ^ A possible Haddock comment.
      } deriving (Typeable)

Note that

  • For GADTs, just keep a type. That's what the user writes. (NB: HsType can represent records on the LHS of an arrow: { x:Int,y:Bool } -> T
  • con_qvars and con_cxt are both Maybe because they are both optional (the forall and the context of an existential data type
  • For ConDeclGADT the type variables of the data type do not scope over the con_type; whereas for ConDeclH98 they do scope over con_cxt and con_details.

This ticket is just to track the thought and invite feedback.

Attachments (2)

old-condecl.png (53.5 KB) - added by alanz 3 years ago.
CmmNode documents for 7.10.2
new-condecl.png (52.0 KB) - added by alanz 3 years ago.
CmmNode with #11028 patch

Download all attachments as: .zip

Change History (48)

comment:1 Changed 3 years ago by goldfire

I have no objections. It would be nice to somehow capture the fact that all constructors are either one or the other. But that might be annoying to code with.

comment:2 Changed 3 years ago by simonpj

Test ghc-api/annotations/T10399 is expect-broken on this ticket.

comment:3 Changed 3 years ago by alanz

Cc: alanz mpickering added

comment:4 in reply to:  2 Changed 3 years ago by jstolarek

Replying to simonpj:

The con_old_rec is a warning thing for GADTs (see current source code)

Incidentally, I removed that field not so long ago - see ea8c116ac9eb916fdb6360a01c285bc8698dfaf9.

data ConDecl name
  | ConDeclGADT
      { con_names   :: [Located name]
      , con_type    :: LHsSigType name  -- The type after the ‘::’
      , con_doc     :: Maybe LHsDocString
      , con_old_rec :: Bool }

What is LHsSigType? A synonym for LHsType? If so then, perhaps it would be a good idea to have separate fields for storing foralls and the context? I'm looking at my work on #10828 (Phab:D1465) and I think the implementation in DsMeta would get a bit more difficult if foralls and context are folded into one field with the whole type. Of course we can create functions that extract these information from ConDecl but wouldn't separate fields be easier?

comment:5 Changed 3 years ago by jstolarek

Cc: jstolarek added

comment:6 Changed 3 years ago by simonpj

Jan: see branch wip/spj-wildcard-refactor, and https://phabricator.haskell.org/D1428, which will land shortly

comment:7 Changed 3 years ago by alanz

Owner: set to alanz

I am making a start on this, using wip/spj-wildcard-refactor as the base.

comment:8 Changed 3 years ago by simonpj

Description: modified (diff)

Terrific. Please make a branch and push to it, so I can join in.

Will you use the above data type as the basis?

Simon

comment:9 Changed 3 years ago by simonpj

Description: modified (diff)

PS: I have edited the proposed data type a bit. Do you agree with it?

comment:10 Changed 3 years ago by alanz

Yes, it maps more clearly on to the original.

comment:11 Changed 3 years ago by alanz

I wonder if we should use ConDeclSimple instead of ConDeclH98.

Alternatively mkSimpleConDecl should become mkCondeclH98.

comment:12 Changed 3 years ago by alanz

Also, at the moment the con_doc field is never set for a GADT. It should either be removed or the parser modified to allow docs.

comment:13 in reply to:  11 Changed 3 years ago by jstolarek

Replying to alanz:

I wonder if we should use ConDeclSimple instead of ConDeclH98.

Alternatively mkSimpleConDecl should become mkCondeclH98.

I vote for the latter - mkConDeclH98 is more informative than mkSimpleConDecl.

Also, at the moment the con_doc field is never set for a GADT. It should either be removed or the parser modified to allow docs.

Again, my vote for the latter. If we store docs for H98 constructors then we should also store them for GADTs.

comment:14 Changed 3 years ago by simonpj

Yes, I think I prefer mkConDeclH98 too.

As to con_doc it depends on the Haddock folk. Maybe file a ticket on the Haddock issue tracker; I don't know if they have a story about documentation GADT data types. But they should!

comment:15 Changed 3 years ago by alanz

Seems I was wrong about the con_doc, it is added after the fact by addConDoc

comment:16 Changed 3 years ago by alanz

At the moment kcConDecl is defined as

kcConDecl (ConDecl { con_names = names, con_qvars = ex_tvs
                   , con_cxt = ex_ctxt, con_details = details
                   , con_res = res })
  = addErrCtxt (dataConCtxtName names) $
         -- the 'False' says that the existentials don't have a CUSK, as the
         -- concept doesn't really apply here. We just need to bring the variables
         -- into scope!
    do { _ <- kcHsTyVarBndrs False ex_tvs $
              do { _ <- tcHsContext ex_ctxt
                 ; mapM_ (tcHsOpenType . getBangType) (hsConDeclArgTys details)
                 ; _ <- tcConRes res
                 ; return (panic "kcConDecl", ()) }
       ; return () }

I am working on the ConDeclGADT case, and have

kcConDecl (ConDeclGADT { con_name = name
                       , con_type = HsIB { hsib_kvs = ex_kvs
                                         , hsib_tvs = ex_tvs
                                         , hsib_body = res_ty} })
  = addErrCtxt (dataConCtxtName [name]) $
         -- the 'False' says that the existentials don't have a CUSK, as the
         -- concept doesn't really apply here. We just need to bring the variables
         -- into scope!
    do { _ <- kcHsTyVarBndrs False ex_tvs $
       .....
       ; return () }

It is not clear to me whether kcHsTyVarBndrs False ex_tvs should be checking ex_kvs, ex_tvs, or both. I suspect ex_kvs only.

comment:17 Changed 3 years ago by simonpj

No no! con_type is a LHsSigType. So use TcHsType.kcClassSigType (give it a less specific name!); it does exactly the right thing.

Simon

comment:18 Changed 3 years ago by alanz

tcConDecl returns a DataCon.

For ConDeclGADT, should I bring in a function

gadtDeclDetails :: LHsSigType name -> HsConDeclDetails name
gadtDeclDetails (HsIB {hsib_body = lbody_ty}) = details
  where
    (tvs, cxt, tau) = splitLHsSigmaTy lbody_ty
    details           -- See Note [Sorting out the result type]
      = case tau of
          L _ (HsFunTy (L l (HsRecTy flds)) res_ty)
                  -> RecCon (L l flds)
          _other  -> PrefixCon []

and then proceed with the process as it is now? Should there be a call to tcHsSigType?

comment:19 Changed 3 years ago by simonpj

Yes, because of the possibility of HsRecTy, you can't use tcHsSigType. So something like what you propose looks good.

Ca you make a branch and push what you have so that I can see?

comment:20 Changed 3 years ago by alanz

I just pushed an update to wip/T11028

It now compiles for stage1. Only.

comment:21 Changed 3 years ago by alanz

Commit https://phabricator.haskell.org/rGHCff2978cac0cd fails with the res_ty not matching,

When type checking

data MaybeO ex t where
  JustO    :: t -> MaybeO O t
  NothingO ::      MaybeO C t

This code generates.

libraries/hoopl/src/Compiler/Hoopl/Block.hs:66:3: error:
    • Data constructor ‘JustO’ returns type ‘t -> MaybeO O t’
        instead of an instance of its parent type ‘MaybeO ex t’
    • In the definition of data constructor ‘JustO’
      In the data type declaration for ‘MaybeO’

I am pretty sure the res_ty' variable in [1] should be MaybeO O t but it is not clear how this happens in the original code.

[1] https://github.com/ghc/ghc/blob/ff2978cac0cd133c2434480e311bed6aea72c6f1/compiler/typecheck/TcTyClsDecls.hs#L1398

comment:22 Changed 3 years ago by alanz

Ok, the problem is in rnConDecl in the ConDeclGADT case, which in this branch simply calls rnHsSigType.

Instead it should perform as per the original, which splits the result type,leaving it as MaybeO ex t in this case.

comment:23 in reply to:  22 Changed 3 years ago by simonpj

Replying to alanz:

Ok, the problem is in rnConDecl in the ConDeclGADT case, which in this branch simply calls rnHsSigType.

Instead it should perform as per the original, which splits the result type,leaving it as MaybeO ex t in this case.

Wait! Why? What would the difference be in the result of rnConDecl? I'm suspicious; I'd like to understand better. An example perhaps?

comment:24 Changed 3 years ago by alanz

The key factor is the work currently done in rnConResult,

rnConResult doc _con details (ResTyGADT ls ty)
  = do { (ty', fvs) <- rnLHsType doc ty
       ; let (arg_tys, res_ty) = splitHsFunType ty'
                -- We can finally split it up,
                -- now the renamer has dealt with fixities
                -- See Note [Sorting out the result type] in RdrHsSyn

       ; case details of
           InfixCon {}  -> pprPanic "rnConResult" (ppr ty)
           -- See Note [Sorting out the result type] in RdrHsSyn

           RecCon {}    -> do { unless (null arg_tys)
                                       (addErr (badRecResTy doc))
                              ; return (details, ResTyGADT ls res_ty, fvs) }

           PrefixCon {} -> return (PrefixCon arg_tys, ResTyGADT ls res_ty, fvs)}

splitHsFunTy on t -> MaybeO a b will return ([t],MaybeO a b), and the latter is then returned as the new res_ty

My understanding is that we are doing this so that the ParsedSource more accurately reflects what was seen, which can simplify the API Annotations work.

Perhaps we should add a PostRn field to the ConDeclGADT to capture the new details and res_ty.

Alternatively, everything except the rnLHsType in rnConResult can move to the type checking.

comment:25 Changed 3 years ago by simonpj

Alternatively, everything except the rnLHsType in rnConResult can move to the type checking.

That sounds exactly right. Note [Sorting out the result type] in RdrHsSyn seems entirely redundant with the new ConDeclGADT form.

comment:26 Changed 3 years ago by alanz

The current error is

compiler/cmm/CmmNode.hs:88:20: error:
    Record syntax is illegal here:
      {cml_pred :: CmmExpr,
       cml_true, cml_false :: {-# UNPACK #-} !Label,
       cml_likely :: Maybe Bool}

which is a record style GADT constructor declaration.

I do not understand how

rnHsTyKi _ doc ty@(HsRecTy flds)
  = do { addErr (hang (ptext (sLit "Record syntax is illegal here:"))
                    2 (ppr ty))
       ; (flds', fvs) <- rnConDeclFields [] doc flds
       ; return (HsRecTy flds', fvs) }

works. The error is always added, I do not see discardErrors in rnConDeclFields. What am I missing?

comment:27 Changed 3 years ago by simonpj

I think you should just move the check to the typechecker.

It's there to stop you writing

f :: { x::Int } -> Int
f = blah

in normal code. But the renamer now treats a GADT constructor type uniformly, so we don't want to reject it. The type checker treats them differently, so it can reject HsRecTy in ordinary types but not in GADTs

comment:28 Changed 3 years ago by alanz

Something else I am bumping my head against is that rnField gets passed an empty fl_env and so lookupField generates a panic when the lookup fails.

It seems that the new RdrHsSyn.gadtDeclDetails needs to be called before this, but unless the results are stored the names may be discarded. I do not know enough about this atm.

comment:29 Changed 3 years ago by simonpj

Ah yes. To rename a record we need to know what data constructor it is from; this call in rnHsTyKi

       ; (flds', fvs) <- rnConDeclFields [] doc flds

should not have the empty list, but rather the result of a lookup, just as in RnSource.rnConDeclDetails. (Indeed better to pass the Name of the data constructor to rnConDeclFields.)

Where to get the Name of the DataCon from in the case of GADTs. It could be in the HsDocContext passed down the type renamer. Change ConDeclCtx to take a Name instead of RdrNames.

comment:30 Changed 3 years ago by simonpj

BTW, a refactoring I'd love to see done is this: combine RnEnv.HsDocContext (used in the renamer) and TcType.UserTypeCtxt (used in the type checker).

Both are fairly big types; but they are practically the same already, except that the former uses RdrNames while the latter uses Names. But that's accidental; in fact the renamer could perfectly well construct contexts with Names in them.

I think that everything else is just accidental differences.

comment:31 Changed 3 years ago by alanz

So kcConDecl can't just call kcHsSigType, the strict marks on arguments need to be managed.

comment:32 in reply to:  31 Changed 3 years ago by simonpj

Bother.

I guess that to typecheck a GADT decl we'll need a function with a type like this:

tcGadtSigType :: LHsSigType Name -> TcM ([StrictnessMark], [FieldLabel], [Type], Type)

or something like that. It takes a GADT data constructor type signature, like

C :: { x :: !Int, y:: Bool } -> T Int

and returns the typechecked pieces.

Once you've got that, you can use the very same function during kind checking. Just call it and discard the result.

comment:33 Changed 3 years ago by alanz

I am now using rnHsSigType to rename the RHS of a ConDeclGADT, passing in a ConDeclCtx.

This ends up calling

rnHsTyKi _ doc ty@(HsForAllTy { hst_bndrs = tyvars, hst_body  = tau })
  = bindLHsTyVarBndrs doc Nothing tyvars $ \ tyvars' ->
    do { (tau',  fvs) <- rnLHsType doc tau
       ; warnUnusedForAlls (inTypeDoc ty) tyvars' fvs
       ; return ( HsForAllTy { hst_bndrs = tyvars', hst_body =  tau' }
                , fvs) }

warnUnusedForAlls does not use the passed in context, but inTypeDoc ty instead, causing tests such as T5331 to fail.

Possible solutions seem to be

  1. use the passed in context for the warnUnusedForalls, which has the disadvantage of giving a less precise location.
  1. Combine the doc and the inTypeDoc ty contexts for the warning.
  1. Update the expected warning in the test, the forall context is good enough.

Which is the best option, or is there another solution?

comment:34 Changed 3 years ago by simonpj

What is the regression? I think it's that for

data W where
  W1 :: forall a. W

we get

T5331.hs:11:16: Warning:
    Unused quantified type variable ‘a’
    In the type `forall a. W’

instead of

T5331.hs:11:16: Warning:
    Unused quantified type variable ‘a’
    In the definition of data constructor ‘W1’

If so, I think that's fine -- accept it.

comment:35 Changed 3 years ago by alanz

Ok, will do. Option 3.

comment:36 Changed 3 years ago by alanz

I will add this to phab for review once wip/spj-wildcard-refactor lands

comment:37 Changed 3 years ago by alanz

Differential Rev(s): Phab:D1558

comment:38 Changed 3 years ago by alanz

Milestone: 8.0.1

comment:39 Changed 3 years ago by alanz

The haddock linker failure in Phab:D1558 is due to #11155

Changed 3 years ago by alanz

Attachment: old-condecl.png added

CmmNode documents for 7.10.2

Changed 3 years ago by alanz

Attachment: new-condecl.png added

CmmNode with #11028 patch

comment:40 Changed 3 years ago by alanz

I am finishing off the haddock changes for this.

The original documentation generated for CmmNode was https://ghc.haskell.org/trac/ghc/attachment/ticket/11028/old-condecl.png

With the update from the patch this becomes https://ghc.haskell.org/trac/ghc/attachment/ticket/11028/new-condecl.png

See in particular CmmCondBranch. In the original it gives the (supposed) prefix constructor version, followed by the fields. But because cml_true and cml_false are part of the same field declaration, only one instance of !Label appears.

In the new one, only the record fields are listed at the moment.

Question: Should a prefix type signature be given at all, for a record GADT ConDecl? Or should it be left as per the new image?

comment:41 Changed 3 years ago by alanz

Status: newpatch

comment:42 Changed 3 years ago by simonpj

The old one is clearly wrong when we have

  cml_true, cml_false :: !Label

We could fix it to provide both a prefix signature and list the fields; and that is presumably what the old version was supposed to do. And the new form which looks like

  CmmCondBranch :: -> CmmNode O C
    cml_true, cml_false :: !Label
    ...etc...

does look very weird. CmmCondBranch :: -> CmmNode O C just isn't Haskell.

Why not just print it out in the source syntax form:

  CmmCondBranch :: { cml_true, cml_false :: ! Label
                   , ...etc...
                   } -> CmmNode O C

Admittedly that's more work, because it's not what Haddock does right now. But it would make more sense to users wouldn't it?

comment:43 Changed 3 years ago by alanz

After a brief chat on IRC we changed it in the current patch to

CmmCondBranch :: {..} -> CmmNode O C

followed by the field definition.

The other option is to surround the field definition with braces and put the return type at the and, as per the original source.

comment:44 Changed 3 years ago by alanz

Clarification: the chat took place shortly after the issue was identified.

comment:45 Changed 3 years ago by Ben Gamari <ben@…>

In 51a5e68d/ghc:

Refactor ConDecl

The ConDecl type in HsDecls is an uneasy compromise. For the most part,
HsSyn directly reflects the syntax written by the programmer; and that
gives just the right "pegs" on which to hang Alan's API annotations. But
ConDecl doesn't properly reflect the syntax of Haskell-98 and GADT-style
data type declarations.

To be concrete, here's a draft new data type

```lang=hs
data ConDecl name
  | ConDeclGADT
      { con_names   :: [Located name]
      , con_type    :: LHsSigType name  -- The type after the ‘::’
      , con_doc     :: Maybe LHsDocString }

  | ConDeclH98
      { con_name    :: Located name

      , con_qvars     :: Maybe (LHsQTyVars name)
        -- User-written forall (if any), and its implicit
        -- kind variables
        -- Non-Nothing needs -XExistentialQuantification

      , con_cxt       :: Maybe (LHsContext name)
        -- ^ User-written context (if any)

      , con_details   :: HsConDeclDetails name
          -- ^ Arguments

      , con_doc       :: Maybe LHsDocString
          -- ^ A possible Haddock comment.
      } deriving (Typeable)
```

Note that

    For GADTs, just keep a type. That's what the user writes.
    NB:HsType can represent records on the LHS of an arrow:

      { x:Int,y:Bool} -> T

    con_qvars and con_cxt are both Maybe because they are both
    optional (the forall and the context of an existential data type

    For ConDeclGADT the type variables of the data type do not scope
    over the con_type; whereas for ConDeclH98 they do scope over con_cxt
    and con_details.

Updates haddock submodule.

Test Plan: ./validate

Reviewers: simonpj, erikd, hvr, goldfire, austin, bgamari

Subscribers: erikd, goldfire, thomie, mpickering

Differential Revision: https://phabricator.haskell.org/D1558

GHC Trac Issues: #11028

comment:46 Changed 3 years ago by bgamari

Resolution: fixed
Status: patchclosed

The commit above performed the refactoring described here.

There are still a few open questions regarding Haddock's handling of this. These will be addressed in due course.

Note: See TracTickets for help on using tickets.