Opened 7 years ago

Closed 6 years ago

#4430 closed feature request (fixed)

Better support for resolving infix expressions in template haskell

Reported by: reinerp Owned by: igloo
Priority: normal Milestone: 7.4.1
Component: Template Haskell Version: 6.12.3
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Consider writing a quasiquoter to parse haskell (for example, the parseHaskell quasiquoter mentioned in Part D of Simon's TH blog post

How is the quasiquoter supposed to handle infix expressions, such as

foo = [parseHaskell| 3 * 5 + 4 |]

In order to parse this expression properly, the quasiquoter needs to know the fixities of the operators mentioned, and this information is only available at the call site. We could try to use reify, but according to the description of "what reify sees" in the above-linked blog post, this would fail if, for example if the fixity is defined later in the file than the splice is done. Another obstacle to the reify approach is #4429

For context, the Hackage package haskell-src-meta provides a haskell parser which produces TH, similar to the parseHaskell quasiquoter discussed above. This package does not, in general, parse infix expressions correctly.

One option for supporting fixities better is to improve reify. In particular, reify could be made to return an operator's fixity even if typechecking hasn't been performed.

A second option would be to move responsibility of fixity parsing away from the quasiquoter and back to GHC, by adding the following constructor (or similar) to template haskell's Exp datatype:

...
 | UnresolvedInfixE [(Exp, Exp)] Exp
...

with the intention that UnresolvedInfixE [("a","+"), ("b", "*")] "c" (here I take strings to be expressions) denote the (unparenthesised) expression a + b * c with the expectation that GHC will apply the correct fixities when splicing.

Change History (37)

comment:1 Changed 7 years ago by simonpj

Very good point. GHC's own strategy is very like your second option. The parser parses all infix applications as left-associative (I think) with no priorities. Then the renamer re-associates everythign based on priorities.

All TH-generated code goes through the renamer so the existing operator-precedence re-association code will do the job nicely. So this looks to me like a more attractive route than expecting the quasi-quoter to do it, which is in general very hard. (Hard because even knowing which operator is meant by (***) pretty much means implementing a rename, dealing with imports, shadowing, and whatever.)

The renamer is not re-associating TH expressions at the moment, because when converting from TH syntax to HsSyn the conversion function adds HsPar parens around every operator application, and the renamer respects HsPar parens of course.

In short, I quite like your UnresolveInfixE solution. It's different to GHC's design, which has

data LHsExpr id 
  = ...
  | OpApp       (LHsExpr id)    -- left operand
                (LHsExpr id)    -- operator
                Fixity          -- Renamer adds fixity; bottom until then
                (LHsExpr id)    -- right operand

  | HsPar (LHsExpr id)
  ...

plus the expectation that all OpApp are unresolved until after renaming, except where guided by HsPar. But that's probably less appropraite for TH.

I'd welcome more ideas.

Simon

comment:2 Changed 7 years ago by reinerp

What is the status of this? I would be willing (and, I think, capable) to write a patch in the next few days to add the UnresolvedInfixE constructor, as I wrote it above, to Template Haskell. Would that be useful to do, or should I just wait for you to implement all of the "New directions in Template Haskell"?

Reiner

comment:3 Changed 7 years ago by simonpj

Reiner, I think it's quite orthogonal to all the "new directions" stuff, so do not wait for that. I think it'd be great if you could do it.

The first thing is to write down exactly what the proposal is. What can users expect? For example if they write

  x <- [| a + b * c |]

can will the TH.Exp bound to x be correctly associated? (Answer: yes.) What about this?

 x <- return (UnresolvedInfixE ... UnresolvedInfixE ...)

Answer, obviously no.

Is any arbitrary tree of UnresolvedInfixE applications ok? Eg

  UnresolvedInfixE 
    (UnresolvedInfixE e1 op1 e2)
    op2
    (UnresolveInfixE e3 op3 e4)

I think this should be OK, and should be correctly re-associated. But GHC's renamer currently expects operators to show up left-associated (because that's what the parser does). You'd need to look at the renamer to check it would do the Right Thing for arbitrary trees of UnresolvedInfixE.

What happens when there's an ordinary InfixE mixed in? (Answer: don't reassociate.)

Are sections allowed for UnresolvedInfixE? In InfixE the arguments are (Maybe Exp) to allow sections, but I think we don't what that here.

Anyway, you get the idea. You could use the Trac wiki to write out the design, so that it's of longer-term utility than an ephemeral email. Then send the proposal to the libraries list.

Meanwhile, you can get hacking; it won't be too hard. Thank you!

Simon

comment:4 Changed 7 years ago by reinerp

I have written up a design spec at UnresolvedInfixExpressions. I'm unsure about some parts of the design; in particular, see the last two sections.

comment:5 Changed 7 years ago by reinerp

Owner: set to reinerp

comment:6 Changed 7 years ago by simonpj

It all seems rather complicated to me. What's wrong with

data Exp = ...
  | UnresolvedInfixE 
         Exp       -- Left operand
         Exp       -- Operator
         Exp       -- Right operand

with the understanding that any tree of adjacent UnresolvedInfixE constructors is re-associated into correctly-associated uses of InfixE? The data type is simpler. The specification is simpler. The implementation is simpler.

Simon

comment:7 Changed 7 years ago by reinerp

That alone would not be sufficient. We would also need to add a Paren data-constructor, in order to distinguish the following two expressions:

[parseHaskell| (a * b + c) - d / e |]
[parseHaskell| a * b + c - d / e |]

(In the first, we want to reassociate within the brackets, and outside the brackets, but not across the brackets.)

For completeness, we would also like to support unresolved sections, such as

[parseHaskell| (a * b *) |]
[parseHaskell| (* a * b) |]

If * is left infix, then only the first of these two is valid; if it is right infix, then only the second of the two is valid.

So we might end up with

data Exp = ...
  | UnresolvedInfixE
       (Maybe Exp)    -- Left
       Exp            -- Operator
       (Maybe Exp)    -- Right
  | Paren Exp

with the understanding that any tree of adjacent UnresolvedInfixE constructors is reassociated (but of course we don't reassociate across parentheses). I could do that.

(My initial objection to this is simply that we're adding the Paren constructor, whose only purpose is to signal "don't reassociate UnresolvedInfixE trees across me" -- and then this leads me to try and shove the entire tree-to-be-reassociated into one constructor -- say, something like

data Exp = ...
  | UnresolvedInfixE
         [Exp]       -- operators, in order
         [Maybe Exp] -- operands, in order

I don't have a strong preference either way.)

comment:8 Changed 7 years ago by simonpj

Good point. I've thought about this a bit. I suggest we use the

data Exp = ...
  | UnresolvedInfixE
       Exp    -- Left
       Exp    -- Operator
       Exp    -- Right
  | Parens Exp
  ...

Reasoning

  • It's simple, direct, and understandable.
  • It is what GHC itself already does
  • It is easy on the chap constructing the code. A parser is only going to see one infix operator at a time, so building that list of operands (the alternative approach) would be awkward.
  • UnresolvedInfixE doesn't need to support sections, because InfixE can do the job. If you write
     InfixE
       Nothing
       op1
       (UnresolvedInfixE e1 op2 e2)
    

then there should be an error if the fixity of op1 and op2 don't relate correctly. To avoid this check, use Parens, or InfixE for the inner constructor. (Of course the docs should state this.)

I think that leads to a simple, understandable design.

comment:9 Changed 7 years ago by reinerp

That seems fair enough. Unless anything else comes up, I'll go ahead and do it this way.

One remaining point, which is probably easiest just to ignore for now, is: once the UnresolvedInfixE and Parens constructors are supported, the InfixE constructor is not strictly necessary (ignoring sections): any application

  InfixE left op right

can be rewritten as

  Parens (UnresolvedInfixE
            (Parens left) 
            op 
            (Parens right)
         )

It's just a bit odd.

comment:10 Changed 7 years ago by simonpj

Yes, that is true, but we can only resolve it by removing InfixE which would be more disruptive. In retrospect the new story might have been easier.

comment:11 Changed 7 years ago by igloo

Milestone: 7.2.1

comment:12 Changed 6 years ago by reinerp

Status: newpatch

I've attached some patches which implement UnresolvedInfixE and UnresolvedInfixP (exactly the same discussion applies to infix patterns as did to infix expressions, except we don't have to worry about sections). The approach is following the most recent plan discussed above.

The attached patch changes the behaviour of InfixP, as the previous behaviour seems almost certainly a bug. In particular, GHC would reassociate left-biased InfixP splices, (unlike InfixE splices, which it would leave alone). This actually broke round-tripping of pattern splices: for example, the pattern match in module B below currently succeeds:

{-# LANGUAGE TemplateHaskell #-}
module A where
import Language.Haskell.TH
import Language.Haskell.TH.Quote

data Tree = N
  | Tree :+ Tree

infixr :+

p1 = QuasiQuoter undefined (const [p| (N :+ N) :+ N |]) undefined undefined


{-# LANGUAGE QuasiQuotes #-}
module B where
import A

f = case N :+ (N :+ N) of
      [p1|unused|] -> True

The attached patch changes this so that InfixP trees are never reassociated, which is consistent with the behaviour of InfixE.

For completeness, I also considered adding unresolved infix constructors for types, but decided against it: there isn't currently an InfixT constructor after all, and TypeOperators is a language extension.

My patches add an unexpected failure (TH_pragma), the failure being the addition of some unnecessary parentheses in -ddump-splices.

Cheers, Reiner

Changed 6 years ago by reinerp

Attachment: compiler.patch added

Changed 6 years ago by reinerp

Attachment: template-haskell.patch added

Changed 6 years ago by reinerp

Attachment: testsuite.patch added

comment:13 Changed 6 years ago by reinerp

Owner: changed from reinerp to igloo

comment:14 Changed 6 years ago by simonpj

Looks good to me.

  • I think the name "UnresolvedInfixE" is rather long and clunky. I'd be inclined to say "UInfixE" and explain the terminology in the comment. What do you think?
  • In that comment you say "This special handling for sections is the exception to the previous point". I don't understand this comment. The use of InfixE in sections is not reassociated, just like any other use of InfixE. Seems fine to me!
  • In Convert.cvtOpAppP could you add a comment explaining why the chain must be re-associated. It's because GHC's renamer expects to see operator applications in a particular form; better say what that is. Are you sere that cvtOpAppP puts an arbitrary tree into a list form? I have not looked very carefully but I'm not sure it does.

Thanks

Simon

comment:15 Changed 6 years ago by reinerp

Thanks for reviewing.

  • Sure, I'll change the name.
  • I'll move the comment to TH/Syntax.hs. I'll adopt the style you linked to, but I'll leave the $infix #infix# in, because that inserts the note into Haddock and allows for hyperlinks as appropriate.
  • I agree it's not reassociated, but there is some form of special handling. Here's the problem. In Haskell, we have:
infixl +
(1 + 2 +)    -- valid, and parses as ((1 + 2) +)
(+ 1 + 2)    -- invalid
(+ (1 + 2))  -- valid

We need some way in Template Haskell to distinguish between the second and third cases. With my patch, we get the following:

InfixE Nothing plus (UInfixE 1 plus 2)           ----->  (+ 1 + 2)
InfixE Nothing plus (Parens $ UInfixE 1 plus 2)  ----->  (+ (1 + 2))

So the InfixE is not reassociated, but whether it causes an error depends on the UInfixE inside it (and in Convert.cvtl we see fully-applied InfixE constructors put parentheses around their arguments, whereas InfixE sections don't). So it's somewhat against the spirit of "GHC won't play any tricks when you have an InfixE", and should be documented somehow, even if it's not an exception.

  • Yes, I was wondering about how much I should document. I could add a brief comment explaining how cvtOpAppP works, along the following lines.
We can see that @cvtOpAppP@ is correct as follows. The inductive hypothesis
is that @cvtOpAppP x op y@ is left-biased, provided @x@ is. It is clear that
this holds for both branches (of @cvtOpAppP@), provided we assume it holds for
the recursive calls to @cvtOpAppP@.

I originally had a simpler (more obviously correct) implementation, but in the worst case (starting with a completely right-biased tree) it had quadratic complexity in the size of the input tree, whereas the current implementation is linear.

Cheers, Reiner

comment:16 Changed 6 years ago by simonpj

Re bullet 3, I see now. (Please put that in the big comment, or whatever we agree.)

Perhaps we should just say that

InfixE Nothing plus (UInfixE 1 plus 2)           ----->  (+ (1 + 2))

There is only one (legal) meaning for what the data structre represents, so let's just pick that. That's what I thought the idea was, and it's why I was confused. Why generate a possibly bogus term if there's an obvious legal meaning?

After all we have that

InfixE 0 * (UInfixE 1 + 2)           ----->  (0 * (1 + 2))

(that is, we parenthesise the arguments of InfixE regardless of their form) so it seems entirely consistent for sections too.

Re last bullet:

  • I think it would be worth documenting the left-biased form that GHC's renamer expects. There is a bit in RnTypes.lhs line 280 or so, but it lacks a Note [blah] heading -- you could add that and expand the note.
  • I was thinking of a comment to say "this is why cvtOpAppP has to reassociate the tree. Documenting the implementation (to convince oneself that it does indeed do so) is a separate matter. Your comment on that looks fine; except we need to know that the return value of cvtl satisfies the invariant, since that supplies the initial arg to cvtOpAppP.

comment:17 Changed 6 years ago by reinerp

Ok, I agree that

InfixE Nothing plus (UInfixE 1 plus 2)   -----> (+ (1 + 2))

is the right thing to do. But we need a representation for (+ 1 + 2). I think we should use

data Exp =
...
  UInfixE (Maybe Exp) Exp (Maybe 
Exp)
...

Using InfixE just doesn't work.

comment:18 Changed 6 years ago by simonpj

Sorry, don't understand. Why do we need a rep for an illegal syntax tree?

comment:19 Changed 6 years ago by reinerp

Because our hypothetical quasiquoter doesn't know that it's illegal (you need to know the operator fixities for that). And even though there is at most one legal way to resolve the expression (1 + 2 +), namely to ((1 + 2) +), it would not be acceptable for our quasiquoter to do this conversion:

  • firstly, this turns some illegal haskell expressions into legal ones, which in itself is not a huge problem
  • more importantly, some (illegal, but plausible) expressions will be given surprising meanings. For instance, consider (* 2 + 1). It is illegal in haskell, but one might think it should mean (\x -> x * 2 + 1). However, the above conversion would turn it into (* (2 + 1)), which is completely different.

comment:20 Changed 6 years ago by simonpj

Actually, now I look at it, if you check HsExpr.rnSection you'll see that (in HsSyn),

SectionR + (OpApp a + b)

is rejected (by checkSectionPrec). So, the representation for (+ 1 + 2) can simply generate the above, as it will, and then GHC will reject it, as it should.

In short, no need for UInfixE.

comment:21 Changed 6 years ago by reinerp

I'm confused. My patch implements the following:

InfixE Nothing + (UInfixE a + b)            ---> SectionR + (OpApp a + b)
InfixE Nothing + (ParensE $ UInfixE a + b)  ---> SectionR + (HsPar $ OpApp a + b)

This means that we can use the first to represent (+ a + b) and the second to represent (+ (a + b)). All good, we can represent the things we need to. The problem is that InfixE behaves quite differently for sections than it does for non-sections (the special handling I discussed in commment 15).

It seems there are two ways to proceed:

  1. Leave the situation with sections as implemented in my patch, so UInfixE doesn't need to worry about sections, at the cost of some special cases in InfixE
  2. Remove the special cases of InfixE by making the UInfixE constructor handle sections:
    data Exp =
    ...
       UInfixE (Maybe Exp) Exp (Maybe Exp)
    ...
    

Which are you suggesting?

comment:22 Changed 6 years ago by reinerp

By the way, I now think the second way is the right way to proceed. It doesn't have special cases or surprising behaviour.

comment:23 Changed 6 years ago by simonpj

I'm confused too! I see no special cases for InfixE in your patch in Convert (which converted TH syn to Hs syn. Can you point me to where it is (file/line number)?

My only concern was your comment, where you mention a special case that I cannot see.

The actual code looks fine to me! So I currently espouse option 1 on the grounds that it's simpler, and does everything we want.

Sorry to be dense

Simon

comment:24 Changed 6 years ago by reinerp

Sorry, I guess I didn't explain my thoughts clearly. I'll give it another shot.

What is the special case

The special case is not explicit, but it's there. In Convert.cvtl, compare the cases for InfixE (Just x) s (Just y) and InfixE Nothing s (Just y) (this is around line 488). We produce the following output:

InfixE (Just x) s (Just y)   ---> HsPar (OpApp (HsPar x) s (HsPar y))
InfixE Nothing  s (Just y)   ---> HsPar (SectionR        s y)

Note that the former puts parentheses around the y, whereas the latter doesn't. This means that when the renamer runs, the OpApps coming from y will meet the SectionR in the second case, but they won't meet the OpApp in the first case. That is:

InfixE (Just x) s (Just (UInfix y op z))
  ---> HsPar (OpApp
                (HsPar x)
                s
                (HsPar (OpApp y op z)))
  (the OpApps don't meet, so they aren't reassociated)

InfixE Nothing s (Just (UInfix y op z))
  ---> HsPar (SectionR
               s
               (OpApp y op z))
  (the OpApp meets the SectionR, so the renamer will throw an error if @op@ is left-infix)

Of course, the same applies the the SectionL case.

Option 1, a summary

For comparison with the next section, here is what Option 1 does:

data Exp =
...
  | InfixE (Maybe Exp) Exp (Maybe Exp)
  | UInfixE Exp Exp Exp
...

and Convert.cvtl does this:

InfixE (Just x) s (Just y)   ---> HsPar (OpApp    (HsPar x) s (HsPar y))
InfixE Nothing  s (Just y)   ---> HsPar (SectionR           s y        )
InfixE (Just x) s Nothing    ---> HsPar (SectionL x s                  )
UInfixE  x      s y          --->       (OpApp    x         s y        )

What Option 2 would look like

In Option 2, we would have

data Exp =
...
 | InfixE (Maybe Exp) Exp (Maybe Exp)
 | UInfixE (Maybe Exp) Exp (Maybe Exp)
...

and the relevant parts of Convert.cvtl would be

InfixE (Just x) s (Just y)   ---> HsPar (OpApp    (HsPar x) s (HsPar y))
InfixE Nothing  s (Just y)   ---> HsPar (SectionR           s (HsPar y))
InfixE (Just x) s Nothing    ---> HsPar (SectionL (HsPar x) s          )
UInfixE (Just x) s (Just y)  --->       (OpApp    x         s y        )
UInfixE Nothing  s (Just y)  --->       (SectionR           s y        )
UInfixE (Just x) s Nothing   --->       (SectionL x         s          )

Why option 1 is surprising

Perhaps the comment I wrote was too strongly worded in saying that there was a special case. But Option 1 is still surprising.

Consider:

-- with option 1
InfixE (Just x) s (Just y)          -- doesn't care if x and y are UInfixE's, i.e. no fixity errors, no reassociating
InfixE Nothing  s (Just y)          -- if y is a UInfixE with a left-infix operator, the renamer will throw a fixity error
InfixE Nothing  s (Just (Parens y)) -- doesn't care if y is a UInfixE
-- with option 2
InfixE  (Just x) s (Just y)         -- doesn't care if x and y are UInfixE's
InfixE  Nothing  s (Just y)         -- doesn't care if y is a UInfixE
UInfixE Nothing  s (Just y)         -- if y is a UInfixE with a left-infix operator, the renamer will throw a fixity error

I find option 1 surprising, because I don't expect InfixE to "look inside" its argument. And for non-section uses of InfixE, it doesn't, but for sections it does! (I can't find a better description for what happens in the above examples than "doesn't care" versus "looks inside", which is unfortunate, and perhaps a source of the confusion. I hope it's clear what I mean.)

Sorry for dragging things out so long.

Reiner

comment:25 Changed 6 years ago by simonpj

OK now I understand.

  • TH could have had three constructors (for left section, right section, and infix) like GHC, but it didn't. But I think of the three cases as three separate constructors.
  • Once you think like that, there is no more "looking inside".

Bottom line: I'd prefer to stick with Option 1. Could you do the other things we agree and send final patches? Thanks!

comment:26 Changed 6 years ago by reinerp

Ok, I've attached new patches. I noticed a bug in the old patches, which I've fixed and documented under Note [Dropping constructors].

Reiner

comment:27 Changed 6 years ago by simonpj

Great. Can you say what the point of testsuite-0002-Accept-redundant-parens-in-TH_pragma.patch is? It seems to modify an existing test. Does the test not work without the mod? Or what?

comment:28 Changed 6 years ago by reinerp

It's as you say: the test fails without this patch.

The only changes I make are the addition of some (redundant) parentheses in the output of -ddump-splices. The cause for these extra parentheses is clear: in the InfixE case of Convert.cvtl, I added some extra HsPar constructors. In most cases, these extra parentheses are redundant, but I found it easiest simply to always add them.

comment:29 Changed 6 years ago by simonpj@…

commit 921b1b3286d95fccb03ec6c31e8abd02fde1bab9

Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Wed Jul 27 06:21:43 2011 +0100

    A bit of refactoring on handling HsPar and friends
    
    This relates to Trac #4430 (infix expressions in TH),.
    Mainly comments but a bit of code wibbling.

 compiler/hsSyn/Convert.lhs |   85 +++++++++++++++++++++++++++-----------------
 compiler/hsSyn/HsExpr.lhs  |   77 ++++++++++++++++++++++++++-------------
 compiler/hsSyn/HsPat.lhs   |   39 ++++++++------------
 compiler/hsSyn/HsTypes.lhs |    6 +---
 compiler/hsSyn/HsUtils.lhs |   60 +++++++++++++++++++------------
 5 files changed, 157 insertions(+), 110 deletions(-)

comment:30 Changed 6 years ago by simonpj

Resolution: fixed
Status: patchclosed

Plus

commit 54d7c6beb2d2c6ec6c7b46f5f60935c162045d93
Author: Reiner Pope <reiner.pope@gmail.com>
Date:   Sat Jul 23 16:15:41 2011 +1000

    Add support for unresolved infix expressions and patterns

 compiler/hsSyn/Convert.lhs |  106 +++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 99 insertions(+), 7 deletions(-)

and in the template-haskell library

commit 2539bc6b3a5b0e4c6ce12746bfefe709026cac2d
Author: Reiner Pope <reiner.pope@gmail.com>
Date:   Sat Jul 23 16:13:17 2011 +1000

    Add support for unresolved infix expressions and patterns.

 Language/Haskell/TH.hs        |    6 ++-
 Language/Haskell/TH/Lib.hs    |   15 ++++++++
 Language/Haskell/TH/Ppr.hs    |   17 ++++++++--
 Language/Haskell/TH/Syntax.hs |   75 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 108 insertions(+), 5 deletions(-)

All done, I think. Thanks reinerp!

Simon

Note: See TracTickets for help on using tickets.