wiki:FixityResolution

Version 11 (modified by simonmar@…, 5 years ago) (diff)

--

Proposal: FixityResolution


Ticket #30
Dependencies
Related NegativeSyntax

Compiler support

GHC full (no flag)
nhc98 full (no flag)
Hugs full (no flag)
UHC full (no flag)
JHC full (no flag)
LHC full (no flag)

Summary

Change the way fixity resolution is specified in the Report. Remove fixity resolution from the context-free grammar, and specify it separately. This does change the syntax slightly by eliminating some impossible-to-parse-correctly cases, but in practice all existing compilers already implement the proposed syntax, not the current syntax.

Description

This is a proposal that doesn't affect anything except the presentation of the report, because all Haskell implementations currently do it this way anyway. There are some legal Haskell programs, according to the report, that aren't accepted by current implementations (see below). I propose we make the language specification match the implementations.

The Problem

The Haskell 98 context-free syntax includes fixity resolution as part of the grammar, with a fixed number of fixities ([1..9]), essentially using macro expansion to define the grammar. Apart from being really ugly, and not corresponding to any known implementation (long ago GHC used a yacc parser formulated like this, but not any more), this gives rise to some constructions that are very hard to parse correctly. For example, the expression:

  \x -> x == x == True

should be legal, and parse as (\x -> x == x) == True. The reason is that == is nonfix (assuming the == in scope is from the Prelude), and this combined with the rule that says "lambda expressions extend as far to the right as possible", forces the lambda expression to end before the second ==.

No known Haskell compiler actually parses this correctly. Admittedly it's a type error, but that's not the point - the grammar should be parsable.

Furthermore:

  do a == b == c

This is legal syntax, and parses as do { a == b } == c, again because == is nonfix, and the second == closes the layout context by virtue of the layout rule's parse error condition.

As far as I know, no Haskell compiler gets this right.

The Solution

These constructions rely on the parser being aware of operator fixities during parsing, which is an unreasonable restriction - it's often easier to follow imports after parsing a file, rather than during it.

Hence, I propose that we remove operator fixity resolution from the context-free grammar, and specify it separately. The context-free grammar would parse applications of infix operators as a flat list to be resolved separately. The examples above would parse but fail during fixity resolution.

This would also let us expand the current paultry set of 9 fixity levels to an arbitrary limit, if we so wished (but that introduces backwards compatibility issues).

Sample code for resolving fixities is here: resolve.hs. I believe this implements Haskell 98 fixity resolution, without prefix negation (prefix negation could be added, but we might not need to; see NegativeSyntax). The core of the parser is 12 lines of code, with a few lines of datatype declarations and pretty-printing. Note that there is no upper limit on precedence levels, but there is a lower limit of zero. This code could serve as the basis for specifying fixity resolution in the Haskell' report.

References

Report Delta

Section 3 (Expressions)

remove

In the syntax that follows, there are some families of nonterminals indexed by precedence levels (written as a superscript). Similarly, the nonterminals op, varop, and conop may have a double index: a letter l, r, or n for left-, right- or non-associativity and a precedence level. A precedence-level variable i ranges from 0 to 9; an associativity variable a varies over {l, r, n}. For example

  aexp 	-> 	( expi+1 qop(a,i) )

actually stands for 30 productions, with 10 substitutions for i and 3 for a.

replace

  exp    -> exp0 :: [context =>] type 	 (expression type signature)
	  | exp0

  expi   -> expi+1 [qop(n,i) expi+1]
          | lexpi
          | rexpi
  lexpi  -> (lexpi | expi+1) qop(l,i) expi+1
  lexp6  -> - exp7
  rexpi  -> expi+1 qop(r,i) (rexpi | expi+1) 

with

  exp      -> infixexp :: [context =>] type 	 (expression type signature)
	    | infixexp

  infixexp -> exp10 qop infixexp
            | - infixexp
            | exp10

replace

  | ( expi+1 qop(a,i) ) 	 (left section)
  | ( lexpi qop(l,i) ) 	         (left section)
  | ( qop(a,i)<-> expi+1 ) 	(right section)
  | ( qop(r,i)<-> rexpi ) 	(right section) 

with

  | ( infixexp qop ) 	 (left section)
  | ( qop<-> infixexp )	 (right section)

replace

Expressions involving infix operators are disambiguated by the operator's fixity (see Section 4.4.2). Consecutive unparenthesized operators with the same precedence must both be either left or right associative to avoid a syntax error. Given an unparenthesized expression "x qop(a,i) y qop(b,j) z", parentheses must be added around either "x qop(a,i) y" or "y qop(b,j) z" when i=j unless a=b=l or a=b=r.

with

Expressions involving infix operators are disambiguated by the operator's fixity (see Section 4.4.2). Consecutive unparenthesized operators with the same precedence must both be either left or right associative to avoid a syntax error. Given an unparenthesized expression "x qop(a,i) y qop(b,j) z" (where "qop(a,i)" means an operator with associativity "a" and precedence "i"), parentheses must be added around either "x qop(a,i) y" or "y qop(b,j) z" when i=j unless a=b=l or a=b=r.

An example algorithm for resolving expressions involving infix operators is given in Section 9.6.

remove

A note about parsing. Expressions that involve the interaction of fixities with the let/lambda meta-rule may be hard to parse. For example, the expression

    let x = True in x == x == True

cannot possibly mean

    let x = True in (x == x == True)

because (==) is a non-associative operator; so the expression must parse thus:

(let x = True in (x == x)) == True

However, implementations may well use a post-parsing pass to deal with fixities, so they may well incorrectly deliver the former parse. Programmers are advised to avoid constructs whose parsing involves an interaction of (lack of) associativity with the let/lambda meta-rule.

replace

For the sake of clarity, the rest of this section shows the syntax of expressions without their precedences.

with

For the sake of clarity, the rest of this section will assume that expressions involving infix operators have been resolved according to the fixities of the operators.

Section 3.5 (Sections)

replace the grammar, as above. The text is still correct.

Section 9 (Syntax reference)

Make the corresponding changes as for Section 3.

Section 9.6, Resolving expressions with infix operators

New sub-section

The following is an example implementation of fixity resolution for Haskell expressions. The function @resolve@ takes a list consisting of alternating expressions and operators (an instance of the @infixexp@ non-terminal in the context-free grammar), and returns either @Just e@ where @e@ is the resolved expression, or @Nothing@ if the input does not represent a valid expression.

data Op     = Op String Prec Fixity                deriving Eq
data Fixity = Leftfix | Rightfix | Nonfix          deriving Eq
type Prec   = Int

data Exp    = Var Var | OpApp Exp Op Exp | Neg Exp deriving Eq
type Var    = String

data Tok    = TExp Exp | TOp Op | TNeg
   
resolve :: [Tok] -> Maybe Exp
resolve tokens = fmap fst $ parseNeg (Op "" (-1) Nonfix) tokens
  where
    parseNeg :: Op -> [Tok] -> Maybe (Exp,[Tok])
    parseNeg op1 (TExp e1 : rest) = parse op1 e1 rest
    parseNeg op1 (TNeg : rest)
       = do guard (prec1 < 6)
            (r, rest') <- parseNeg (Op "-" 6 Leftfix) rest
            parse op1 (Neg r) rest'
       where
          Op _ prec1 fix1 = op1

    parse :: Op -> Exp -> [Tok] -> Maybe (Exp, [Tok])
    parse _   e1 [] = Just (e1, [])
    parse op1 e1 (TOp op2 : rest) 
       -- case (1): check for illegal expressions
       | prec1 == prec2 && (fix1 /= fix2 || fix1 == Nonfix)
       = Nothing

       -- case (2): op1 and op2 should associate to the left
       | prec1 > prec2 || (prec1 == prec2 && fix1 == Leftfix)
       = Just (e1, TOp op2 : rest)

       -- case (3): op1 and op2 should associate to the right
       | otherwise
       = do (r,rest') <- parseNeg op2 rest
            parse op1 (OpApp e1 op2 r) rest'
       where
         Op _ prec1 fix1 = op1
         Op _ prec2 fix2 = op2

The algorithm works as follows. At each stage we have a call

      parse op1 E1 (op2 : tokens)

which means that we are looking at an expression like

      E0 `op1` E1 `op2` ...  (*1)

(the caller holds E0). The job of @parse@ is to build the expression to the right of @op1@, returning the expression and any remaining input.

There are three cases to consider:

(1) if @op1@ and @op2@ have the same precedence, but they do not have the same associativity, or they are declared to be nonfix, then the expression is illegal.

(2) If @op1@ has a higher precedence than @op2@, or @op1@ and @op2@ should left-associate, then we know that the expression to the right of @op1@ is @E1@, so we return this to the caller.

(3) Otherwise, we know we want to build an expression of the form @E1 op2 R@. To find R, we call @parseNeg op2 tokens@ to compute the expression to the right of @op2@, namely @R@ (more about @parseNeg@ below, but essentially if @tokens@ is of the form @(E2 : rest)@, then this is equivalent to @parse op2 E2 rest@). Now, we have

E0 op1 (E1 op2 R) op3 ...

where @op3@ is the next operator in the input. This is an instance of (*1) above, so to continue we call parse, with the new E1 == (E1 op2 R)

To initialise the algorithm, we set @op1@ to be an imaginary operator with precedence lower than anyything else. Hence @parse@ will consume the whole input, and return the resulting expression.

The handling of the prefix negation operator, @-@, complicates matters only slightly. Recall that prefix negation has the same fixity as infix negation: left-associative with precedence 6. The operator to the left of @-@, if there is one, must have precedence lower than 6 for the expression to be legal. The negation operator itself may left-associate with operators of the same fixity (e.g. @+@). So for example @-a + b@ is legal and resolves as @(-a) + b), but @a + -b@ is illegal.

The function @parseNeg@ handles prefix negation. If we encounter a negation operator, and it is legal in this position (the operator to the left has precedence lower than 6), then we proceed in a similar way to case (3) above: compute the argument to '-' by recursively calling @parseNeg@, and then continue by calling @parse@.

Note that this algorithm is insensitive to the range of precedences. There is no reason in principle that Haskell should be limited to integral precedences in the range 1 to 10; a larger range, or fractional values, would present no additional implementation difficulties.

Attachments (2)

Download all attachments as: .zip