# Changes between Initial Version and Version 3 of Ticket #1544

Timestamp:
Aug 20, 2007 9:30:21 PM
Comment:
Here is the story as far as I can work it out. I don't have a solution yet, but this records my diagnosis. I've asked Koen Claessen (who originated the clever parsing technology) for help. Here is his paper, which describes it: http://www.cs.chalmers.se/~koen/pubs/entry-jfp04-parser.html

For a data type like

```
data Exp = C | Exp :+: Exp
```

we get a parser like this:

```
readPrec = parens
  ( (do Ident "C" <- lexP
        return C)
    +++
    (prec 9 $ do
        a <- step readPrec
        Symbol ":+:" <- lexP
        b <- step readPrec
        return (a :+: b))
  )
```
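For reference, here is a complete, compilable version of that instance — a sketch of what `deriving Read` produces, with the `prec 9`/`step` calls reflecting the default `infixl 9` fixity of `:+:`:

```haskell
module Main where

import Text.Read (Read (..), Lexeme (..), lexP, parens)
import Text.ParserCombinators.ReadPrec (prec, step, (+++))

data Exp = C | Exp :+: Exp deriving Show

instance Read Exp where
  readPrec = parens
    ( (do Ident "C" <- lexP        -- nullary constructor C
          return C)
      +++                          -- parallel choice between the two alternatives
      (prec 9 $ do                 -- infix constructor at the default precedence
          a <- step readPrec       -- step bumps the precedence, forcing parens
          Symbol ":+:" <- lexP     -- around nested occurrences of :+:
          b <- step readPrec
          return (a :+: b))
    )

main :: IO ()
main = do
  print (read "C" :: Exp)
  print (read "(C :+: C) :+: C" :: Exp)
```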

The trouble comes because, when we see a paren, we don't know whether it encloses the whole expression or just part of it:

```	(C :+: C :+: C)
```

or

```	(C :+: C) :+: C
```

So the `a <- readPrec` line expands to a new unfolding of `readPrec`. We get two parsers working in parallel: one expecting the closing paren at the end,

```	(t1)
```

where `t1` is a term, and one expecting a term of the form

```	(t1) :+: t2
```

(There are two more parsers, each expecting a plain C, but I'll ignore those. Also in case you are wondering, the precedence mechanism cuts off the left-recursion problem.)

So far so good. But when we consume a paren, each of these two parsers generates two new parsers in exactly the same way. So after consuming one paren we have FOUR parsers, expecting terms of the form

```	((t1))
((t1) :+: t2)
((t1)) :+: t2
((t1) :+: t2) :+: t3
```

This is clearly unacceptable, because after `n` parens we get `2^n` parsers. At least that is my theory, and it certainly would explain what is going on.
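In the spirit of the attached `ParseInfix.hs` (not reproduced here), a minimal program that exercises the derived parser on nested parentheses; `nested` is my name for the helper. If the `2^n` theory is right, each increment of the nesting depth roughly doubles the parsing work:

```haskell
module Main where

-- The same type, but with the derived Read instance this time,
-- so the behaviour is exactly what `deriving` gives us.
data Exp = C | Exp :+: Exp deriving (Show, Read)

-- Wrap "C" in n layers of parentheses: nested 3 gives "(((C)))".
nested :: Int -> String
nested n = replicate n '(' ++ "C" ++ replicate n ')'

main :: IO ()
main =
  -- Depth 10 should spawn on the order of 2^10 parallel parsers;
  -- try raising the depth to watch the runtime grow.
  print (read (nested 10) :: Exp)
```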

I have now forgotten why I originally adopted Koen's parallel parser idea! I'm sure it's a good one. But this does seem like a serious problem. Remember that we must generate a parser for data type `T` without looking at the parsers for any other data types.

Furthermore, although I said I'd ignore it, we do get two (or more) parsers each independently parsing the same lexeme:

```	(do { Ident "C" <- lexP; ...} +++ do { Ident "C" <- lexP; ...})
```

They get expanded out, and then dynamically recombined by the parser machinery, but it seems like a lot of wasted work is done. But I don’t think that's what's giving the problem here.
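This duplicated work can be observed directly with a small sketch (the parser `p` and its result tags are mine): both branches of `+++` independently consume the same `Ident "C"` lexeme, and both survive as complete parses:

```haskell
module Main where

import Data.List (sort)
import Text.Read (Lexeme (..), lexP)
import Text.ParserCombinators.ReadPrec (ReadPrec, readPrec_to_S, (+++))

-- Two branches that both start by demanding the same lexeme.
-- Tagging the results 1 and 2 lets us see that both branches ran.
p :: ReadPrec Int
p = (do Ident "C" <- lexP; return 1)
    +++
    (do Ident "C" <- lexP; return 2)

main :: IO ()
main =
  -- Run the parser at precedence 0 on "C": both branches succeed,
  -- so we get two complete parses of the one-lexeme input.
  -- (Sorted, since +++ makes no promise about result order.)
  print (sort (map fst (readPrec_to_S p 0 "C")))
```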

I attach a complete program `ParseInfix.hs` that demonstrates the problem, with a hand-written read instance so that you don't have to guess what `deriving` is doing. Just compile and run.

Simon

• Property Milestone changed to `6.8`