|Version 5 (modified by ivdsangen, 3 years ago) (diff)|
[Very incomplete. Please extend as you learn more.]
The parser is written using
- Alex, for lexical analysis. Source file compiler/parser/Lexer.x
- Happy, for the parser itself. Source file compiler/parser/Parser.y.pp. Note the .pp suffix; it is run through cpp to yield Parser.y.
- RdrHsSyn, for Haskell support functions. Source file compiler/parser/RdrHsSyn.lhs
Making a parser parse precisely the right language is hard. So GHC's parser follows the following principle:
- We often parse "over-generously", and filter out the bad cases later.
Here are some examples:
- Patterns are parsed as expressions, and transformed from HsExpr.HsExp into HsPat.HsPat in RdrHsSyn.checkPattern. An expression like [x | x<-xs] that doesn't look like a pattern is rejected by checkPattern.
- The context of a type is parsed as a type, and then converted into a context by RdrHsSyn.checkContext. For example, when parsing
f :: (Read a, Num a) => a -> athe parser can only discover that (Read a, Num a) is a context, rather than a type, when it meets the =>. That requires infinite lookahead. So instead we parse (Read a, Num a) as a tuple type, and then convert it to a context when we see the =>.
Sometimes the over-generous parsing is only dealt with by the renamer. For example:
- Infix operators are parsed as if they were all left-associative. The renamer uses the fixity declarations to re-associate the syntax tree.
There are plenty more examples. A good feature of this approach is that the error messages later in compilation tend to produce much more helpful error messages. Errors generated by the parser itself tend to say "Parse error on line X" and not much more.
The main point is this. If you are changing the parser, feel free to make it accept more programs than it does at the moment, provided you also add a later test that rejects the bad programs. Typically you need this flexibility if some new thing you want to add makes the pars ambiguous, and you need more context to disambiguate. Delicate hacking of the LR grammar is to be discouraged. It's very hard to maintain and debug.