Chaos in Lexeme.hs
I've been looking at the `Lexeme` module (in `basicTypes`), where -- as far as I can tell -- utter chaos reigns. (Full disclosure: I wrote this module some time ago, inheriting its code from various places. But I clearly did a poor job of it.) Here is a sampling of the chaos:
- `isLexConSym` claims to recognize type and data constructor infix symbols. But it requires symbols to start with a `:` (or be `->`). This is out-of-date with respect to the change in type constructor infix symbols in 7.6(?), which now do not need to start with a `:`.
- `isVarSymChar` and `okSymChar` both purport to recognize characters that are valid parts of symbolic identifiers. But they have entirely different, unrelated implementations. These should be the same function, I believe.
- The `notFollowedBySymbol` function defined in `parser/Lexer.x` overlaps with the functions above. But it has a third implementation, different than either of these other two.
- The `isLexXXX` functions all just look at first characters, except for `isLexVarSym`, which looks at all characters. There is a reason for this -- that GHC-generated names start with a `$` but should be printed prefix -- but I'm not sure I buy it. Is it sufficient to look at the first two characters instead of the first one?
I'm happy to make the code changes around this, but I need some advice from someone who has more knowledge about both Haskell's lexical structure and quite possibly Unicode.
Happily, the functions in `Lexeme` are not used much. But it would be awfully nice if they did the right thing when they are used.
Trac metadata
Trac field | Value |
---|---|
Version | 7.10.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |