Opened 3 years ago

Last modified 3 years ago

#11012 new feature request

Support for unicode primes on identifiers.

Reported by: ghartshaw
Owned by:
Priority: low
Milestone:
Component: Compiler (Parser)
Version: 7.10.2
Keywords: unicode, report-impact
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

GHC should allow the Unicode prime characters (single, double, triple, and quadruple) in identifiers. This would make them consistent with the ASCII apostrophe, which is usually used in place of a single prime. The current workaround (using one or more apostrophes) is unwieldy for higher primes (e.g. a''' and a'''').

All of the following identifiers should be valid.

a'   // U+0027 APOSTROPHE
a′   // U+2032 PRIME
a″   // U+2033 DOUBLE PRIME
a‴   // U+2034 TRIPLE PRIME
a⁗   // U+2057 QUADRUPLE PRIME

Change History (3)

comment:1 Changed 3 years ago by nomeata

Hi ghartshaw,

I’m very sympathetic towards making good use of Unicode. On the other hand, it is also important to follow the language specification, and that is pretty clear about what Unicode symbols are allowed as part of identifiers, etc.

Currently, these primes are “Other Punctuation” according to the Unicode standard, and maybe someone uses them as such, e.g. as an operator.
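
For anyone who wants to check the category claim locally, here is a minimal sketch using `generalCategory` from `Data.Char` in base. The names `primes` and `primeCategories` are made up for this illustration, and the result depends on the Unicode tables shipped with your GHC.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- The apostrophe and the four Unicode primes named in this ticket.
-- (Hypothetical helper list, not part of GHC.)
primes :: [(Char, String)]
primes =
  [ ('\x0027', "APOSTROPHE")
  , ('\x2032', "PRIME")
  , ('\x2033', "DOUBLE PRIME")
  , ('\x2034', "TRIPLE PRIME")
  , ('\x2057', "QUADRUPLE PRIME")
  ]

-- Pair each character name with the general category reported by base.
primeCategories :: [(String, GeneralCategory)]
primeCategories = [(name, generalCategory c) | (c, name) <- primes]
```

Loading this into GHCi and evaluating `mapM_ print primeCategories` lists the category for each character, which is what decides whether the lexer currently sees it as a symbol character.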

Would you mind stating precisely what modification you’d like to make to the report’s lexical grammar (https://www.haskell.org/report/mono/2010#x7-160002.2)?

This is a bit related to #10196, where we deviated from the report to allow subscripted symbols, so there is precedent.

comment:2 Changed 3 years ago by ghartshaw

I would like the characters U+2032, U+2033, U+2034, and U+2057 to be excluded from uniSymbol in the rule for symbol, added to the rule for graphic, and added to the rules for varid and conid.

Currently, these rules are:

graphic → small | large | symbol | digit | special | " | '
symbol → ascSymbol | uniSymbol<special | _ | " | '>
varid → (small {small | large | digit | '})<reservedid>
conid → large {small | large | digit | '}

With this proposal, they would become:

prime → ' | ′ | ″ | ‴ | ⁗       // U+0027,U+2032,U+2033,U+2034,U+2057
graphic → small | large | symbol | digit | special | " | prime
symbol → ascSymbol | uniSymbol<special | _ | " | prime>
varid → (small {small | large | digit | prime})<reservedid>
conid → large {small | large | digit | prime}

These rules treat the Unicode primes exactly the same as an apostrophe when found in identifiers.
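
To make the intended behaviour concrete, here is a rough sketch of the proposed varid and conid rules as plain predicates. It is not GHC's lexer: `isPrime`, `isVarId`, `isConId` and the other helpers are hypothetical names, and small, large and digit are only approximated with `Data.Char` functions.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isLower, isUpper)

-- Proposed `prime` class: apostrophe plus the four Unicode primes.
isPrime :: Char -> Bool
isPrime c = c `elem` ['\'', '\x2032', '\x2033', '\x2034', '\x2057']

-- Approximation of `small`: lowercase letters and underscore.
isSmall :: Char -> Bool
isSmall c = isLower c || c == '_'

-- Approximation of `large`: uppercase (or titlecase) letters.
isLarge :: Char -> Bool
isLarge = isUpper

-- Approximation of `digit`: any Unicode decimal digit.
isDigitU :: Char -> Bool
isDigitU c = generalCategory c == DecimalNumber

-- Characters allowed after the first character of an identifier.
isIdentTail :: Char -> Bool
isIdentTail c = isSmall c || isLarge c || isDigitU c || isPrime c

-- Reserved identifiers from the Haskell 2010 report.
reservedIds :: [String]
reservedIds =
  [ "case", "class", "data", "default", "deriving", "do", "else"
  , "foreign", "if", "import", "in", "infix", "infixl", "infixr"
  , "instance", "let", "module", "newtype", "of", "then", "type"
  , "where", "_" ]

-- varid -> (small {small | large | digit | prime})<reservedid>
isVarId :: String -> Bool
isVarId []         = False
isVarId s@(c : cs) = isSmall c && all isIdentTail cs && s `notElem` reservedIds

-- conid -> large {small | large | digit | prime}
isConId :: String -> Bool
isConId []       = False
isConId (c : cs) = isLarge c && all isIdentTail cs
```

Under this sketch, `isVarId "a\x2034"` holds just as `isVarId "a'''"` does, while a string that starts with a prime is rejected by both predicates, matching how the apostrophe behaves today.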

comment:3 Changed 3 years ago by thomie

Keywords: unicode report-impact added