Changes between Version 1 and Version 2 of UnicodeInHaskellSource
- Jan 25, 2006 2:10:27 PM
v1  v2
 2   2
 3   3    The Haskell 98 Report ([http://www.haskell.org/onlinereport/lexemes.html Lexical Structure]) claims that Haskell source code uses the [wiki:Unicode] character set.
     4  +
     5  +
     6  +
 4   7    Haskell source code is stored in text files using various character sets and encodings.
 5      - If Unicode were allowed, how would implementations know which encoding was used?
     8  +
 6   9    * Jhc allows unrestricted use of the Unicode character set in Haskell source, treating input as UTF-8. Several uses of Unicode characters in place of Haskell keywords are permitted:
 7  10      * '→' ('\x2192') is equivalent to '->'
 …   …
14  17    In addition there is experimental support for defining new operators and names using various Unicode characters.
15  18    * Hugs treats input as being in the encoding specified by the current locale, but permits Unicode only in comments and in character and string literals.
    19  +
16  20    * Others treat source code as ISO 8859-1 (Latin-1).
17  21
18      - Some things we could do:
    22  + == Problems with Unicode in Haskell 98 ==
    23  +
    24  + There are plenty of Unicode alphabetic characters which are neither upper, lower, nor title case, and hence are not allowed in identifiers. Some languages have no notion of case at all. Since Haskell's syntax relies on case for distinguishing constructors and variables, what should our position be with respect to caseless character sets?
    25  +
    26  + The report should at least be absolutely clear about which Unicode character properties (N, Ll, Lu, Sm, etc.) correspond to which lexical class in the syntax.
    27  +
    28  + == Some things we could do ==
19  29    * Revert to US-ASCII, Latin-1 or implementation-defined character sets.
20  30    * Allow Unicode with the encoding specified outside source files (e.g. by the current locale, as currently done by Hugs). This would make Haskell source containing non-ASCII characters non-portable.
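
The question raised under "Problems with Unicode in Haskell 98", namely which Unicode character properties correspond to which lexical class, can be made concrete with a small sketch. It is written in Python only because its standard `unicodedata` module exposes the general category directly; it is not taken from any Haskell implementation. The function name `lexical_class` is hypothetical, and the mapping onto the report's `uniSmall`, `uniLarge`, `uniDigit` and `uniSymbol` classes is one assumed reading of the Haskell 98 lexical syntax, not a settled rule.

```python
import unicodedata


def lexical_class(ch: str) -> str:
    """Classify one character by its Unicode general category.

    Assumption: Ll maps to uniSmall, Lu and Lt to uniLarge, Nd to uniDigit,
    and every symbol (S*) or punctuation (P*) category to uniSymbol.
    Reserved and special characters are not handled separately here.
    """
    cat = unicodedata.category(ch)
    if cat == "Ll":
        return "uniSmall"
    if cat in ("Lu", "Lt"):
        return "uniLarge"
    if cat == "Nd":
        return "uniDigit"
    if cat[0] in ("S", "P"):
        return "uniSymbol"
    if cat in ("Lo", "Lm"):
        # Letters without case: alphabetic, yet neither small nor large.
        return "caseless letter (no class)"
    return "other"


# 'x', 'X', U+01C5 (titlecase), U+3042 (Hiragana A), U+2192 (arrow), '7'
for ch in ["x", "X", "\u01c5", "\u3042", "\u2192", "7"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {lexical_class(ch)}")
```

Under this reading a letter such as U+3042 falls into category Lo and therefore into no identifier class at all, which is exactly the gap the caseless-character question points at.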