Changes between Version 4 and Version 5 of Unicode


Timestamp:
Dec 3, 2005 1:43:33 PM (8 years ago)
Author:
ross@…
Comment:

--

  • Unicode

    v4  v5
    30  30
    31  31  The Haskell 98 Report claims that Haskell source code uses the Unicode character set.
    32      Jhc is the only implementation that allows unrestricted use of the Unicode character set in Haskell source. Most treat source code as Latin-1 (except jhc which treats it as utf8). If Unicode were allowed, how would implementations know which encoding was used?
        32  If Unicode were allowed, how would implementations know which encoding was used?
        33   * Jhc is the only implementation that allows unrestricted use of the Unicode character set in Haskell source, treating input as UTF-8.
        34   * Hugs treats input as being in the encoding specified by the current locale, but permits Unicode only in comments and character and string literals.
        35   * Others treat source code as Latin-1.
    33  36
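The encoding question in the list above can be made concrete with a minimal Python sketch (an illustration only, not part of the wiki page). It shows that the same bytes on disk decode to different characters depending on whether an implementation assumes UTF-8, as Jhc does, or Latin-1. The sample character is an arbitrary assumption chosen for the demonstration.

```python
# The two bytes a UTF-8 editor writes for the single character U+00E9.
raw = "\u00e9".encode("utf-8")      # b'\xc3\xa9'

# An implementation that assumes UTF-8 recovers one character.
as_utf8 = raw.decode("utf-8")

# An implementation that assumes Latin-1 sees two characters (U+00C3, U+00A9),
# because Latin-1 maps every byte to the code point with the same value.
as_latin1 = raw.decode("latin-1")

print(len(as_utf8), len(as_latin1))
```

Neither decoding fails, so an implementation cannot detect the mismatch from the bytes alone in this case; it has to be told, or has to assume, which encoding applies.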
    34  37  Some things we could do:

    36  39   * Revert to US-ASCII, Latin-1 or implementation-defined character sets.
    37  40   * Allow Unicode, defining a portable form (the \uNNNN escapes in Haskell 1.4 were an attempt at this).
    38       * Allow Unicode, with a mechanism for specifying encoding.
    39       * Allow Unicode only in some places, e.g. character and string literals.
    40       * Use the locale standard on the system. this is arguably the correct thing all progams that read text files should always do. (we could specify that all compilers must support utf8 too)
        41   * Allow Unicode, with a mechanism for specifying encoding in the source file.
        42   * Allow Unicode with the encoding specified by the current locale (as currently done by Hugs). This is arguably the correct thing for all programs that read text files, but it makes Haskell source using non-ASCII characters non-portable. (We could specify that all compilers must support UTF-8 and/or some other portable form too.)
        43
        44  If Unicode is allowed, should its use be restricted, e.g. to character and string literals?
    41  45
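The "portable form" option in the list above could work roughly as in the following Python sketch. It is an illustration under stated assumptions, not a description of the Haskell 1.4 rules: the escape is assumed to be a backslash, `u`, and exactly four hexadecimal digits, and the helper names `to_portable` and `from_portable` are hypothetical.

```python
import re

# Assumed escape shape: backslash, "u", then exactly four hex digits.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

def to_portable(source):
    # Rewrite every non-ASCII character as a \uNNNN escape.
    out = []
    for ch in source:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                  # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)      # fits in four hex digits
        else:
            raise ValueError("U+%X does not fit in four hex digits" % cp)
    return "".join(out)

def from_portable(text):
    # Replace each \uNNNN escape with the character it names.
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

original = 'gr\u00fc\u00dfe = "\u03bb"'
portable = to_portable(original)
print(portable)
```

The portable text is pure ASCII, so it survives any of the encodings discussed here. The sketch is not round-trip safe for input that already contains a literal backslash followed by `u` and four hex digits; a real design would need a rule for that case.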
    42  46  == The Char type ==