Changes between Version 6 and Version 7 of Unicode


Timestamp:
Dec 3, 2005 4:38:21 PM (8 years ago)
Author:
ross@…
  • Unicode

    v6  v7
    44  44  If Unicode is allowed, should its use be restricted, e.g. to character and string literals?
    45  45
    46
    47      == Unicode in Haskell IO ==
    48
    49       * All character based IO in jhc compiled programs is carried out in the current locale of the system
    50       * in nhc and ghc, character based IO is carried out as if it were latin1.
    51
    52  46  == The Char type ==
    53  47
    54      The Haskell 98 Report claims that the type `Char` represents Unicode. It goes on to provide I/O primitives using the `Char` type, define `FilePath` as `[Char]`, etc. Most implementations treat the octets interchanged with the operating system (file contents, filenames, program arguments and the environment) as characters, i.e. belonging to the Latin-1 subset. Hugs treats them as using a byte encoding of Unicode determined by the current locale, with the disadvantage that some byte strings may not be legal encodings.
        48  The Haskell 98 Report claims that the type `Char` represents Unicode, which seems to be the canonical choice.
        49  The functions of `Char` work with Unicode for GHC and Hugs, with one divergence from the Report:
        50   * `isAlpha` selects Unicode alphabetic characters, not just the union of lower- and upper-case letters.
    55  51
    56      Using Unicode for `Char` seems the principled thing to do. If we retain it:
        52  == I/O and System functions ==
        53
        54  The Report goes on to provide I/O primitives using the `Char` type, define `FilePath` as `String`, and have the functions in `System` use `String`.
        55   * Hugs treats the bytes interchanged with the operating system (I/O, filenames, program arguments and the environment) as using a byte encoding of Unicode determined by the current locale, with the disadvantage that some byte strings may not be legal encodings.
        56   * All character based I/O in jhc-compiled programs uses the encoding of the current locale. Handling of strings will be similar when the CString functions become conformant.
        57   * Other implementations treat the bytes interchanged with the operating system as characters, i.e. belonging to the Latin-1 subset.
        58
        59  Assuming we retain Unicode as the representation of `Char`:
    57  60
    58  61   * Flexible handling of character encodings is needed, but not necessarily as part of this standard.
    59  62   * [wiki:BinaryIO] is needed anyway, and would provide a base for these encodings.
    60       * A simple character-based I/O interface like that in Haskell 98, possibly taking defaults from the locale, will also be convenient for many users.
        63   * A simple character-based I/O and system interface like that in Haskell 98, possibly taking defaults from the locale, will also be convenient for many users.
    61  64   * An abstract type may be needed for data in O/S form, such as filenames, program arguments and the environment.
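
The two treatments of O/S bytes described in the diff (Latin-1 versus a locale-determined encoding) can be contrasted with a minimal sketch. It assumes UTF-8 as the locale encoding and uses only `base`. The names `decodeLatin1` and `decodeUtf8` are hypothetical and chosen for this illustration; they are not taken from the Report or from Hugs, GHC, nhc or jhc, and the sketch does not claim to show how any of those implementations work internally.

```haskell
module Main where

import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Latin-1 reading (hypothetical helper): each byte becomes the code
-- point with the same value, so decoding can never fail.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- | Strict UTF-8 reading (hypothetical helper): returns Nothing when the
-- bytes are not a legal encoding (bad lead byte, truncated or malformed
-- continuation, overlong form, surrogate, or value above U+10FFFF).
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80 = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise = Nothing
  where
    -- n continuation bytes, payload bits of the lead byte, and the
    -- smallest code point allowed for a sequence of this length.
    multi :: Int -> Int -> Int -> Maybe String
    multi n lead minCp =
      let (conts, rest) = splitAt n bs
          isCont c = c .&. 0xC0 == 0x80
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                     lead conts
          isSurrogate = cp >= 0xD800 && cp <= 0xDFFF
      in if length conts == n && all isCont conts
            && cp >= minCp && cp <= 0x10FFFF && not isSurrogate
           then (chr cp :) <$> decodeUtf8 rest
           else Nothing

main :: IO ()
main = do
  let latin1Bytes = [0x63, 0x61, 0x66, 0xE9] :: [Word8]       -- "café" in Latin-1
      utf8Bytes   = [0x63, 0x61, 0x66, 0xC3, 0xA9] :: [Word8] -- "café" in UTF-8
  print (decodeLatin1 latin1Bytes) -- always succeeds
  print (decodeUtf8 latin1Bytes)   -- Nothing: not a legal UTF-8 byte string
  print (decodeUtf8 utf8Bytes)     -- succeeds, four characters
  print (decodeLatin1 utf8Bytes)   -- succeeds, but yields five characters
```

The Latin-1 reading is total but misinterprets multi-byte text, while the strict reading has to report failure for some inputs. That failure case is one reason an abstract type for data in O/S form, as suggested in the last bullet, may be preferable to forcing every filename or argument through `String`.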