Changes between Version 6 and Version 7 of Unicode


Timestamp:
Dec 3, 2005 4:38:21 PM
Author:
ross@…
Comment:

--

  • Unicode

    v6 v7  
    4444If Unicode is allowed, should its use be restricted, e.g. to character and string literals?
    4545
    46 
    47 == Unicode in Haskell IO ==
    48 
    49  * All character-based IO in jhc-compiled programs is carried out in the current locale of the system.
    50  * In nhc and GHC, character-based IO is carried out as if it were Latin-1.
    51 
    5246== The Char type ==
    5347
    54 The Haskell 98 Report claims that the type `Char` represents Unicode. It goes on to provide I/O primitives using the `Char` type, define `FilePath` as `[Char]`, etc. Most implementations treat the octets interchanged with the operating system (file contents, filenames, program arguments and the environment) as characters, i.e. belonging to the Latin-1 subset. Hugs treats them as using a byte encoding of Unicode determined by the current locale, with the disadvantage that some byte strings may not be legal encodings.
     48The Haskell 98 Report claims that the type `Char` represents Unicode, which seems to be the canonical choice.
     49The functions of `Char` work with Unicode for GHC and Hugs, with one divergence from the Report:
     50 * `isAlpha` selects Unicode alphabetic characters, not just the union of lower- and upper-case letters.
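
The `isAlpha` divergence can be observed with a short check. This is a minimal sketch, assuming an implementation whose `Data.Char` predicates follow the Unicode general categories (as described above for GHC and Hugs); the helper name `caselessLetter` and the sample list are illustrative only and are not part of any library.

```haskell
import Data.Char (isAlpha, isLower, isUpper)

-- | True when a character is alphabetic but has no case, i.e. it lies
-- outside the union of lower- and upper-case letters.
-- (Hypothetical helper, named here for illustration only.)
caselessLetter :: Char -> Bool
caselessLetter c = isAlpha c && not (isLower c || isUpper c)

-- Sample characters: Latin 'a' and 'A', plus U+4E2D, a CJK ideograph
-- that Unicode classifies as a letter without case.
samples :: [Char]
samples = ['a', 'A', '\x4E2D']

-- Code point paired with the result of each classification.
report :: [(Int, Bool, Bool)]
report = [(fromEnum c, isAlpha c, caselessLetter c) | c <- samples]
```

Under the Report's definition, where `isAlpha` is the union of `isUpper` and `isLower`, `caselessLetter` could never return `True`; with Unicode-aware predicates it does so for U+4E2D.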
    5551
    56 Using Unicode for `Char` seems the principled thing to do. If we retain it:
     52== I/O and System functions ==
     53
     54The Report goes on to provide I/O primitives using the `Char` type, define `FilePath` as `String`, and have the functions in `System` use `String`.
     55 * Hugs treats the bytes interchanged with the operating system (I/O, filenames, program arguments and the environment) as using a byte encoding of Unicode determined by the current locale, with the disadvantage that some byte strings may not be legal encodings.
     56 * All character-based I/O in jhc-compiled programs uses the encoding of the current locale. Handling of strings will be similar when the CString functions become conformant.
     57 * Other implementations treat the bytes interchanged with the operating system as characters, i.e. belonging to the Latin-1 subset.
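
The difference between the Latin-1 view and a locale-encoding view can be sketched as follows. This is an illustration only, assuming UTF-8 as the locale encoding; the names `decodeLatin1`, `decodeUtf8` and `multi` are hypothetical and do not reflect how Hugs, jhc or any other implementation actually performs the conversion.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Latin-1 view: every byte is a character, so decoding never fails.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- | UTF-8 view: returns Nothing when the bytes are not a legal encoding.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral b .&. 0x1F) 0x80 bs
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral b .&. 0x0F) 0x800 bs
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral b .&. 0x07) 0x10000 bs
  | otherwise          = Nothing  -- stray continuation or invalid lead byte

-- | Consume n continuation bytes after a lead byte, rejecting truncated,
-- overlong, surrogate and out-of-range sequences.
multi :: Int -> Int -> Int -> [Word8] -> Maybe String
multi n lead minCode bs
  | length conts /= n                    = Nothing
  | any (\c -> c .&. 0xC0 /= 0x80) conts = Nothing
  | code < minCode || code > 0x10FFFF    = Nothing
  | code >= 0xD800 && code <= 0xDFFF     = Nothing
  | otherwise                            = (chr code :) <$> decodeUtf8 rest
  where
    (conts, rest) = splitAt n bs
    code = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) lead conts
```

For the bytes `[0xC3, 0xA9]`, `decodeLatin1` yields two characters (U+00C3, U+00A9) while `decodeUtf8` yields the single character U+00E9. The lone byte `[0xE9]` is a valid Latin-1 string but makes `decodeUtf8` return `Nothing`, which is the "not a legal encoding" case mentioned above.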
     58
     59Assuming we retain Unicode as the representation of `Char`:
    5760
    5861 * Flexible handling of character encodings is needed, but not necessarily as part of this standard.
    5962 * [wiki:BinaryIO] is needed anyway, and would provide a base for these encodings.
    60  * A simple character-based I/O interface like that in Haskell 98, possibly taking defaults from the locale, will also be convenient for many users.
     63 * A simple character-based I/O and system interface like that in Haskell 98, possibly taking defaults from the locale, will also be convenient for many users.
    6164 * An abstract type may be needed for data in O/S form, such as filenames, program arguments and the environment.