Changes between Version 1 and Version 2 of Unicode


Timestamp:
Dec 3, 2005 9:50:57 AM
Author:
malcolm.wallace@…
Comment:

--

  • Unicode

    v1 v2  
    23 23     'page-switch' characters that swap out the current 'page' of the code book for a different one. So although most characters
    24 24     end up fitting in a single 16-bit field, some must be coded as two successive fields.
    25     * UTF-32 uses a full 32-bit word per character glyph. 
    26     * To make things more exciting, the UTF-16 and UTF-32 encodings have two variations, depending on the endianness of 
     25    * UCS-4 uses a full 32-bit word per character glyph. 
     26    * To make things more exciting, the UTF-16 and UCS-4 encodings have two variations, depending on the endianness of 
    27 27     the machine they were originally written on.  So if you read a raw byte-stream and want to convert it to 16-bit chunks,
    28 28     you first need to work out the byte-ordering.  This is often done by reading a few bytes and then looking up a heuristic table,
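
The byte-order check described in the hunk above can be sketched as follows. This is a minimal illustration, not the heuristic table the page refers to: it assumes the stream begins with a byte-order mark (BOM), which many files lack, and the function names `sniff_bom` and `decode_with_bom` are hypothetical. Only the BOM constants from Python's standard `codecs` module are used.

```python
import codecs

# Candidate BOMs, checked longest-first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode using the BOM if there is one, else fall back to `default`.

    The fallback is an assumption of this sketch; a real detector would
    apply further heuristics to BOM-less input.
    """
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)
```

For example, `decode_with_bom(b"\xfe\xff\x00h\x00i")` recognises a big-endian UTF-16 stream and yields `"hi"`. One known ambiguity remains: UTF-16 LE text whose first character after the BOM is U+0000 is indistinguishable from a UTF-32 LE BOM, and this sketch resolves it in favour of UTF-32.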
     
    31 31 * Since unix-like systems traditionally deal with byte-streams, UTF-8 is the most common encoding on those platforms.
    32 32 * The NTFS file system (Windows) stores filenames and file contents in UTF-16.
    33   * Almost no system stores UTF-32 natively, but there is a C library type 'wchar' (wide character) which has 32 bits. 
     33  * Almost no system stores UCS-4 natively, but there is a C library type 'wchar' (wide character) which has 32 bits. 
    34 34 * But of course any system must be able to read/write files that originated on any other platform.
    35 35 * As an example of the complex heuristics needed to guess the encoding of any particular file, see the