Changes between Version 3 and Version 4 of BinaryIO


Timestamp:
Dec 20, 2005 12:27:00 AM (9 years ago)
Author:
dons@…
Comment:

Polish the BinaryIO story

  • BinaryIO

    v3 v4  
    11= Binary I/O = 
     2[[PageOutline]] 
    23 
    3 Haskell 98 treats I/O as character-based, and lacks a well-defined mechanism for binary I/O. However, a number of competing external libraries exist providing various forms of binary I/O, providing forms of compressed I/O, and serialised, persistent data. 
     4Haskell 98 treats I/O as character-based, and lacks a well-defined mechanism for binary I/O. However, a number of external libraries exist providing various forms of binary I/O. 
    45 
     6Two forms of binary I/O are considered here: 
     7 * Word8-based extensions to System.IO, and 
     8 * Typeclass-based Binary I/O (referred to as Binary) for serialising arbitrary data types, layered over Word8 extensions 
     9 
     10== Explanation == 
    511 * Character-based I/O is needed, at least because systems (e.g. Unix and Windows) have different line-termination conventions that should be hidden from programs. The problem becomes more acute when different environments use different character sets and encodings (see [wiki:Unicode]). 
    612 * Binary I/O is needed both to handle binary data and as a base upon which general treatments of character-encoding conversions (see [wiki:Unicode]) may be layered. 
     13 * Type-classed binary I/O is needed to support serialisable structures and persistence for arbitrary Haskell data. 
    714 
    8 One proposal is to add a form of I/O over `Word8` (i.e. octets, 8-bit binary values). See the "Binary input and output" section of [http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html System.IO] for a rough design. 
     15== Proposal 1 - System.IO == 
     16 * One proposal is to add a form of I/O over `Word8` (i.e. octets, 8-bit binary values). See the "Binary input and output" section of [http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html System.IO] for a rough design. 
    917 
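
A minimal sketch of what Word8-level I/O in the style of Proposal 1 could look like, using only calls that exist in GHC's base library today (`withBinaryFile`, `hPutBuf`, `hGetBuf`, and the `Foreign.Marshal.Array` helpers). The names `writeOctets` and `readOctets` are hypothetical and are not part of any proposal text.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (allocaArray, peekArray, withArrayLen)
import System.IO (IOMode (..), hGetBuf, hPutBuf, withBinaryFile)

-- Hypothetical helper: write a list of octets through a raw buffer.
-- A binary handle performs no newline or character-set translation.
writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path ws =
  withBinaryFile path WriteMode $ \h ->
    -- A Word8 occupies one byte, so the element count is the byte count.
    withArrayLen ws $ \n buf -> hPutBuf h buf n

-- Hypothetical helper: read at most n octets from the start of a file.
readOctets :: FilePath -> Int -> IO [Word8]
readOctets path n =
  withBinaryFile path ReadMode $ \h ->
    allocaArray n $ \buf -> do
      got <- hGetBuf h buf n   -- number of bytes actually read
      peekArray got buf
```

Octets such as 10 and 13 pass through unchanged because the handle is opened in binary mode; anything more structured than a flat run of octets has to be layered on top of helpers like these.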
    10 Another would be to look at one of the binary I/O libraries based on [ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html The Bits Between The Lambdas], descendants of which have proliferated in the last couple of years. The advantage of this style over the simpler System.IO library is support for serialising more complex data types, using type classes to recursively define binary I/O routines for each type component of the data you wish to serialise. Instances of I/O may be written by hand, or derived mechanically with [http://repetae.net/john/computer/haskell/DrIFT/ DrIFT]. 
     18== Proposal 2 - The Binary class == 
     19 * Proposal two is to add a Binary class, based on the type class described in [ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html The Bits Between The Lambdas]. The advantage of this form of binary I/O over the simpler System.IO library is support for serialising more complex data types, using type classes to recursively define binary I/O routines for each component of the type. Instances of the class may be written by hand, or derived mechanically with [http://repetae.net/john/computer/haskell/DrIFT/ DrIFT]. Ideally Binary would be derivable by the compiler (is this feasible?). 
    1120 
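
The idea of defining serialisers recursively, one component at a time, can be sketched as follows. This is a simplified, hypothetical class over lists of `Word8`, not the API of NHC's Binary, GHC's Binary or NewBinary, which work over handles and memory buffers. The `Shape` type and the tag-byte encoding are assumptions made for illustration.

```haskell
import Data.Word (Word8)

-- Hypothetical, list-based stand-in for a Binary class.
class Binary a where
  put :: a -> [Word8]                   -- serialise a value to octets
  get :: [Word8] -> Maybe (a, [Word8])  -- parse a value, return the rest

instance Binary Word8 where
  put w = [w]
  get (w:rest) = Just (w, rest)
  get []       = Nothing

-- Lists are encoded as a 1 tag before each element and a 0 terminator.
instance Binary a => Binary [a] where
  put xs = concatMap (\x -> 1 : put x) xs ++ [0]
  get (0:rest) = Just ([], rest)
  get (1:rest) = do
    (x, rest')   <- get rest
    (xs, rest'') <- get rest'
    return (x : xs, rest'')
  get _ = Nothing

-- An illustrative user type; the instance is written by hand and
-- reuses the instances of its components.
data Shape = Circle Word8 | Rect Word8 Word8
  deriving (Eq, Show)

instance Binary Shape where
  put (Circle r) = 0 : put r
  put (Rect w h) = 1 : put w ++ put h
  get (0:rest) = do
    (r, rest') <- get rest
    return (Circle r, rest')
  get (1:rest) = do
    (w, rest')  <- get rest
    (h, rest'') <- get rest'
    return (Rect w h, rest'')
  get _ = Nothing
```

The `Shape` instance follows the shape of the data declaration mechanically, which is the kind of boilerplate a tool such as DrIFT, or compiler-supported deriving, would generate.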
    12 Issues to consider: 
    13  * What language extensions are required? 
    14  * Support for cyclic structures 
    15  * Is it possible to derive I/O instances for types, or must they be written by hand? 
     21== References == 
    1622 
    17 Existing libraries for Binary I/O: 
    18  * The simplest is probably [http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html System.IO], which provides hGetBuf-style I/O. Really only suitable for arrays. 
    19  * [http://www.cse.unsw.edu.au/~dons/fps.html Packed strings], layered over System.IO is sometimes used, for simple data types, which can be easily converted to and from flat arrays, using list functions. 
    20  * The de-facto standard, and also the fastest, for non-trivial data types, the Binary class, a version of which is [ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html described here]. Distributed with nhc, and used by GHC to deal with .hi files. Tool support from DrIFT to derive new instances. Flavours include: 
     23=== Proposal 1 === 
     24 * The simplest implementation option is [http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html System.IO], which provides hGetBuf-style I/O. More sophisticated systems can be layered on top, as external libraries. 
     25 * [http://www.cse.unsw.edu.au/~dons/fps.html Packed strings], layered over System.IO, are a related interface, and sometimes used for binary I/O of flat data types. 
     26 
     27=== Proposal 2 === 
     28 * The Binary class is the de-facto standard for more structured data. The origins are [ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html described here]. Distributed with nhc, and used by GHC to deal with .hi files. Tool support from DrIFT to derive new instances. Flavours include: 
    2129    * [http://haskell.org/nhc98/libs/Binary.html NHC's binary], the original 
    2230    * [http://cvs.haskell.org/cgi-bin/cvsweb.cgi/~checkout~/fptools/ghc/compiler/utils/Binary.hs GHC's Binary], used internally by GHC. 
    23     * [http://www.n-heptane.com/nhlab/repos/NewBinary/ NewBinary], the standard 
    24     * [http://www.cse.unsw.edu.au/~dons/code/hmp3/Binary.hs Lambdabot/Hmp3's Binary], a faster, Handle-only version of Binary. 
     31    * [http://www.n-heptane.com/nhlab/repos/NewBinary/ NewBinary], the standard version today 
     32    * [http://www.cse.unsw.edu.au/~dons/code/hmp3/Binary.hs Lambdabot/Hmp3's Binary], a stripped-down Handle-only version of Binary. 
    2533 * [http://www.cs.helsinki.fi/u/ekarttun/SerTH/ SerTH] is a Binary-alike, which uses Template Haskell to derive serialiser instances for each data type. It's an alternative to deriving Binary instances with DrIFT or writing them by hand. Requires TH. Supports serialising cyclic structures. 
    2634 * [http://freearc.narod.ru/ ByteStream], a new high-performance serialisation library, using gzip compression. 
    2735 
    28 Further information: 
    29  * [http://www.haskell.org/pipermail/haskell/2005-December/017029.html A recent mailing list thread]. 
    30  * [http://haskell.org/hawiki/BinaryIo A page on the Haskell wiki] 
     36== Pros/Cons : System.IO == 
    3137 
    32 The two simplest options are to go with only the System.IO extension, or the Binary class. 
     38=== Pros === 
     39 * System.IO extensions are already in common use and simple to implement 
     40 * More sophisticated binary I/O may be layered on top 
    3341 
    34 Pros: 
    35  * The Binary class (particularly as implemented in NewBinary) is simple, elegant and widely used. 
    36  * Binary IO is an oft requested feature, lack of which is sometimes considered a flaw in Haskell98, so we should do something about it. 
     42=== Cons === 
     43 * The API may not be rich enough for many binary I/O requirements; should we strive for more? 
    3744 
    38 Cons: 
    39  * Ideally(?) Binary should be derivable without an external tool 
    40  * Binary only supports I/O from Handles and memory buffers. Some people require other kinds of streams 
    41  * There is an overlap with Storable that isn't exploited or explained in any existing library. 
    42  * Some new developments are underway to combine SerTH's cyclic structure support with the speed of NewBinary 
    43  * What about a NewIO library, how will this overlap/interact? 
     45== Pros/Cons : Binary == 
     46 
     47=== Pros === 
     48 * The Binary class (particularly as implemented in NewBinary) is simple to implement and widely used. 
     49 * Binary I/O is an oft-requested feature, lack of which is sometimes considered a flaw in Haskell 98. 
     50 * Difficult to serialise data without this class 
     51 
     52=== Cons === 
     53 * There is an overlap with the Storable class that isn't exploited 
     54 * Doesn't support cyclic structures 
     55 * Lack of derivability can be annoying 
     56