Version 3 (modified by dons@…, 8 years ago)

Expand on binary IO

Binary I/O

Haskell 98 treats I/O as character-based and lacks a well-defined mechanism for binary I/O. However, a number of competing external libraries exist, providing various forms of binary I/O, compressed I/O, and serialised, persistent data.

  • Character-based I/O is needed, at least because systems (e.g. Unix and Windows) have different line-termination conventions that should be hidden from programs. The problem becomes more acute when different environments use different character sets and encodings (see Unicode).
  • Binary I/O is needed both to handle binary data and as a base upon which general treatments of character-encoding conversions (see Unicode) may be layered.

One proposal is to add a form of I/O over Word8 (i.e. octets, 8-bit binary values). See the "Binary input and output" section of System.IO for a rough design.
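As a rough illustration of octet-level I/O, the following sketch reads and writes lists of Word8 using the buffer operations that GHC's System.IO already provides (hPutBuf, hGetBuf, withBinaryFile). The helper names writeOctets and readOctets are hypothetical and are not part of any proposal; they only show the kind of interface such an extension might offer.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (allocaArray, peekArray, withArrayLen)
import System.IO (IOMode (ReadMode, WriteMode), hGetBuf, hPutBuf, withBinaryFile)

-- Hypothetical helper: write a list of octets to a file opened in binary
-- mode, so that no newline or encoding translation is applied.
writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path octets =
  withBinaryFile path WriteMode $ \h ->
    withArrayLen octets $ \n ptr ->
      hPutBuf h ptr n -- for Word8, n elements occupy exactly n bytes

-- Hypothetical helper: read at most maxLen octets from the start of a file.
readOctets :: FilePath -> Int -> IO [Word8]
readOctets path maxLen =
  withBinaryFile path ReadMode $ \h ->
    allocaArray maxLen $ \ptr -> do
      n <- hGetBuf h ptr maxLen -- n is the number of bytes actually read
      peekArray n ptr
```

Going through an intermediate list is convenient but slow; a real design would more likely expose the buffer, or an array type, directly.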

Another would be to look at one of the binary I/O libraries based on The Bits Between The Lambdas, descendants of which have proliferated in the last couple of years. The advantage of this style over the simpler System.IO library is support for serialising more complex data types, using type classes to recursively define binary I/O routines for each component type of the data you wish to serialise. I/O instances may be written by hand, or derived mechanically with DrIFT.
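The type-class style can be sketched in plain Haskell 98 as follows. The class Serial, its methods put and get, and the example type Shape are hypothetical names chosen for illustration; they are not the interface of the Binary class or of any library mentioned on this page, and the sketch works on lists of octets rather than Handles to stay self-contained.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word16, Word8)

-- Hypothetical class: each type says how to turn itself into octets and
-- how to parse itself back, returning the unconsumed input.
class Serial a where
  put :: a -> [Word8]
  get :: [Word8] -> Maybe (a, [Word8])

instance Serial Word8 where
  put w = [w]
  get (b : rest) = Just (b, rest)
  get [] = Nothing

-- Word16 is written big-endian as two octets.
instance Serial Word16 where
  put w = [fromIntegral (w `shiftR` 8), fromIntegral w]
  get (hi : lo : rest) =
    Just ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo, rest)
  get _ = Nothing

-- Compound types are handled recursively through their components.
instance (Serial a, Serial b) => Serial (a, b) where
  put (a, b) = put a ++ put b
  get bs = do
    (a, rest) <- get bs
    (b, rest') <- get rest
    return ((a, b), rest')

instance Serial a => Serial (Maybe a) where
  put Nothing = [0]
  put (Just x) = 1 : put x
  get (0 : rest) = Just (Nothing, rest)
  get (1 : rest) = do
    (x, rest') <- get rest
    return (Just x, rest')
  get _ = Nothing

-- A handwritten instance for a user-defined type: one tag octet per
-- constructor, followed by the fields in order.
data Shape = Circle Word16 | Rect Word16 Word16
  deriving (Eq, Show)

instance Serial Shape where
  put (Circle r) = 0 : put r
  put (Rect w h) = 1 : put w ++ put h
  get (0 : rest) = do
    (r, rest') <- get rest
    return (Circle r, rest')
  get (1 : rest) = do
    (w, rest1) <- get rest
    (h, rest2) <- get rest1
    return (Rect w h, rest2)
  get _ = Nothing
```

The Shape instance is exactly the kind of boilerplate that a deriving tool would generate: it follows mechanically from the shape of the data declaration.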

Issues to consider:

  • What language extensions are required?
  • Support for cyclic structures
  • Is it possible to derive I/O instances for types, or must they be written by hand?

Existing libraries for Binary I/O:

  • The simplest is probably System.IO, which provides hGetBuf-style I/O. It is really only suitable for arrays.
  • Packed strings, layered over System.IO, are sometimes used for simple data types that can easily be converted to and from flat arrays using list functions.
  • The de facto standard, and also the fastest for non-trivial data types, is the Binary class, a version of which is described here. It is distributed with nhc and used by GHC to deal with .hi files. DrIFT provides tool support for deriving new instances. Flavours include:
  • SerTH is a Binary-alike, which uses Template Haskell to derive serialiser instances for each data type. It's an alternative to deriving your own Binary instances with DrIFT or writing them by hand. Obviously requires TH. Supports serialising cyclic structures.
  • ByteStream, a new high-performance serialisation library, using gzip compression.

Further information:

The two simplest options are to go with only the System.IO extension, or the Binary class.

Pros:

  • The Binary class (particularly as implemented in NewBinary) is simple, elegant and widely used.
  • Binary I/O is an oft-requested feature, the lack of which is sometimes considered a flaw in Haskell 98, so we should do something about it.

Cons:

  • Ideally(?) Binary should be derivable without an external tool.
  • Binary only supports I/O from Handles and memory buffers. Some people require other kinds of streams.
  • There is an overlap with Storable that isn't exploited or explained in any existing library.
  • Some new developments are underway to combine SerTH's cyclic structure support with the speed of NewBinary.
  • What about a NewIO library? How would it overlap or interact with this?
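To make the overlap with Storable mentioned above concrete: any Storable type already knows its size and how to be written to and read from raw memory, so a generic octet encoding can be obtained from it. The sketch below uses only the standard FFI marshalling functions; toOctets and fromOctets are hypothetical names, and the encoding is the host's in-memory layout, so it is not portable across platforms with different endianness or alignment.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable, peek, sizeOf)

-- Hypothetical helper: the octets of a Storable value, in host memory layout.
toOctets :: Storable a => a -> IO [Word8]
toOctets x = with x $ \ptr -> peekArray (sizeOf x) (castPtr ptr)

-- Hypothetical helper: rebuild a value from its octets. Returns Nothing
-- unless exactly sizeOf octets are supplied, so the buffer is never overrun.
fromOctets :: Storable a => [Word8] -> IO (Maybe a)
fromOctets octets = alloca $ \ptr ->
  if length octets /= pointeeSize ptr
    then return Nothing
    else do
      pokeArray (castPtr ptr) octets
      fmap Just (peek ptr)
  where
    -- sizeOf never inspects its argument, so undefined is safe here.
    pointeeSize :: Storable b => Ptr b -> Int
    pointeeSize p = sizeOf (pointee p)
    pointee :: Ptr b -> b
    pointee _ = undefined
```

A Binary-style class could in principle supply a default of this kind for every Storable type, leaving handwritten or derived instances for types where a portable format matters.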