wiki:ForeignData

Version 11 (modified by john@…, 8 years ago)

--

Allocating space in the bss and data segments from Haskell

Currently one must use an external C file to allocate data in the bss or data segment, even though that file contains no code at all: the resulting object file holds nothing but linker directives to reserve the space. This is a deficiency in the current FFI specification.
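As an illustration of the workaround described above, here is a sketch of such an external C file. The names `scratch_buf` and `lookup_table` are hypothetical. The file defines no functions, so the compiler emits only storage for the two symbols, which the Haskell side would then reach through `&`-style foreign imports.

```c
/* Hypothetical "data only" C file currently needed just to reserve
   storage that Haskell code will use through Ptr values. */

/* Uninitialized: typically placed in .bss (or emitted as a common
   symbol), so it occupies no space in the binary itself. */
unsigned char scratch_buf[4096];

/* Initialized: typically placed in .data, so its bytes are stored
   in the binary and loaded at startup. */
int lookup_table[4] = {1, 2, 3, 4};
```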

This is very easy to implement, and it is needed for low-level Haskell programming or whenever one wishes to replace parts of a Haskell runtime with Haskell code. It is currently the only thing that can be done in C but cannot be done at all in Haskell.

In addition to low-level programming, any program that uses large amounts of static (or preallocated) data can benefit from

  • decreased binary size
  • faster startup times
  • lower memory consumption
  • constant data being shared among multiple running copies of the same program

Proposal (experimental in jhc)

Allow declarations of the form:

foreign space [const] <n> "name" :: Ptr <type>

where <n> is the number of elements to allocate (default 1) and <type> is a basic type or a renaming of one.

The space allocated will be n * sizeof(type) bytes, where sizeof is the size given by the Storable class. User-defined types (other than a simple newtype or type synonym of a built-in type) may not be used.

If the type is 'forall a . Ptr a', the element size is assumed to be one byte.
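A sketch of how the sizing rules might look once translated to C, assuming CInt maps to C `int`. The symbol names and the helper `space_bytes` are hypothetical and only make the n * sizeof(type) rule explicit.

```c
#include <stddef.h>

/* foreign space 1024 "buf" :: Ptr CInt
   -> 1024 elements of sizeof(CInt) bytes each (CInt assumed to be int).
   A newtype over CInt would translate to exactly the same C type. */
int buf[1024];

/* foreign space 64 "raw" :: forall a . Ptr a
   -> element size assumed to be one byte, so 64 bytes in total. */
unsigned char raw[64];

/* Hypothetical helper: bytes reserved for n elements of elem_size. */
static size_t space_bytes(size_t n, size_t elem_size) {
    return n * elem_size;
}
```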

If 'const' is specified, it is an assertion that the contents of that memory will never be modified. It is a strong assertion: the compiler is free to perform optimizations that rely on it and to place the memory in a segment that is read-only and shared among processes.
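A sketch of what a 'const' declaration might become in C, using the initializer syntax described under initialized data and assuming CInt maps to `int`. The names `primes` and `third_prime` are hypothetical. The comment on the load describes something a compiler is permitted to do, not something every compiler guarantees.

```c
/* foreign space const 4 "primes" :: Ptr CInt = [2, 3, 5, 7]
   The const qualifier lets the toolchain put the array in a read-only
   section (typically .rodata), whose pages the OS can share between
   processes running the same binary. */
const int primes[4] = {2, 3, 5, 7};

/* Since primes can never legally change, an optimizing compiler may
   fold this load into the constant 5. */
int third_prime(void) {
    return primes[2];
}
```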

initialized data

The initial contents of the memory may also be specified:

foreign space [bigendian|littleendian] [const] <n> "name" :: Ptr <type> = constant

where constant may be one of

  • a value: 3
  • an initializer list: [0, 1, 2, ...]
  • a "string", output as UTF-8, UTF-16, or UCS-4 code units depending on the size of the type the pointer points to.
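A sketch of the three initializer forms translated to C, assuming CInt maps to `int`, that 1-, 2-, and 4-byte element types map to `char`, `char16_t`, and `char32_t`, and that the encoding is chosen from the element size. All symbol names are hypothetical. C's `u8`, `u`, and `U` string prefixes happen to produce exactly the UTF-8, UTF-16, and UCS-4 layouts the proposal describes.

```c
#include <uchar.h>

/* foreign space "answer" :: Ptr CInt = 3 */
int answer[1] = {3};

/* foreign space 3 "counts" :: Ptr CInt = [0, 1, 2] */
int counts[3] = {0, 1, 2};

/* Same two-character string "hé" for three element sizes; each array
   gets a trailing NUL because no explicit <n> is given. */
char     greet8[]  = u8"h\u00e9"; /* 1-byte elements: UTF-8, 4 bytes   */
char16_t greet16[] = u"h\u00e9";  /* 2-byte elements: UTF-16, 3 units  */
char32_t greet32[] = U"h\u00e9";  /* 4-byte elements: UCS-4, 3 units   */
```

Note that the UTF-8 array is one byte longer than the character count, because é encodes as two bytes.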

If the data is initialized from a string, <n> always counts characters regardless of encoding, and the string will be NUL-terminated (unless an explicit <n> chops off the trailing NUL).
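A sketch of how an explicit <n> interacts with the terminator, assuming Ptr CChar maps to C `char` and using an ASCII-only string so that characters and bytes coincide. The names `hello6` and `hello5` are hypothetical. C itself allows an array initializer that exactly fits the characters and silently drops the NUL, which matches the rule above.

```c
/* foreign space 6 "hello6" :: Ptr CChar = "hello"
   -> room for all five characters plus the terminating NUL. */
char hello6[6] = "hello";

/* foreign space 5 "hello5" :: Ptr CChar = "hello"
   -> <n> chops off the trailing NUL; no terminator is stored. */
char hello5[5] = "hello";
```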

Big endian or little endian may be specified explicitly, in which case the data is written out with that endianness; otherwise it is output in the native byte order of the target system.
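A minimal sketch of how a code generator might lay out a 32-bit element for a `bigendian` or `littleendian` declaration. `emit_u32` and `magic` are hypothetical, not part of jhc or any other compiler. Emitting the value as explicit bytes makes the result identical on every host, and the alignment specifier keeps the bytes readable as a Word32 from Haskell.

```c
#include <stdint.h>

/* Hypothetical helper: write v into out[0..3] in the requested byte
   order, independent of the host's own endianness. */
static void emit_u32(uint32_t v, int big_endian, unsigned char out[4]) {
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (unsigned char)(v >> shift);
    }
}

/* What a generator might emit for
     foreign space bigendian "magic" :: Ptr Word32 = 0x01020304
   The bytes are listed explicitly, so host byte order never matters. */
_Alignas(uint32_t) unsigned char magic[4] = {0x01, 0x02, 0x03, 0x04};
```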

implementation

Implementation is trivial once the new constructs can be parsed (they are purposely similar to existing Haskell constructs, so the lexer and parser need no changes beyond adding a new rule). These declarations translate directly into equivalent C, C--, or assembly linker directives.
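To illustrate how small the translation step can be, here is a sketch that turns an already-parsed declaration into a C definition. The record `struct foreign_space` and the functions `put` and `emit_c` are hypothetical, and the sketch handles only integer initializer lists, not strings or endianness. Unlike the sample translation below, it always writes the element count explicitly.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parser output for one foreign space declaration. */
struct foreign_space {
    const char *name;  /* the quoted "name" */
    const char *ctype; /* C type of one element, e.g. "int" for CInt */
    int is_const;      /* nonzero if 'const' was given */
    size_t n;          /* number of elements */
    const long *init;  /* initial values, or NULL if uninitialized */
    size_t ninit;      /* number of initial values */
};

/* Append s at buf[*pos], truncating so buf (size > 0) stays NUL-terminated. */
static void put(char *buf, size_t size, size_t *pos, const char *s) {
    size_t room = size - *pos - 1;
    size_t len = strlen(s);
    if (len > room) len = room;
    memcpy(buf + *pos, s, len);
    *pos += len;
    buf[*pos] = '\0';
}

/* Write the equivalent C definition of d into buf. */
static void emit_c(const struct foreign_space *d, char *buf, size_t size) {
    char tmp[32];
    size_t pos = 0;
    if (size == 0) return;
    buf[0] = '\0';
    if (d->is_const) put(buf, size, &pos, "const ");
    put(buf, size, &pos, d->ctype);
    put(buf, size, &pos, " ");
    put(buf, size, &pos, d->name);
    snprintf(tmp, sizeof tmp, "[%zu]", d->n);
    put(buf, size, &pos, tmp);
    if (d->init) {
        put(buf, size, &pos, " = {");
        for (size_t i = 0; i < d->ninit; i++) {
            if (i > 0) put(buf, size, &pos, ",");
            snprintf(tmp, sizeof tmp, "%ld", d->init[i]);
            put(buf, size, &pos, tmp);
        }
        put(buf, size, &pos, "}");
    }
    put(buf, size, &pos, ";");
}
```

For the `myints` declaration in the sample translation, this sketch would produce `const int c_myints[4] = {1,2,3,4};`.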

caveats

It is annoying that <n> must be a constant and <type> must be a built-in type, but there is no real alternative short of defining a preprocessor in Haskell, and these restrictions are no more onerous than those placed on the arguments of foreign function calls. Something like Template Haskell would mitigate this problem when available.

sample translation

When compiling to C, here is an example of what a declaration translates to:

foreign space const "myints" :: Ptr CInt = [1,2,3,4]

becomes

the Haskell import:

foreign import ccall "&c_myints" myints :: Ptr CInt

and the C code:

const int c_myints[] = {1,2,3,4};