wiki:DistributedClosures

Distributed closures

This page desribes a possible design for distributed-closure, a library that implements serialisable closures, building on top of Typeable and StaticPointers. See the root page DistributedHaskell.

The distributed-process package implements a framework for distributed programming à la Erlang. Support for static closures is implemented in a separate package called distributed-static package. We propose to patch this library in the following way, and rename it to distributed-closure. Ultimately, distributed-closure should be the one-stop shop for all distributed frameworks that wish to allow users to program with static closures.

Proposed design

distributed-closure will define the following datatype:

data Closure a

Closure is the type of static closures. Morally, it contains some pointer to a static expression, paired with an environment of only serializable values.

Why do we need Closure? Closure is strictly more expressive than StaticPtr. StaticPtr can only be constructed from closed expressions (no free variables). Closure is built on top of StaticPtr. It allows encoding serializable expressions. That is, expressions formed of only top-level identifiers, literals, and serializable free variables. for example, using Closure, one can write:

f :: Int -> Int -> ...
f x y = ... closure (static (+)) `closureAp` closurePure x `closureAp` closurePure y ...

We introduce the following library functions on Closure. This is the full API of distributed-closure:

data Closure a

closure :: StaticPtr a -> Closure a
unclosure :: Closure a -> a

closurePure :: Serializable a => a -> Closure a
closureAp :: Closure (a -> b) -> Closure a -> Closure b

The signature of closure mentions Serializable, which is a class defined as follows:

data Dict c = c => Dict
class (Binary a, Typeable a) => Serializable a where
  serializableDict :: Closure (Dict (Serializable a))

NOTE: this definition of Serializable is different from the current one as found in distributed-process, and also different from the one DistributedHaskell#Serialization.

In words, a serializable value is a value for which we have a Binary instance and a Typeable instance, but moreover for which we can obtain a static proof of serializability, in form of a dictionary.

One could make do with an empty class definition for Serializable, as is currently the case, but without this augmented definition, lifting arbitrary serializable values to Closures becomes much less convenient (see closurePure below).

One might argue that Serializable is a tad more complex than it could be. The central issue is that the -XStaticPointers extension alone does not offer any means to reify a constraint to static evidence in the form of a dictionary - a feature that is in general very useful indeed (see below). In principle this could be done, since dictionaries are morally either static, or simple combinations of static dictionaries, but it would require a further extension to the compiler, possibly to the type system itself. The above definition of Serializable comes at the economy of such a compiler extension.

Below are some example instances:

instance Serializable Int where
  serializableDict = closure $ static Dict

instance Serializable (Maybe Int) where
  serializableDict = closure $ static Dict

instance Serializable b => Serializable (Either String b) where
  serializableDict =
    closure (static \Dict -> Dict) `closureApply` serializableDict

data Foo = Foo deriving (Generic, Typeable)

instance Binary Foo
instance Serializable Foo where
  serializableDict = static Dict

-- Datatype of state transitions, where the state s need not be serializable.
data Command s = Transition (Closure (s -> s))
               | Stop

instance Binary Command
instance (Typeable s) => Serializable (Command s) where
  -- NOTE: Requires the TTypeRep proposal.
  serializableDict =
    closure (static (`withTypeable` Dict)) `closureApply` closurePure tTypeRep

Notice the pay-as-you-go nature of the instances. Only in instances with polymorphic heads do you need to compose dictionaries by hand. Most users do not have type parameterized data types that they want to serialize. Those users never need to write more than simple one liner instances.

Implementation

Implementation in GHC

TODO See old proposal and blog post by Simon PJ.

Implementation of distributed-closure

The definition of Closure a is as follows:

data Closure a where
  StaticPtr :: StaticPtr b -> Closure b
  Encoded :: ByteString -> Closure ByteString
  Ap :: Closure (b -> c) -> Closure b -> Closure c

This definition permits an efficient implementation: there is no need to reserialize the environment everytime one composes two Closuress. The definition in the Cloud Haskell paper is as follows:

data Closure' a where
  Closure' :: StaticPtr (ByteString -> b) -> ByteString -> Closure b

Note that the Closure' constructor can be simulated:

Closure cf env <=> Ap (StaticPtr cf) (Encoded env)

One can even add the following constructor for better efficiency:

data Closure a where
  ...
  Closure :: Closure a -> a -> Closure a

Any StaticPtr can be lifted to a Closure, and so can any serializable value:

closure :: StaticPtr a -> Closure a
closure x = StaticPtr 

closurePure :: Serializable a => a -> Closure a
closurePure x =
    StaticPtr (static decodeD) `closureAp`
    serializableDict `closureAp`
    Encoded (encode x)
  where
    decodeD :: Dict (Serializable a) -> ByteString -> a
    decodeD Dict = decode

Side note: What if we didn't have serializableDict? What if we reverted to the current definition of Serializable? Then it would be impossible to write closurePure with the above signature. We would need to pass in explicitly some static serialization dictionary. This is quite a bother, since it forces us to introduce spurious top-level bindings for dictionaries, e.g.:

closurePure' :: Static (Dict (Serializable a)) -> a -> Closure a
closurePure' sdict x = StaticPtr sdict `closureAp` Encoded (encode x)

sdictInt :: Static (Dict (Serializable a))
sdictInt = static Dict

thirtyClosure = plusClosure `closureAp` closurePure sdictInt 10 `closureAp` closurePure sdictInt 20

End side note.

Given any two Closures with compatible types, they can be combined using closureAp:

closureAp :: Closure (a -> b) -> Closure a -> Closure b
closureAp = Ap

Notice how Closure is nearly the free applicative functor, though not completely free, because we impose Serializable a in closurePure. It is, however, a semigroupoid.

Closure serialization is straightforward, but closure deserialization is tricky. See this blog post section from Simon PJ as to why. The issue is that when deserializing from a bytestring to target type Closure b, one needs to ensure that the target type matches the type of the closure before it was serialized, lest bad things happen. We need to impose that Typeable b when deserializing to Closure b, but that doesn't help us for all closures. Consider in particular the type of Ap:

Ap :: Closure (b -> c) -> Closure b -> Closure c

Notice that the type b is not mentioned in the return type of the constructor. We need to know Typeable (b -> c) and Typeable b in order to recursively deserialize the subclosures, but we can't infer either from the context Typeable c.

There are solutions to this problem.

Alternative 1:

The trick is to realize that in fact the type b does not matter: it could be arbitrary. After all, that's why it appears existentially quantified in the type of the Ap constructor. Type safety guarantees that the Ap constructor always combines two Closures of compatible type. In other words, the type of the first argument of Ap could as well be taken to be Closure (Any -> c), because any lifted type can be coerced to/from Any, so that any value of type b can reasonably also be ascribed type Any. Since we don't care about the type b, might as well take it to be Any. If one trusts closure serialization, and indeed closure serialization must be part of the TCB (see DistributedHaskell#Thetrustedcodebase), then when deserializing a closure, we can reconstruct an Ap node, taking b ~ Any, or equivalently, always deserializing at the following type:

Ap :: Closure (Any -> c) -> Closure Any -> Closure c

Note: Any must have a Typeable instance. This is the case in GHC 7.8, but in GHC >= 7.9, Any is now a type family with no instance, hence cannot be given a Typeable instance (see tickets #9429). Deserialization can go something along the following lines (beware, highly idealized code):

decodeClosure :: forall a. Typeable a => ByteString -> (ByteString, Closure a)
decodeClosure (B.uncons -> '0':bs) = (StaticPtr <$> decodeStaticPtr) bs
decodeClosure (B.uncons -> '1':bs) = (Encoded <$> decodeByteString) bs
decodeClosure (B.uncons -> '2':bs) = (do
    cf <- decodeClosure :: Any -> a
    cx <- decodeClosure :: Any
    return $ ClosureAp cf cx
    ) bs

Now it may often happen that decodeStaticPtr will be called against type StaticPtr Any, or StaticPtr (Any -> c). decodeStatic internally compares TypeReps before producing a result: it compares the TypeRep of the target type against the type found in the SPT. With the above definition of Ap, the TypeReps in general will not match exactly: we will be comparing the Typerep of Any -> c to that of b -> c for some type b. For this to work, we need to carry out TypeRep matching modulo Any. More precisely, we require the following laws (for some Typeable b):

isJust (cast :: Any -> Maybe b) == True
isJust (cast :: b -> Maybe Any) == True

Insofar as Any is an internal compiler type not normally accessible to the user, the above should not compromise type safety.

Alternative 2:

-- Like dynApply, but for things of type Closure a.
dynClosureApply :: Dynamic -> Dynamic -> Dynamic
dynClosureApply x y = error "TODO"

decodeClosure :: Typeable a => ByteString -> (ByteString, Closure a)
decodeClosure = fromDyn <$> dyn
  where
    dyn :: ByteString -> (ByteString, Dynamic)
    dyn ('0':bs) = (do
        DS (sptr :: StaticPtr a) <- decodeStatic bs
        return $ toDyn $ StaticPtr sptr) bs
    dyn ('1':bs) = (toDyn . Encoded <$> decodeByteString) bs
    dyn ('2':bs) = (dynClosureApply <$> dyn <*> dyn) bs

Assuming this proposal for Typeable, it should be possible to implement dynClosureApply in a type safe way, without using unsafeCoerce internally. Otherwise dynClosureApply will have to be part of the TCB.

End of alternatives.

All that remains is to implement unclosure:

unstatic :: StaticPtr a -> a

unclosure :: Closure a -> a
unclosure (StaticPtr sptr) = unstatic sptr
unclosure (Encoded x) = x
unclosure (Ap cf cx) = (unstatic cf) (unstatic cx)
unclosure (Closure cx x) = x

Tradeoffs:

Alternative 1:

  • requires a Typeable instance for Any.
  • requires type casting modulo Any.
  • Potentially faster: dynamic type checks done only once per StaticPtr in decodeClosure.
  • Makes decodeClosure part of the TCB.

Alternative 2:

  • Can be be implemented outside the TCB, but requires the Typeable proposal to do so.
  • Potentially slower: dynamic type checks at every Ap node when doing unDynClosure.

About performance

In the case of monomorphic functions, there need be no TypeRep sent over the wire in the scheme presented above, yet we still preserve type safety. TypeReps bloat the messages sent on the wire, so their absence can only be a good thing for performance. That said, we anticipate that even the dynamic type checks performed by decodeStatic can be detrimental for performance. For this reason, there ought to be a

unsafeDecodeStaticPtr :: ByteString -> (ByteString, StaticPtr a)
unsafeDecodeClosure :: ByteString -> (ByteString, Closure a)
unsafeDecodeClosure = ... unsafeDecodeStaticPtr ...

that perform no dynamic type checks. This of course may come at the cost of some type safety, but only if the user somehow writes something equivalent to decodeClosure (encodeClosure (x :: Closure a)) :: Closure b, or if the remote side crafts devious applications of two closures of incompatible type. In high performance settings, one can often make the simplifying assumption that peers can be trusted, so this should not be a problem in practice.

Conclusion

It appears possible to implement a language extension first proposed in the original Cloud Haskell paper in a way that supports polymorphic types - a feature that was not considered in the paper. Furthermore, the proposal in the original Cloud Haskell paper compromised type safety since it allowed deserializing Closures at arbitrary type, while this proposal adds extra safety yet still making it possible to use a backdoor for performance.

What's the trusted code base (TCB) that you need to trust in order to guarantee type safety? GHC of course, but not only. This language extension adds one new primitive function to GHC. But one needs to also trust decodeStaticPtr, and that's it. Note that if decodeStaticPtr is implemented in terms of Binary, then Ideally one would only have to trust GHC and its standard libraries, and have dynamic-closure be part of the standard library. But in any case dynamic-closure depends on at least binary in order to do its work, which it would be undesirable to pull into base, so is best kept separate from GHC.

Last modified 3 years ago Last modified on Oct 30, 2014 9:50:50 AM