Vectorisation for nested data parallelism
TODO This material needs to be revised and we need to come up with a plan for getting some programs to quickly get some programs to compile. Also integrate the lifted case example.
We will try to implement full blown vectorisation using an explicit closure representation on Core code after lambda lifting. The transformation performs closure conversion and vectorisation in one sweep. We represent scalar and array closures as follows:
data a :-> b = forall env. PA env => Clo env (env -> a -> b) ([:env:] -> [:a:] -> [:b:]) data a :=> b = forall env. PA env => AClo [:env:] (env -> a -> b) ([:env:] -> [:a:] -> [:b:]) (#) :: (a :-> b) -> a -> b (#) (Clo env f _) = f env (##) :: (a :=> b) -> [:a:] -> [:b:] (##) (AClo envs _ fs) = fs envs
It is important that both kinds of closures include scalar and lifted code, as we need to move between a :-> b and a :=> b in both directions due to the functions:
replicateP :: Int -> (a :-> b) -> (a :=> b) replicateP n (Clo env f fs) = AClo (replicateP n env) f fs (!:) :: (a :=> b) -> Int -> (a :-> b) i !: AClo envs f fs = Clo (i !: envs) f fs
In other words, we move between the two types of closures simply by replicating and indexing into the environment.
We do not have any explicit type transformations. These are all encoded using associated types of the parallel array type class PA:
class PA e where data [:e:] type Vec e (!:) :: [:e:] -> Int -> e -- and so on instance PA () where data [:():] = PAUnit Int type Vect () = () PAUnit len !: i | i < len = () | otherwise = error "..." instance PA Int where data [:Int:] = PAInt Int [!Int!] type Vect Int = Int PAInt l as !: i = as `indexU` i instance (PA a, PA b) => PA (a, b) where data [:(a, b):] = PAProd [:a:] [:b:] type Vect (a, b) = (Vect a, Vect b) PAProd as bs !: i = (as !: i, bs :! i) instance (PA a, PA b) => PA (Either a b) where data [:Either a b:] = PASum [:Bool:] [:Int:] [:a:] [:b:] type Vect (Either a b) = Either (Vect a) (Vect b) PASum sels idx as bs !: i = if sels!:i then a!:(idx!:i) else b!:(idx!:i) instance PA a => PA [:a:] where data [:[:a:]:] = PAArr [:Int:] [:a:] type Vect [:a:] = [:[:a:]:] PAArr segd as !: i = sliceP from size as where segd' = scanlP (+) 0 segd from = segd'!:i size = segd !:i instance (PA a, PA b) => PA (a -> b) where data [: a -> b :] = PAFun [:Vect a :-> Vect b:] type Vect (a -> b) = Vect a :-> Vect b PAFun as !: i = as !: i instance (PA a, PA b) => PA (a :-> b) where data [: a :-> b :] = PAClo (a :=> b) type Vect (a :-> b) = a:-> b -- shouldn't happen, right? PAClo (AClo envs f fs) !: i = Clo (envs!:i) f fs
Mixing Vectorised and Scalar Code
We have two types of modules: (a) modules compiled as ever, which we call 'scalar modules', and (b) 'vectorised modules'. Scalar modules export the same code as before. Vectorised modules export additional identifiers.
- For every variable f :: t, we have in addition
f^p :: t^v f^p = V[[e]]the code for f is not the original scalar code. Instead, it is defined as
f :: t f = unvect f^p
- For every data type T, we have in addition T^v.
- For every function M.f :: a -> b imported from a scalar module M, we generate and use the following definition instead:
f :: (a -> b)^v f = vect M.f
The functions vect and unvect are defined in the same type classes where t^v is defined as an associated type. !!TODO: Try to define these two functions, to be sure we can actually do it.
Moreover, we like to have a toplevel declarations of the form derive PA (T) that create a suitable PA instance of a previously defined (and possibly imported) data type T.
Various Ideas to Avoid Full Blown Vectorisation
We discussed some approaches that would lead to a certain degree of vectorisation, but avoid dealing with issues, such as arrays of functions.
- We could have rewrite rules as follows (for a vectorised function):
mapP f -> f^ mapP f^ -> inject f^ mapP (inject f^) -> inject (mapP f^)where inject is the flatten/partition combination.
!!TODO: What else was there???