Changes between Version 1 and Version 2 of DataParallel


Ignore:
Timestamp:
Nov 17, 2006 5:51:45 PM (9 years ago)
Author:
simonpj
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • DataParallel

    v1 v2  
    33This page describes the data-parallel Haskell implementation, including notes about where we are and what needs doing.
    44
    5 == Optimisation problems ==
    6 
    7 This is a summary of the current optimisation problems we know about
    8 
    9 
    10 1. General remarks
    11 
    12 This is mostly about !SpecConstr which is the central optimisation for
    13 stream-based fusion. The stream stuff is based on seeds:
    14 {{{
    15 data Step s a = Done | Skip s | Yield a s
    16 data Stream a = forall s. Stream (s -> Step s a) s
    17 }}}
    18 Ultimately, every stream is consumed by some kind of fold, e.g.,
    19 {{{
    20 foldlS :: (a -> b -> a) -> a -> Stream b -> a
    21 foldlS f z (Stream step s) = go z s
    22   where
    23     go z s = case step s of
    24                Done       -> z
    25                Skip    s' -> go z s'
    26                Yield x s' -> go (f z x) s'
    27 }}}
    28 In every iteration we obtain a new seed and pass it to the next
    29 iteration. In general, the seed is a combination of tuples, Eithers and
    30 some primitive values (Ints, mostly). If we want fusion to be efficient,
    31 we have to get rid of the constructors and just keep those parts of the
    32 seed actually required. This is what !SpecConstr does but, alas, not
    33 always.
    34 
    35 
    36 2. No !SpecConstr for Bools
    37 
    38 Consider the following two functions:
    39 {{{
    40 foo :: Bool -> Int -> Int
    41 foo True  0 = 0
    42 foo True  n = foo True (n-1)
    43 foo False n = foo True n
    44 
    45 bar :: Either Int Int -> Int
    46 bar (Left  0) = 0
    47 bar (Left  n) = bar (Left (n-1))
    48 bar (Right n) = bar (Left n)
    49 }}}
    50 !SpecConstr rewrites bar to
    51 {{{
    52 bar   x = I# ($wbar x)
    53 
    54 $wbar (Left (I# 0)) = 0
    55 $wbar (Left (I# n)) = $s$wbar (n -# 1)
    56 $wbar (Right n)     = $s$wbar1 n
    57 
    58 $s$wbar1 (I# 0)     = 0
    59 $s$wbar1 (I# n)     = $s$wbar (n -# 1)
    60 
    61 $s$wbar (I# 0)      = 0
    62 $s$wbar (I# n)      = $s$wbar (n -# 1)
    63 }}}
    64 This is great (although I do wonder why $s$wbar1 hasn't been inlined).
    65 However, for foo we get
    66 {{{
    67 foo b (I# n) = I# ($wfoo b n)
    68 $wfoo True  0 = 0
    69 $wfoo True  n = $wfoo True (n -# 1)
    70 $wfoo False n = $wfoo True n
    71 }}}
    72 Here, we have a conditional in every iteration which is definitely not
    73 what we want! In particular, it prevents Bools (and, I guess,
    74 enumerations in general) from ever being used in seeds.
    75 
    76 
    77 3. No !SpecConstr for mutually recursive functions
    78 
    79 This one comes up a lot in surprising circumstances. Consider
    80 {{{
    81 foo :: Maybe Int -> Int
    82 foo Nothing  = 0
    83 foo (Just 0) = foo Nothing
    84 foo (Just n) = foo (Just (n-1))
    85 }}}
    86 This does not look mutually recursive, but of course it is immediately
    87 transformed into
    88 {{{
    89 lvl = foo Nothing
    90 foo Nothing  = 0
    91 foo (Just 0) = lvl
    92 foo (Just n) = foo (Just (n-1))
    93 }}}
    94 which isn't getting optimised at all.
    95 
    96 
    97 4. No fixed point analysis
    98 
    99 I think we talked about this briefly at some point; in any case, this
    100 turns out to be one of the biggest problems. Consider
    101 {{{
    102 foo :: Maybe Int -> Int -> Int
    103 foo   (Just m) 0 = 0
    104 foo x@(Just m) n = foo x (n-m)
    105 }}}
    106 !SpecConstr doesn't get rid of the Just because it assumes that x must be
    107 boxed which is obviously not the case here. This actually happens with
    108 seeds all the time because we often pass a part of the seed unchanged to
    109 the next iteration. We would still like them to be unboxed as the boxed
    110 version is never needed.
    111 
    112 The following are some random ramblings about how this could perhaps be
    113 fixed quickly. I have tried several things to help the compiler here.
    114 The obvious:
    115 {{{
    116 foo (Just m) n = foo (Just m) (n-m)
    117 }}}
    118 doesn't work because CSE spots this and turns it into the previous
    119 version. The next thing I tried was a hack:
    120 {{{
    121 foo (Just m) n = let {-# INLINE x #-}
    122                      x = Just m
    123                  in foo x (n-m)
    124 }}}
    125 This is rewritten to
    126 {{{
    127 foo (Just m) n = foo (__inline_me (Just m)) (n-m)
    128 }}}
    129 and CSE doesn't look inside the `__inline_me`. Unfortunately, neither does
    130 !SpecConstr. After teaching !SpecConstr (and the rule matcher) to ignore
    131 Core notes (I believe this is what is wanted in general, although I'm
    132 not really sure about SCCs) this actually worked (although m still
    133 wasn't getting unboxed, more about this below).
    134 
    135 Unfortunately, things are not so simple. First, inline notes tend to
    136 disappear a lot so as a quick hack, I made CSE ignore everything under a
    137 "nocse" core note. More problematic is the fact that we usually do not
    138 know the actual type of a (part of a) seed; stream transformers augment
    139 seeds but cannot look inside of them. So just to see if this is
    140 feasible, I tried to write a generic rebox function, as in
    141 {{{
    142 class Rebox a where
    143   rebox :: a -> a
    144 instance Rebox Int where
    145   {-# INLINE rebox #-}
    146   rebox (I# i#) = I# ({-# CORE "nocse" #-} i#)
    147 instance (Rebox a, Rebox b) => Rebox (a,b) where
    148   {-# INLINE rebox #-}
    149   rebox (x,y) = (rebox x, rebox y)
    150 ...
    151 }}}
    152 Then, we could have
    153 {{{
    154 data Stream a = forall s. Rebox s => Stream ...
    155 }}}
    156 and use rebox on all parts of the seed which are passed unchanged to the
    157 next iteration or, even better, on the entire seed in every consumer.
    158 Note that all calls to rebox are inlined because the optimiser knows all
    159 the types involved. We also want to stop the reboxing at certain points
    160 in the tree. For instance, in
    161 {{{
    162 scanlS :: (a -> b -> a) -> a -> Stream b -> Stream a
    163 }}}
    164 the a is part of the seed but we do *not* want to rebox it in each
    165 iteration. The class-based code above allows us to express this.
    166 
    167 Anyway, this did work to some extent, but not really well. Still,
    168 perhaps playing around with a rebox pragma might be a feasible idea if
    169 it was better integrated into the compiler. In particular, the
    170 strictness analyser could infer U instread of S for things under the
    171 pragma and !SpecConstr could look at unfoldings. I'm not at all sure
    172 about this, though. I might try to implement it and play around with it
    173 at some point (but not soon!).
    174 
    175 A related problem is
    176 {{{
    177 foo :: Int -> Int -> Int
    178 foo 0 n = 0
    179 foo m n = foo (m-n) n
    180 }}}
    181 Here, n isn't getting unboxed although it could be if m is not 0 in the
    182 first iteration. Perhaps unrolling the loop once could help here.
    183 
    184 
    185 5. Join points
    186 
    187 Unfortunately, I have lost the example, but when trying out the reboxing
    188 stuff I saw the following happen quite a lot. I can try to find an
    189 example if you want.
    190 
    191 Suppose we have (*after* worker/wrapper, i.e., the join point must be
    192 introduced by the subsequent simplifier run) something like
    193 {{{
    194 foo p = let join (x,y) = ... foo (x,y) ...
    195         in
    196         case ... of
    197           ... -> join (x1,y1)
    198           ... -> join (x2,y2)
    199 }}}
    200 !SpecConstr would turn this into
    201 {{{
    202 foo' x y = let join (x,y) = ... foo x y ...
    203            in
    204            case ... of
    205              ... -> join (x1,y1)
    206              ... -> join (x2,y2)
    207 }}}
    208 This doesn't buy us a lot - we still construct and immediately
    209 deconstruct a pair in each iteration. This is a fairly obscure case,
    210 though.
    211 
    212 
    213 6. Inlining vs. rewriting
    214 
    215 One thing I've noticed when trying out stream fusion for lists is the
    216 following. If we define
    217 {{{
    218 map f = unstream . mapS f . stream
    219 }}}
    220 we *always* want to inline map (especially if we are going to rewrite it
    221 back if fusion doesn't happen). But if stream is lazy (which it is), in
    222 {{{
    223 foo f g = map f . map g
    224 }}}
    225 the two map won't be inlined, I assume because the simplifier sees no
    226 reason to do it. I remember that you made the simplifier keener to
    227 inline in contexts in which rules might apply but this doesn't work here
    228 because map doesn't have a rewrite rule - it's the stream/unstream rule
    229 we want to fire.
    230 
    231 So just saying {-# INLINE map #-} doesn't help us. So what do we do? One
    232 solution is to have an additional rewrite rule:
    233 {{{
    234   map f = unstream . mapS f . stream
    235 }}}
    236 But this results in a lot of code duplication, especially for functions
    237 which aren't quite as small (i.e., stream transformers such as mapS
    238 which we always want to inline). Neither does just having the rule and
    239 defining map as
    240 {{{
    241 map = undefined
    242 }}}
    243 help since GHC just treats map as a diverging function.
    244 
    245 This problem is actually even worse with stream transformers which we
    246 only want to inline quite late in the game as information about their
    247 strictness, arity and so on should still be available in the earlier
    248 phases. This means that we want GHC to see their definitions and then
    249 inline them unconditionally at some point.
    250 
    251 It would be quite handy to have a pragma which means "inline
    252 unconditionally", for instance REWRITE. It would support the same phase
    253 annotations etc. but behave as if instead of
    254 {{{
    255 {-# REWRITE [n] foo #-}
    256 foo x = e
    257 
    258 we wrote
    259 
    260 {-#  NOINLINE [n] foo #-}
    261 foo x = e
    262 
    263 {-# RULES
    264 foo = \x -> e
    265 #-}
    266 }}}
    267 It doesn't have to be implemented via rules, of course. I briefly talked
    268 to Don about this and he said that he'd like to have this as well for
    269 the ByteString library.
    270 
    271 One question here is: do we want the above rule or
    272 {{{
    273 foo = __inline_me(\x -> e)
    274 }}}
    275 
    276 
    277 In general, I have to say that the quality of the loops produced by
    278 fusion depends on the optimiser doing a *really* good job which is quite
    279 fragile. I've been thinking about how to help the compiler even more but
    280 don't have anything worth talking about yet.
    281 
     5 * [wiki:DataParallel/Optimisation Optimisation, and problems therewith]
     6 * [wiki:DataParallel/Related Other nested data parallel work]