Changes between Version 1 and Version 2 of DataParallel


Ignore:
Timestamp:
Nov 17, 2006 5:51:45 PM (8 years ago)
Author:
simonpj
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • DataParallel

    v1 v2  
    33This page describes the data-parallel Haskell implementation, including notes about where we are and what needs doing. 
    44 
    5 == Optimisation problems ==  
    6  
    7 This is a summary of the current optimisation problems we know about 
    8  
    9  
    10 1. General remarks 
    11  
    12 This is mostly about !SpecConstr which is the central optimisation for 
    13 stream-based fusion. The stream stuff is based on seeds: 
    14 {{{ 
    15 data Step s a = Done | Skip s | Yield a s 
    16 data Stream a = forall s. Stream (s -> Step s a) s 
    17 }}} 
    18 Ultimately, every stream is consumed by some kind of fold, e.g., 
    19 {{{ 
    20 foldlS :: (a -> b -> a) -> a -> Stream b -> a 
    21 foldlS f z (Stream step s) = go z s 
    22   where 
    23     go z s = case step s of 
    24                Done       -> z 
    25                Skip    s' -> go z s' 
    26                Yield x s' -> go (f z x) s' 
    27 }}} 
    28 In every iteration we obtain a new seed and pass it to the next 
    29 iteration. In general, the seed is a combination of tuples, Eithers and 
    30 some primitive values (Ints, mostly). If we want fusion to be efficient, 
    31 we have to get rid of the constructors and just keep those parts of the 
    32 seed actually required. This is what !SpecConstr does but, alas, not 
    33 always. 
    34  
    35  
    36 2. No !SpecConstr for Bools 
    37  
    38 Consider the following two functions: 
    39 {{{ 
    40 foo :: Bool -> Int -> Int 
    41 foo True  0 = 0 
    42 foo True  n = foo True (n-1) 
    43 foo False n = foo True n 
    44  
    45 bar :: Either Int Int -> Int 
    46 bar (Left  0) = 0 
    47 bar (Left  n) = bar (Left (n-1)) 
    48 bar (Right n) = bar (Left n) 
    49 }}} 
    50 !SpecConstr rewrites bar to 
    51 {{{ 
    52 bar   x = I# ($wbar x) 
    53  
    54 $wbar (Left (I# 0)) = 0 
    55 $wbar (Left (I# n)) = $s$wbar (n -# 1) 
    56 $wbar (Right n)     = $s$wbar1 n 
    57  
    58 $s$wbar1 (I# 0)     = 0 
    59 $s$wbar1 (I# n)     = $s$wbar (n -# 1) 
    60  
    61 $s$wbar (I# 0)      = 0 
    62 $s$wbar (I# n)      = $s$wbar (n -# 1) 
    63 }}} 
    64 This is great (although I do wonder why $s$wbar1 hasn't been inlined). 
    65 However, for foo we get 
    66 {{{ 
    67 foo b (I# n) = I# ($wfoo b n) 
    68 $wfoo True  0 = 0 
    69 $wfoo True  n = $wfoo True (n -# 1) 
    70 $wfoo False n = $wfoo True n 
    71 }}} 
    72 Here, we have a conditional in every iteration which is definitely not 
    73 what we want! In particular, it prevents Bools (and, I guess, 
    74 enumerations in general) from ever being used in seeds. 
    75  
    76  
    77 3. No !SpecConstr for mutually recursive functions 
    78  
    79 This one comes up a lot in surprising circumstances. Consider 
    80 {{{ 
    81 foo :: Maybe Int -> Int 
    82 foo Nothing  = 0 
    83 foo (Just 0) = foo Nothing 
    84 foo (Just n) = foo (Just (n-1)) 
    85 }}} 
    86 This does not look mutually recursive, but of course it is immediately 
    87 transformed into 
    88 {{{ 
    89 lvl = foo Nothing 
    90 foo Nothing  = 0 
    91 foo (Just 0) = lvl 
    92 foo (Just n) = foo (Just (n-1)) 
    93 }}} 
    94 which isn't getting optimised at all. 
    95  
    96  
    97 4. No fixed point analysis 
    98  
    99 I think we talked about this briefly at some point; in any case, this 
    100 turns out to be one of the biggest problems. Consider 
    101 {{{ 
    102 foo :: Maybe Int -> Int -> Int 
    103 foo   (Just m) 0 = 0 
    104 foo x@(Just m) n = foo x (n-m) 
    105 }}} 
    106 !SpecConstr doesn't get rid of the Just because it assumes that x must be 
    107 boxed which is obviously not the case here. This actually happens with 
    108 seeds all the time because we often pass a part of the seed unchanged to 
    109 the next iteration. We would still like them to be unboxed as the boxed 
    110 version is never needed. 
    111  
    112 The following are some random ramblings about how this could perhaps be 
    113 fixed quickly. I have tried several things to help the compiler here. 
    114 The obvious: 
    115 {{{ 
    116 foo (Just m) n = foo (Just m) (n-m) 
    117 }}} 
    118 doesn't work because CSE spots this and turns it into the previous 
    119 version. The next thing I tried was a hack: 
    120 {{{ 
    121 foo (Just m) n = let {-# INLINE x #-} 
    122                      x = Just m 
    123                  in foo x (n-m) 
    124 }}} 
    125 This is rewritten to 
    126 {{{ 
    127 foo (Just m) n = foo (__inline_me (Just m)) (n-m) 
    128 }}} 
    129 and CSE doesn't look inside the `__inline_me`. Unfortunately, neither does 
    130 !SpecConstr. After teaching !SpecConstr (and the rule matcher) to ignore 
    131 Core notes (I believe this is what is wanted in general, although I'm 
    132 not really sure about SCCs) this actually worked (although m still 
    133 wasn't getting unboxed, more about this below). 
    134  
    135 Unfortunately, things are not so simple. First, inline notes tend to 
    136 disappear a lot so as a quick hack, I made CSE ignore everything under a 
    137 "nocse" core note. More problematic is the fact that we usually do not 
    138 know the actual type of a (part of a) seed; stream transformers augment 
    139 seeds but cannot look inside of them. So just to see if this is 
    140 feasible, I tried to write a generic rebox function, as in 
    141 {{{ 
    142 class Rebox a where 
    143   rebox :: a -> a 
    144 instance Rebox Int where 
    145   {-# INLINE rebox #-} 
    146   rebox (I# i#) = I# ({-# CORE "nocse" #-} i#) 
    147 instance (Rebox a, Rebox b) => Rebox (a,b) where 
    148   {-# INLINE rebox #-} 
    149   rebox (x,y) = (rebox x, rebox y) 
    150 ... 
    151 }}} 
    152 Then, we could have 
    153 {{{ 
    154 data Stream a = forall s. Rebox s => Stream ... 
    155 }}} 
    156 and use rebox on all parts of the seed which are passed unchanged to the 
    157 next iteration or, even better, on the entire seed in every consumer. 
    158 Note that all calls to rebox are inlined because the optimiser knows all 
    159 the types involved. We also want to stop the reboxing at certain points 
    160 in the tree. For instance, in  
    161 {{{ 
    162 scanlS :: (a -> b -> a) -> a -> Stream b -> Stream a 
    163 }}} 
    164 the a is part of the seed but we do *not* want to rebox it in each 
    165 iteration. The class-based code above allows us to express this. 
    166  
    167 Anyway, this did work to some extent, but not really well. Still, 
    168 perhaps playing around with a rebox pragma might be a feasible idea if 
    169 it was better integrated into the compiler. In particular, the 
    170 strictness analyser could infer U instread of S for things under the 
    171 pragma and !SpecConstr could look at unfoldings. I'm not at all sure 
    172 about this, though. I might try to implement it and play around with it 
    173 at some point (but not soon!). 
    174  
    175 A related problem is 
    176 {{{ 
    177 foo :: Int -> Int -> Int 
    178 foo 0 n = 0 
    179 foo m n = foo (m-n) n 
    180 }}} 
    181 Here, n isn't getting unboxed although it could be if m is not 0 in the 
    182 first iteration. Perhaps unrolling the loop once could help here. 
    183  
    184  
    185 5. Join points 
    186  
    187 Unfortunately, I have lost the example, but when trying out the reboxing 
    188 stuff I saw the following happen quite a lot. I can try to find an 
    189 example if you want. 
    190  
    191 Suppose we have (*after* worker/wrapper, i.e., the join point must be 
    192 introduced by the subsequent simplifier run) something like 
    193 {{{ 
    194 foo p = let join (x,y) = ... foo (x,y) ... 
    195         in 
    196         case ... of 
    197           ... -> join (x1,y1) 
    198           ... -> join (x2,y2) 
    199 }}} 
    200 !SpecConstr would turn this into 
    201 {{{ 
    202 foo' x y = let join (x,y) = ... foo x y ... 
    203            in 
    204            case ... of 
    205              ... -> join (x1,y1) 
    206              ... -> join (x2,y2) 
    207 }}} 
    208 This doesn't buy us a lot - we still construct and immediately 
    209 deconstruct a pair in each iteration. This is a fairly obscure case, 
    210 though. 
    211  
    212  
    213 6. Inlining vs. rewriting 
    214  
    215 One thing I've noticed when trying out stream fusion for lists is the 
    216 following. If we define 
    217 {{{ 
    218 map f = unstream . mapS f . stream 
    219 }}} 
    220 we *always* want to inline map (especially if we are going to rewrite it 
    221 back if fusion doesn't happen). But if stream is lazy (which it is), in 
    222 {{{ 
    223 foo f g = map f . map g 
    224 }}} 
    225 the two map won't be inlined, I assume because the simplifier sees no 
    226 reason to do it. I remember that you made the simplifier keener to 
    227 inline in contexts in which rules might apply but this doesn't work here 
    228 because map doesn't have a rewrite rule - it's the stream/unstream rule 
    229 we want to fire. 
    230  
    231 So just saying {-# INLINE map #-} doesn't help us. So what do we do? One 
    232 solution is to have an additional rewrite rule: 
    233 {{{ 
    234   map f = unstream . mapS f . stream 
    235 }}} 
    236 But this results in a lot of code duplication, especially for functions 
    237 which aren't quite as small (i.e., stream transformers such as mapS 
    238 which we always want to inline). Neither does just having the rule and 
    239 defining map as 
    240 {{{ 
    241 map = undefined 
    242 }}} 
    243 help since GHC just treats map as a diverging function. 
    244  
    245 This problem is actually even worse with stream transformers which we 
    246 only want to inline quite late in the game as information about their 
    247 strictness, arity and so on should still be available in the earlier 
    248 phases. This means that we want GHC to see their definitions and then 
    249 inline them unconditionally at some point. 
    250  
    251 It would be quite handy to have a pragma which means "inline 
    252 unconditionally", for instance REWRITE. It would support the same phase 
    253 annotations etc. but behave as if instead of 
    254 {{{ 
    255 {-# REWRITE [n] foo #-} 
    256 foo x = e 
    257  
    258 we wrote 
    259  
    260 {-#  NOINLINE [n] foo #-} 
    261 foo x = e 
    262  
    263 {-# RULES 
    264 foo = \x -> e 
    265 #-} 
    266 }}} 
    267 It doesn't have to be implemented via rules, of course. I briefly talked 
    268 to Don about this and he said that he'd like to have this as well for 
    269 the ByteString library. 
    270  
    271 One question here is: do we want the above rule or 
    272 {{{ 
    273 foo = __inline_me(\x -> e) 
    274 }}} 
    275  
    276  
    277 In general, I have to say that the quality of the loops produced by 
    278 fusion depends on the optimiser doing a *really* good job which is quite 
    279 fragile. I've been thinking about how to help the compiler even more but 
    280 don't have anything worth talking about yet.  
    281  
     5 * [wiki:DataParallel/Optimisation Optimisation, and problems therewith] 
     6 * [wiki:DataParallel/Related Other nested data parallel work]