Opened 4 years ago
Last modified 5 weeks ago
#7206 new bug
Implement cheap build
Reported by: | simonpj | Owned by: | simonpj |
---|---|---|---|
Priority: | normal | Milestone: | ⊥ |
Component: | Compiler | Version: | 7.4.2 |
Keywords: | Cc: | mail@…, carter.schonwald@…, snoyberg, edsko, ekmett, maoe | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Runtime performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | Differential Rev(s): | ||
Wiki Page: |
Description
We sometimes see stuff like this:
f n ps = let ys = [1..x] in map (\zs. ys ++ zs) ps
You might think the (++) would fuse with the [1..x], via foldr/build fusion, but it doesn't. Why not? Because it would be WRONG to do so in this case:
f ns ps = let ys = map expensive ns in map (\zs. ys ++ zs) ps
If we fused the (++) with the map we might call expensive once for each element of ps.
This is fairly easy to fix. The point is that [1..x] is cheap; we'd prefer to fuse it even if doing so involves computing 1, 1+1, 2+1, etc multiple times. Suppose we express this fact thusly:
enumFromTo lo hi = cheapBuild (\cn. ....lo...hi...) map f xs = build (\cn. ...f...xs...)
Now we want the foldr/cheapBuild rule to fire even if that would involve duplicating the call to cheapBuild. And we already have a way to do that: we make cheapBuild into a CONLIKE function.
Happily it's almost all simply a change to the libraries, not the compiler itself.
I just need to execute on this, but I keep failing to get round to it. Below is the beginning. One missing piece is that I need to replace the hack for build in the occurrence analyser, so that it works for cheapBuild too. (At least until we have Ilya's cardinality analyser.)
Simon
diff --git a/GHC/Base.lhs b/GHC/Base.lhs index 6a36eb5..b78edf5 100644 --- a/GHC/Base.lhs +++ b/GHC/Base.lhs @@ -304,6 +304,12 @@ build :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a] build g = g (:) [] +cheapBuild :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a] +{-# INLINE CONLIKE [1] cheapBuild #-} +-- cheapBuild is just like build, except that it is CONLIKE +-- See Note [cheapBuild] +cheapBuild g = g (:) [] + -- | A list producer that can be fused with 'foldr'. -- This function is merely -- @@ -320,6 +326,8 @@ augment g xs = g (:) xs {-# RULES "fold/build" forall k z (g::forall b. (a->b->b) -> b -> b) . foldr k z (build g) = g k z +"fold/cheapBuild" forall k z (g::forall b. (a->b->b) -> b -> b) . + foldr k z (cheapBuild g) = g k z "foldr/augment" forall k z xs (g::forall b. (a->b->b) -> b -> b) . foldr k z (augment g xs) = g k (foldr k z xs) @@ -343,6 +351,12 @@ augment g xs = g (:) xs "augment/build" forall (g::forall b. (a->b->b) -> b -> b) (h::forall b. (a->b->b) -> b -> b) . augment g (build h) = build (\c n -> g c (h c n)) + +"augment/cheapBuild" forall (g::forall b. (a->b->b) -> b -> b) + (h::forall b. (a->b->b) -> b -> b) . + augment g (cheapBuild h) = build (\c n -> g c (h c n)) + -- 'augment' doesn't necessarily have a cheap argument, so we revert to 'build' + "augment/nil" forall (g::forall b. (a->b->b) -> b -> b) . augment g [] = build g #-} @@ -351,6 +365,20 @@ augment g xs = g (:) xs -- augment g (augment h t) = augment (\cn -> g c (h c n)) t \end{code} +Note [cheapBuild] +~~~~~~~~~~~~~~~~~ +cheapBuild is just like build, except that it is CONLIKE + +It is used in situations where fusion is more imortant than sharing, +ie in situation where its argument function 'g' in (cheapBuild g) is +cheap. + +Main example: enumerations of one kind or another: + f x = let xs = [x..] + go = \y. ....go y'....(map (h y) xs)... + in ... +Here we woud like to fuse the map with the [x..] + ---------------------------------------------- -- map @@ -831,7 +859,7 @@ a `iShiftRL#` b | b >=# WORD_SIZE_IN_BITS# = 0# -- Rules for C strings (the functions themselves are now in GHC.CString) {-# RULES -"unpack" [~1] forall a . unpackCString# a = build (unpackFoldrCString# a) +"unpack" [~1] forall a . unpackCString# a = cheapBuild (unpackFoldrCString# a) "unpack-list" [1] forall a . unpackFoldrCString# a (:) [] = unpackCString# a "unpack-append" forall a n . unpackFoldrCString# a (:) n = unpackAppendCString# a n diff --git a/GHC/Enum.lhs b/GHC/Enum.lhs index cea3ced..561a995 100644 --- a/GHC/Enum.lhs +++ b/GHC/Enum.lhs @@ -376,9 +376,9 @@ instance Enum Char where enumFromThenTo (C# x1) (C# x2) (C# y) = efdtChar (ord# x1) (ord# x2) (ord# y) {-# RULES -"eftChar" [~1] forall x y. eftChar x y = build (\c n -> eftCharFB c n x y) -"efdChar" [~1] forall x1 x2. efdChar x1 x2 = build (\ c n -> efdCharFB c n x1 x2) -"efdtChar" [~1] forall x1 x2 l. efdtChar x1 x2 l = build (\ c n -> efdtCharFB c n x1 x2 l) +"eftChar" [~1] forall x y. eftChar x y = cheapBuild (\c n -> eftCharFB c n x y) +"efdChar" [~1] forall x1 x2. efdChar x1 x2 = cheapBuild (\ c n -> efdCharFB c n x1 x2) +"efdtChar" [~1] forall x1 x2 l. efdtChar x1 x2 l = cheapBuild (\ c n -> efdtCharFB c n x1 x2 l) "eftCharList" [1] eftCharFB (:) [] = eftChar "efdCharList" [1] efdCharFB (:) [] = efdChar "efdtCharList" [1] efdtCharFB (:) [] = efdtChar @@ -510,7 +510,7 @@ instance Enum Int where -- In particular, we have rules for deforestation {-# RULES -"eftInt" [~1] forall x y. eftInt x y = build (\ c n -> eftIntFB c n x y) +"eftInt" [~1] forall x y. eftInt x y = cheapBuild (\ c n -> eftIntFB c n x y) "eftIntList" [1] eftIntFB (:) [] = eftInt #-} @@ -539,7 +539,7 @@ eftIntFB c n x0 y | x0 ># y = n {-# RULES "efdtInt" [~1] forall x1 x2 y. - efdtInt x1 x2 y = build (\ c n -> efdtIntFB c n x1 x2 y) + efdtInt x1 x2 y = cheapBuild (\ c n -> efdtIntFB c n x1 x2 y) "efdtIntUpList" [1] efdtIntFB (:) [] = efdtInt #-} @@ -646,8 +646,8 @@ instance Enum Integer where enumFromThenTo x y lim = enumDeltaToInteger x (y-x) lim {-# RULES -"enumDeltaInteger" [~1] forall x y. enumDeltaInteger x y = build (\c _ -> enumDeltaIntegerFB c x y) -"efdtInteger" [~1] forall x y l.enumDeltaToInteger x y l = build (\c n -> enumDeltaToIntegerFB c n x y l) +"enumDeltaInteger" [~1] forall x y. enumDeltaInteger x y = cheapBuild (\c _ -> enumDeltaIntegerFB c x y) +"efdtInteger" [~1] forall x y l.enumDeltaToInteger x y l = cheapBuild (\c n -> enumDeltaToIntegerFB c n x y l) "enumDeltaInteger" [1] enumDeltaIntegerFB (:) = enumDeltaInteger "enumDeltaToInteger" [1] enumDeltaToIntegerFB (:) [] = enumDeltaToInteger #-}
Attachments (2)
Change History (19)
comment:1 Changed 4 years ago by simonpj
comment:2 Changed 4 years ago by nomeata
- Cc mail@… added
comment:3 Changed 4 years ago by carter
- Cc carter.schonwald@… added
comment:4 Changed 4 years ago by igloo
- Milestone set to 7.8.1
- Owner set to simonpj
comment:5 Changed 4 years ago by simonpj
- Milestone changed from 7.8.1 to _|_
I tried this again, but got disappointing results:
-------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- anna -0.4% +0.0% -6.3% -5.9% +0.0% ansi -0.2% -0.6% 0.00 0.00 +0.0% atom -0.1% +0.0% +2.0% +2.0% +0.0% awards -0.4% +0.0% 0.00 0.00 +0.0% banner -0.1% +0.0% 0.00 0.00 +0.0% bernouilli -0.0% +0.0% -1.6% -1.6% +0.0% binary-trees -0.2% +0.0% +4.4% +4.4% +0.0% boyer -0.2% +0.0% 0.07 0.07 +0.0% boyer2 -0.5% +0.0% 0.01 0.01 +0.0% bspt -0.3% +0.0% 0.02 0.02 +0.0% cacheprof -1.4% +0.2% -0.7% -0.9% -0.5% calendar -0.2% +0.0% 0.00 0.00 +0.0% cichelli -0.4% -0.0% 0.13 0.13 +0.0% circsim -0.2% +0.0% -4.6% -4.7% +0.0% clausify -0.2% +0.0% 0.06 0.06 +0.0% comp_lab_zift -0.2% +0.0% -2.6% -2.6% +0.0% compress -0.3% +0.0% +3.0% +3.7% +0.0% compress2 -0.1% +0.0% +1.3% +1.3% +0.0% constraints -0.2% +0.0% -3.9% -3.9% +0.0% cryptarithm1 -0.2% +0.0% +6.3% +6.3% +0.0% cryptarithm2 -0.5% +0.0% 0.02 0.02 +0.0% cse -0.5% +0.0% 0.00 0.00 +0.0% eliza -0.1% +0.0% 0.00 0.00 +0.0% event -0.2% +0.0% +2.9% +2.9% +0.0% exp3_8 -0.2% +0.0% -2.8% -2.8% +0.0% expert -0.1% +0.0% 0.00 0.00 +0.0% fannkuch-redux -0.1% +0.0% -2.4% -2.4% +0.0% fasta -0.2% +0.0% -0.4% -0.4% +0.0% fem -0.0% +0.0% 0.04 0.04 +0.0% fft -0.1% +0.0% 0.06 0.06 +0.0% fft2 -0.1% +0.0% 0.10 0.10 +0.0% fibheaps -0.2% +0.0% 0.05 0.05 +0.0% fish -0.1% +2.7% 0.03 0.03 +0.0% fluid -0.1% +0.2% 0.01 0.01 +100.0% fulsom -0.3% +0.0% -4.0% -4.0% +3.2% gamteb -0.1% -0.0% 0.07 0.07 +0.0% gcd -0.1% +0.0% 0.05 0.05 +0.0% gen_regexps -0.1% +0.0% 0.00 0.00 +0.0% genfft -0.1% +0.0% 0.06 0.06 +0.0% gg -0.5% +0.1% 0.02 0.02 +0.0% grep -0.1% +0.0% 0.00 0.00 +0.0% hidden -0.1% +0.0% +4.2% +4.2% +0.0% hpg -0.4% -0.2% +8.8% +8.8% +0.0% ida -0.2% +0.0% 0.14 0.14 +0.0% infer -0.3% +0.0% 0.13 0.13 +0.0% integer -0.1% +0.0% +0.1% +0.1% +0.0% integrate -0.1% +0.0% +18.2% +18.2% +0.0% k-nucleotide -0.1% +0.0% -6.2% -6.2% +0.0% kahan -0.1% +0.0% +1.1% +1.1% +0.0% knights -0.2% +1.4% 0.01 0.01 +0.0% lcss -0.2% +0.0% -3.5% -3.5% +0.0% life -0.2% +0.0% +2.5% +2.5% +0.0% lift -0.3% +0.0% 0.00 0.00 +0.0% listcompr -0.2% +0.0% 0.15 0.15 +0.0% listcopy -0.2% +0.0% 0.16 0.16 +0.0% maillist -0.3% +0.0% 0.10 0.11 -3.9% mandel -0.0% -0.0% 0.12 0.12 +0.0% mandel2 -0.5% +0.0% 0.00 0.00 +0.0% minimax -0.3% +0.0% 0.00 0.00 +0.0% mkhprog -0.2% +0.0% 0.00 0.00 +0.0% multiplier -0.3% +0.0% +1.8% +1.8% +0.0% n-body -0.2% +0.0% -0.7% -0.6% +0.0% nucleic2 -0.1% +0.0% 0.11 0.11 +0.0% para -0.0% +0.0% +1.4% +0.8% +0.0% paraffins -0.2% +0.0% +13.6% +13.6% +0.0% parser -1.3% +3.5% 0.05 0.05 +0.0% parstof -0.3% +0.0% 0.01 0.01 +0.0% pic -0.1% +0.1% 0.00 0.00 +0.0% pidigits -0.1% +0.0% +1.5% +1.9% +0.0% power -0.1% +0.0% -4.9% -4.9% +0.0% pretty -0.1% -0.0% 0.00 0.00 +0.0% primes -0.1% +0.0% 0.11 0.11 +0.0% primetest -0.0% +0.0% 0.14 0.14 +0.0% prolog -0.2% +0.1% 0.00 0.17 +0.0% puzzle -0.3% +0.0% +1.7% +1.7% +0.0% queens -0.2% +0.0% 0.02 0.02 +0.0% reptile -0.2% -0.0% 0.02 0.02 +0.0% reverse-complem -0.3% +0.0% +3.8% +3.8% +0.0% rewrite -0.2% +0.0% 0.02 0.02 +0.0% rfib -0.1% +0.0% 0.03 0.03 +0.0% rsa -0.0% +0.0% 0.03 0.12 +0.0% scc -0.2% +0.0% 0.00 0.00 +0.0% sched -0.2% +0.0% 0.03 0.03 +0.0% scs -0.0% +0.1% -2.9% -3.1% +0.0% simple -0.9% +0.6% -10.8% -10.8% +13.8% solid -0.1% +0.0% +3.2% +3.2% +0.0% sorting -0.1% +0.0% 0.00 0.00 +0.0% spectral-norm -0.2% +0.0% +0.1% +0.1% +0.0% sphere -0.2% +0.0% 0.08 0.08 +0.0% symalg -0.2% +0.0% 0.01 0.04 +0.0% tak -0.2% +0.0% 0.02 0.02 +0.0% transform -0.2% +0.0% -0.6% -0.6% +0.0% treejoin -0.2% +0.0% +0.0% -0.7% +0.0% typecheck -0.2% -0.0% +0.0% +0.0% +0.0% veritas +0.7% +0.0% 0.00 0.00 +0.0% wang -0.1% +0.0% -5.5% -5.5% +0.0% wave4main -0.2% +0.0% -2.4% -2.4% +0.0% wheel-sieve1 -0.1% +0.0% -0.5% -0.5% +0.0% wheel-sieve2 -0.1% +0.0% -6.4% -6.4% +0.0% x2n1 -0.2% +0.0% 0.00 0.00 +0.0% -------------------------------------------------------------------------------- Min -1.4% -0.6% -10.8% -10.8% -3.9% Max +0.7% +3.5% +18.2% +18.2% +100.0% Geometric Mean -0.2% +0.1% +0.1% +0.1% +0.8%
On investigation it seemed that although it was good to fuse away the intermediate lists, we were duplicating the individual Ints or Chars that would otherwise have been shared, and this pushes up allocation. So we often lose a bit and seldom win much.
For example, in spectral/fish we get a lot more calls to GHC.CString.unpackAppendCString#, which allocates not only cons cells (which is fine) but also the characters themselves. Without the cheapBuild stuff we just call (++) on a shared list of characters.
So I'm parking this again. I attach the patches I used.
Simon
comment:6 Changed 3 years ago by simonpj
If we do this, don't forget to use cheapBuild for repeat.
Simon
comment:7 Changed 2 years ago by dfeuer
Even more kinds of producers makes for even more fusion rules. If we decide to rewrite all consumers to foldr, I suppose that then opens up the possibility of having multiple producers instead, but expanding on both ends makes for a rule explosion. I'm also trying, on and off, to see if we can get away with writing all these enumerations using unfoldr or similar; I'm not sure how these ideas interact.
An idea that might be crazy: introduce a local annotation indicating that a certain function should be treated as CONLIKE at a particular call site. So then you'd have just {-# CHEAP #-} build blah and leave the fusion rules alone.
comment:8 Changed 2 years ago by simonpj
See also this thread, and this one.
comment:9 Changed 2 years ago by simonpj
Another thought. Regardless of what we do here, it seems likely that GHC will sometimes make the wrong choice. So we should give the programmer a way to make the "right" choice.
Example. Suppose [1..10000] is floated out and shared. Then we should provide duplicableEnumFromTo 1 10000 which means "please don't try to share me; instead fuse me with my consumers, even if that loses sharing". OK so you lose the nice notation, but you get to say what you want.
(And if we switch so that [1..1000] is by-default not-shared, then we should provide a way force it to be shared: nonDuplicableEnumFromTo 1 1000.
Simon
comment:10 Changed 2 years ago by snoyberg
- Cc snoyberg added
Following up on a separate discussion on the haskell-cafe, I have a possibly related case:
main :: IO () main = printLen >> printLen printLen :: IO () printLen = lengthM 0 [1..40000000 :: Int] >>= print lengthM :: Monad m => Int -> [a] -> m Int lengthM cnt [] = return cnt lengthM cnt (_:xs) = cnt' `seq` lengthM cnt' xs where cnt' = cnt + 1
On my system, this takes almost 1.2GB of memory. If I comment out the second call to printLen, it takes 44KB.
comment:11 Changed 2 years ago by edsko
- Cc edsko added
comment:12 Changed 20 months ago by ekmett
- Cc ekmett added
comment:13 Changed 15 months ago by bgamari
I brought up this issue again in yesterday's GHC meeting. Simon's opinion was that while this doesn't produce a compelling performance improvement with existing producers, it is a tool that may be nice to have available for use in user code.
I've ported these patches to master in my cheap-build branch. Sadly but not surprisingly the story told by nofib is still rather disappointing (showing only major changes),
-------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- cacheprof 0.0% +0.1% 0.173 0.173 -2.6% circsim 0.0% 0.0% +3.5% +3.6% 0.0% constraints 0.0% 0.0% +1.2% +1.2% 0.0% cryptarithm1 0.0% 0.0% -0.6% -0.4% 0.0% k-nucleotide 0.0% 0.0% -1.1% -1.1% 0.0% maillist 0.0% 0.0% 0.018 0.018 +5.6% wheel-sieve1 0.0% 0.0% +0.7% +0.9% 0.0% -------------------------------------------------------------------------------- Min 0.0% 0.0% -1.1% -1.1% -2.6% Max 0.0% +0.1% +3.5% +3.6% +5.6% Geometric Mean -0.0% +0.0% +0.2% +0.3% +0.0%
I'll be looking in to this later.
comment:14 Changed 15 months ago by thomie
- Type of failure changed from None/Unknown to Runtime performance bug
comment:15 Changed 15 months ago by simonpj
Is this with or without using cheapBuild for enumerations?
comment:16 Changed 15 months ago by bgamari
This is with cheapBuild being used for enumerations.
comment:17 Changed 5 weeks ago by maoe
- Cc maoe added
See also #7309