Opened 11 months ago

Closed 4 months ago

#9132 closed bug (fixed)

takeWhile&C. still not fusible

Reported by: Blaisorblade Owned by: skeuchel
Priority: normal Milestone: 7.10.1
Component: Core Libraries Version: 7.8.3
Keywords: fusion Cc: hvr, ekmett, mail@…, David.Feuer@…, core-libraries-committee@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #9537 Differential Revisions: Phab:D322

Description

takeWhile is not still fusible, 2 1/2 years after this report:

http://www.haskell.org/pipermail/glasgow-haskell-users/2011-December/021299.html

The discussion suggests making takeWhile a good producer/consumer with foldr/build fusion:

takeWhile' :: (a -> Bool) -> [a] -> [a]
takeWhile' p xs = build $ \c n -> foldr (takeWhileF p c n) n xs
{-# INLINE takeWhile' #-}

takeWhileF p c n x xs = if p x then x c xs else n

Furthermore, the discussion suggests having rewrite rules to go to this version and then rewrite back (if fusion fails).

The report also mentions concatMap (which is a separate known problem). It also mentions drop and dropWhile, but I don't see when they perform allocations, so I think that's an erroneous request.

I experienced the bug on GHC 7.6.3, but it seems still there in base-4.7.0.0, judging from the source:
http://hackage.haskell.org/package/base-4.7.0.0/docs/src/GHC-List.html#takeWhile

Attachments (3)

takewhilefusion.patch (5.9 KB) - added by skeuchel 11 months ago.
withoutfusion.txt.gz (35.6 KB) - added by skeuchel 10 months ago.
Gzippd output of -dverbose-core2core -ddump-inlinings -dsuppress-uniques without fusion
withfusion.txt.gz (37.5 KB) - added by skeuchel 10 months ago.
Gzippd output of -dverbose-core2core -ddump-inlinings -dsuppress-uniques with fusion

Download all attachments as: .zip

Change History (30)

comment:1 Changed 11 months ago by nomeata

  • Cc mail@… added

comment:2 Changed 11 months ago by nomeata

I think the request got stuck after “Please submit a patch” – “Will do”. So, whose up for writing a patch? (I’ll put the ticket on the suitable-tickets-for-ZuriHac2014 list).

comment:3 Changed 11 months ago by simonpj

Joachim is right. Things get fixed when someone decides to fix them!

In this case the fix is not hard, but needs a clean nofib comparision run to make sure that there are no performance regressions; and preferably a performance test we can add to testsuite/perf to check that the fusion really does happen.

Thanks

Simon

comment:4 Changed 11 months ago by Blaisorblade

Thanks for your answer! And sorry if I complained.

I hope having a ticket already helps, but I've also given a try at "reproducing the (++) scheme" (following also map, where the scheme is clearer). For now I hacked it in a separate file, and I could verify that fusion still happens in my example. (And that tweaking even small things has nontrivial effects).

Does the below make enough sense to go on? If so, the next main step (for me, or anybody who beats me) is just learning how to rebuild GHC and run nofib.

module IntToString where

import Prelude hiding (takeWhile)
import GHC.Exts

--takeWhile' :: (a -> Bool) -> [a] -> [a]
--takeWhile' p xs = build $ \c n -> foldr (takeWhileFB p c n) n xs
--{-# INLINE takeWhile' #-}

takeWhileFB p c n x xs = if p x then x `c` xs else n
{-# INLINE [0] takeWhileFB #-}

{-# NOINLINE [1] takeWhile #-} -- We want the RULE to fire first.
takeWhile               :: (a -> Bool) -> [a] -> [a]
takeWhile _ []          =  []
takeWhile p (x:xs)
            | p x       =  x : takeWhile p xs
            | otherwise =  []

{-
-- STUPID
"takeWhile/backBad"    [1] forall p xs. takeWhile' p xs = takeWhile p xs
-}

-- Why can't I use, on the RHS, a function I mark with INLINE such as takeWhile' above? If I do that, the final program contains takeWhile. Probably just a phase ordering problem.
{-# RULES
"takeWhile/fuse"    [~1] forall p xs. takeWhile p xs = build $ \c n -> foldr (takeWhileFB p c n) n xs
"takeWhile/back"   [1] forall p xs. foldr (takeWhileFB p (:) []) [] xs = takeWhile p xs
  #-}

-- The example I happened to work on when noticing this.
-- With the above definitions, fusion appears to work.
toChar digit = toEnum $ digit + fromEnum '0'

intToString i =
  if i < 0 then
     '-' : digits
   else
     digits
  where
    digits =
      reverse . map (toChar . (`mod` 10)) . takeWhile (/=0) . iterate (`div` 10) . abs $ i
Last edited 11 months ago by Blaisorblade (previous) (diff)

comment:5 Changed 11 months ago by nomeata

Looks ok, so if you want go ahead with building GHC and running nofib.

BTW, wat do you mean by

But this is a foldr, while takeWhile should be a foldl!

comment:6 Changed 11 months ago by Blaisorblade

BTW, wat do you mean by

But this is a foldr, while takeWhile should be a foldl!

Please ignore that, I've just removed that comment from my example. (If you're curious, when I wrote that comment I hadn't realized that right folds in a lazy language process the list left-to-right rather than right-to-left, and more such confusion; then I figured it out but forgot to remove the comment).

Looks ok, so if you want go ahead with building GHC and running nofib.

On it!

comment:7 follow-up: Changed 11 months ago by skeuchel

  • Owner set to skeuchel

comment:8 in reply to: ↑ 7 Changed 11 months ago by Blaisorblade

Replying to skeuchel:
my progress is in this branch: https://github.com/Blaisorblade/ghc/tree/topic/fuseTakeWhile (see https://github.com/Blaisorblade/ghc/pull/1 for a better web interface), and include some doc changes. Since you assigned the bug to yourself (so I guess you plan to work on it), you might want to resume from there.

Current status: the changes are done and do work on my example (causing fusion), now one should validate the performance impacts.
Right now, I didn't write a benchmark to measure the speedup on my program (or any other), and there's no net effect on nofib (even though takeWhile is sometimes called — but probably not in inner loops). The automated tests are also missing.

(Sorry for not getting back to you before, I'm traveling lots this month — next week I'm at PLDI).

comment:9 Changed 11 months ago by nomeata

@skeuchel:

Good work so far¹. I think you can also do this for dropWhile. It’s true that dropWhile does not perform allocation on its own, but if you have sum $ dropWhile (<100) [0..1000], there is value in fusing this: dropWhile needs to be a good consumer to avoid the allocations in the producer. And once that has been fused, the result is allocating, so dropWhile also should be a good producer.

¹ [private communication]

comment:10 Changed 11 months ago by Blaisorblade

@nomeata: while @skeuchel is the owner, I'm the OP. Thanks for explaining me why fusing dropWhile is important.

Was your comment addressed at the branch I posted, or is there some other work from skeuchel? (I saw the [private communication] and didn't get it either).

In case it is relevant, it seems to me harder to fuse dropWhile.

Writing dropWhile as a fold is harder (technically, because it is not a catamorphism but a paramorphism). When the predicate fails, you want to return the rest of the list without processing it, but the original list is not available:

dropWhileF p c n x xs = if p x then xs else <REST LIST, including x and xs before processing>

I see that can be accomplished using, in essence, the reader monad: make the fold produce a function taking the list as extra argument. However, I understand that's often not a good idea, or even a non-starter, for performance. So, what's the way forward?

comment:11 Changed 11 months ago by nomeata

Yes, @skeuchel is here at ZuriHac and is working on this. He took your changes and did validating and nofib runs and other cleanup stuff. It just seems that he is just a little shy about talking about it here :-)

You might be right about dropWhile. In that case, a [Note] in the code, explaining why we do not fuse dropWhile, would be helpful

comment:12 Changed 11 months ago by skeuchel

Hi @Blaisorblade @nomeata,

I took the changes from your branch and added a test case for this. However, it causes a 3.9% regression in terms of allocation of the rewrite test in nofib. This occurrence of takeWhile
https://github.com/ghc/nofib/blob/master/spectral/rewrite/Main.lhs#L193
is rewritten to takeWhileFB and then later written back without being fused. However, it seems to change the inline cost of string_of which will result that it will be inlined later and ultimately this results in losing sharing in one case. See this diff
http://lpaste.net/105275
of the output of the inliner/rewriter for some more details. You can clearly see the duplicated let bindings in the core output.

comment:13 Changed 11 months ago by nomeata

Great analysis.

I think the change can go in nevertheless. Optimization is hardly ever a win in all cases, and I think the regression is just a random knock-on-effect that could well have been an improvement. Also, it is not takeWhile specific, but rather a problem with the list-fusion setup in general.

I’d say we can merge this.

Changed 11 months ago by skeuchel

comment:14 Changed 11 months ago by skeuchel

  • Status changed from new to patch

comment:15 Changed 11 months ago by nomeata

The patch you attached looks good to me. @Blaisorblade, what do you think?

comment:16 Changed 11 months ago by Blaisorblade

Thanks for taking care of this!

The performance changes make sense to me, especially to fft2 which does use takeWhile. (I saw them on one OS X desktop, which however seemed to like suspending and had more stable timing; I didn't see them on a Linux machine which was otherwise more stable).

On the regression, I agree it seems accidental. I'm confused that s1 and s2 are not joined by CSE. Delite/LMS, for instance, uses hash consing to prevent such programs from being formed in memory (the second definition is in a nested scope, but it seems this should not be a problem here).

But I guess CSE in GHC would break programs using IO after the definition of IO is inlined. (No, I was confused, sorry. There is a transformation which is valid in lambda-calculus but not in GHC, but that's work-duplicating inlining, which would duplicate side-effects after IO is inlined — http://www.haskell.org/haskellwiki/IO_inside#Welcome_to_the_RealWorld.2C_baby).

Last edited 10 months ago by Blaisorblade (previous) (diff)

comment:17 Changed 10 months ago by simonpj

Good work.

  • Could someone post the nofib results?
  • Like Blaisorblade I don't understand why CSE doesn't catch the duplicate redex. Can someone post the result of -dverbose-core2core -ddump-inlinings?

Microsoft's network won't me see the hpaste/lpaste site at the moment (investigating) so I can't see anything you post there at the moment.

Simon

comment:18 Changed 10 months ago by skeuchel

@spj: I'm currently traveling back from ZuriHac. Will do so when I'm back home tonight.

Changed 10 months ago by skeuchel

Gzippd output of -dverbose-core2core -ddump-inlinings -dsuppress-uniques without fusion

Changed 10 months ago by skeuchel

Gzippd output of -dverbose-core2core -ddump-inlinings -dsuppress-uniques with fusion

comment:19 Changed 8 months ago by dfeuer

  • Cc David.Feuer@… added
  • Milestone set to 7.8.4
  • Version changed from 7.8.2 to 7.8.3

comment:20 Changed 6 months ago by thoughtpolice

  • Component changed from libraries/base to Core Libraries

Moving over to new owning component 'Core Libraries'.

comment:21 Changed 6 months ago by dfeuer

  • Cc core-libraries-committee@… added
  • Differential Revisions set to D322
  • Milestone changed from 7.8.4 to 7.10.1

comment:22 Changed 6 months ago by dfeuer

  • Differential Revisions changed from D322 to Phab:D322

comment:23 Changed 6 months ago by Joachim Breitner <mail@…>

In d14d3f92d55a352db7faf62939127060716c4694/ghc:

Make Data.List.takeWhile fuse: fix #9132

Summary:
Rewrites takeWhile to a build/foldr form; fuses repeated
applications of takeWhile.

Reviewers: nomeata, austin

Reviewed By: nomeata

Subscribers: thomie, carter, ezyang, simonmar

Projects: #ghc

Differential Revision: https://phabricator.haskell.org/D322

GHC Trac Issues: #9132

comment:24 Changed 6 months ago by dfeuer

  • Status changed from patch to merge

concatMap will be addressed under #9537, so I'll change the status here to "merge".

comment:25 follow-up: Changed 6 months ago by rwbarton

This isn't really a bug fix, so I wouldn't merge it. I'll leave it in 'merge' status though so Austin will see it in case he disagrees.

comment:26 in reply to: ↑ 25 Changed 6 months ago by dfeuer

Replying to rwbarton:

This isn't really a bug fix, so I wouldn't merge it. I'll leave it in 'merge' status though so Austin will see it in case he disagrees.

I just don't know how tickets are handled when they are about done, and I figured changing to "merge" was a safer bet than changing to "fixed".

comment:27 Changed 4 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from merge to closed
Note: See TracTickets for help on using tickets.