Opened 3 years ago

Closed 3 years ago

Last modified 2 years ago

#10400 closed bug (invalid)

Run time increases by 40% in fractal plotter core loop

Reported by: saffroy Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.4
Keywords: Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I'm writing a toy fractal plotter to teach myself some Haskell. For a while I used GHC 7.4, recently upgraded to 7.10, and found a significant drop in run time performance. Tests with other versions show that the drop dates back to 7.8.

Not sure if this is a good practice, or if this is relevant here, but I used bangs to improve performance by adding strictness: this gives a good speedup with 7.4/7.6, but perf is unchanged with 7.8/7.10.

version run time (sec) - bangs no bangs
7.4.2 6.09 18.73
7.6.3 6.10 18.93
7.8.4 8.92 8.88
7.10.1 8.92 9.04

(All builds with ghc -O2.)

Source:

{-# LANGUAGE BangPatterns #-}

iterations :: Int -> Double -> Double -> Double -> Double -> Int -> Int
iterations !maxK !x !y !x0 !y0 !k =
  let !x2 = x * x
      !y2 = y * y
  in
   if k == maxK || x2 + y2 > 4
   then k
   else iterations maxK (x2 - y2 + x0) (2 * x * y + y0) x0 y0 (k + 1)

inSet :: Int -> Int -> (Double, Double) -> Bool
inSet !minK !maxK (!x, !y) = k >= minK && k < maxK
  where !k = iterations maxK x y x y 0

main = do
    let pointSource = repeat $ ((-0.6518209274278592), 0.3549264214581329)
        nPoints =  1000 * 1000
        points = take nPoints pointSource
        selected = filter (inSet 1000 (1000 * 20)) points
    print $ length selected == nPoints

The host C compiler is gcc 4.7.2.

Attachments (8)

ghc_test-7.6.3.prof (3.2 KB) - added by saffroy 3 years ago.
ghc_test-7.10.1.prof (3.2 KB) - added by saffroy 3 years ago.
stats-7.6.3 (851 bytes) - added by saffroy 3 years ago.
stats-7.10.1 (859 bytes) - added by saffroy 3 years ago.
core-7.6.3 (5.5 KB) - added by saffroy 3 years ago.
core-7.10.1 (5.8 KB) - added by saffroy 3 years ago.
replicate_core-7.6.3 (6.4 KB) - added by saffroy 3 years ago.
core with change in comment #2
replicate_core-7.10.1 (5.8 KB) - added by saffroy 3 years ago.
core with change in comment #2

Download all attachments as: .zip

Change History (14)

comment:1 Changed 3 years ago by nomeata

Thanks! Looks interesting.

Have you compared the RTS stats (pass +RTS -s to your binary)?

Have you compiled your program with profiling enabled, and created a profile (+RTS -p), and compared that?

Finally, you could compare the produce Core (compile with of -ddump-simpl) for obvious differences.

Is there a noticable difference whether you use take n (repeat ..) or replicate n ...? If so, maybe some list fusion is happening or not happening. That should be observable in the RTS stats (number of allocations), or in the Core.

Changed 3 years ago by saffroy

Attachment: ghc_test-7.6.3.prof added

Changed 3 years ago by saffroy

Attachment: ghc_test-7.10.1.prof added

Changed 3 years ago by saffroy

Attachment: stats-7.6.3 added

Changed 3 years ago by saffroy

Attachment: stats-7.10.1 added

Changed 3 years ago by saffroy

Attachment: core-7.6.3 added

Changed 3 years ago by saffroy

Attachment: core-7.10.1 added

comment:2 in reply to:  1 Changed 3 years ago by saffroy

Replying to nomeata:

Thanks! Looks interesting.

Thank you for looking!

Below I am comparing 7.6 (last fast version for my use case) to 7.10 (last stable). I have attached various outputs (stats, profiles, cores) for your viewing pleasure.

Have you compared the RTS stats (pass +RTS -s to your binary)?

Memory usage is considerably smaller with 7.10, though GC times are negligible in both cases.

Have you compiled your program with profiling enabled, and created a profile (+RTS -p), and compared that?

Profiles look the same.

Finally, you could compare the produce Core (compile with of -ddump-simpl) for obvious differences.

There are differences in the core loop (iterations), but I can't really tell what changed... Can you please have a look? I have attached them.

Is there a noticable difference whether you use take n (repeat ..) or replicate n ...? If so, maybe some list fusion is happening or not happening. That should be observable in the RTS stats (number of allocations), or in the Core.

So I did this:

--- a/ghc_test.hs
+++ b/ghc_test.hs
@@ -16,6 +16,6 @@ inSet !minK !maxK (!x, !y) = k >= minK &
 main = do
     let pointSource = repeat $ ((-0.6518209274278592), 0.3549264214581329)
         nPoints =  1000 * 1000
-        points = take nPoints pointSource
+        points = replicate nPoints $ head pointSource
         selected = filter (inSet 1000 (1000 * 20)) points
     print $ length selected == nPoints

With this change, I see no difference in stats or run time. I do see small differences in the core, but for the core loop they seem small.

Changed 3 years ago by saffroy

Attachment: replicate_core-7.6.3 added

core with change in comment #2

Changed 3 years ago by saffroy

Attachment: replicate_core-7.10.1 added

core with change in comment #2

comment:3 Changed 3 years ago by rwbarton

Almost all the time is spent inside iterations, which is a 20-instruction long loop in both the 7.6 and 7.8 versions. The two loops are very similar, but some detail causes the 7.8 loop to run about 50% slower, probably some pipeline stall issue. It looks hard to even track down the exact issue, let alone deal with it in the code generator in a general way.

I note that the LLVM backend produces a smaller loop which is a bit faster than even the 7.6 loop. The LLVM backend is usually good at optimizing this kind of non-allocating code.

comment:4 Changed 3 years ago by saffroy

Ah! Indeed, with -fllvm the run times are much more consistent across versions of GHC.

version run time (sec) - bangs no bangs
7.4.2 4.99 19.67
7.6.3 4.99 19.30
7.8.4 4.80 4.80
7.10.1 4.97 4.97

In this case, do we keep this bug open?

comment:5 Changed 3 years ago by nomeata

Resolution: invalid
Status: newclosed

In this case, do we keep this bug open?

Assuming rwbarton’s analysis is correct, I think we can close this bug with a *shrug*.

comment:6 Changed 2 years ago by rwbarton

That said, we do very much appreciate bug reports like this one, even if in the end we are unable to do much about them!

Note: See TracTickets for help on using tickets.