Opened 6 years ago

Closed 5 years ago

#5363 closed bug (fixed)

optimized profiled version of a program incorectly compiles $!

Reported by: phercek Owned by: simonmar
Priority: normal Milestone: 7.4.1
Component: Profiling Version: 7.0.3
Keywords: optimization profiling stricness Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


Here is a test program (file name prgSrc.hs):

import Data.Array.Unboxed

main = do
  let l1 = [1..10] :: [Int]
  let l2 = [ map (i+) l1 | i <- [1..5000000] ]
  let l3 = map (\l -> listArray (1,length l) l) l2 :: [UArray Int Int]
  print $ accumulate l3 0

accumulate [] rv = rv
accumulate (h:t) rv =
  let nextRv = (rv + sum (elems h)) in
  accumulate t $! nextRv

I used ghc 7.0.3-2 on archlinux, 64 bit version. I created it only to check how much memory short unboxed arrays consume. Thanks to the "$!" call at the last line of the "accumulate" function there should not be any stack overflow.

When I compile with these options:
ghc --make prgSrc.hs
ghc -O2 --make prgSrc.hs
ghc -prof -auto-all -caf-all --make prgSrc.hs
then there is no problem.

But when I compile with these options:
ghc -O2 -prof -auto-all -caf-all --make prgSrc.hs
then the program runs out of stack.

This indicates that there is a bug while compiling "$!" in an optimized profiling version of this program.

Change History (7)

comment:1 Changed 6 years ago by

FWIW, I can reproduce the behaviour with 7.0.4 and 6.12.3. With -auto instead of -auto-all there's no stack overflow. I haven't looked deeply into the generated core, but what jumps out is that with -auto-all, the worker for accumulate does unnecessary boxing and re-unboxing:

Rec {
Main.$waccumulate [Occ=LoopBreaker]
  :: [Data.Array.Base.UArray GHC.Types.Int GHC.Types.Int]
     -> GHC.Prim.Int#
     -> GHC.Prim.Int#
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType SL]
Main.$waccumulate =
  \ (w_s1c4 :: [Data.Array.Base.UArray GHC.Types.Int GHC.Types.Int])
    (ww_s1c7 :: GHC.Prim.Int#) ->
    let {
      rv_s1cT :: GHC.Types.Int
      rv_s1cT = GHC.Types.I# ww_s1c7 } in
    case __scc {accumulate main:Main !}
         case w_s1c4 of _ {
           [] -> rv_s1cT;
           : h_acc t_acd ->
             case h_acc
             of _ { Data.Array.Base.UArray l_aPh u_aPi ds1_aPj ds2_aPk ->
             case ds1_aPj of _ { GHC.Types.I# x_aIT ->
             let {
               y_aPA [Dmd=Just L] :: GHC.Prim.Int#
               [LclId, Str=DmdType]
               y_aPA = GHC.Prim.-# x_aIT 1 } in
             case GHC.Prim.># 0 y_aPA of _ {
               GHC.Bool.False ->
                 letrec {
                   go_s1cX [Occ=LoopBreaker] :: GHC.Prim.Int# -> [GHC.Types.Int]
                   [LclId, Arity=1, Str=DmdType L]
                   go_s1cX =
                     \ (x1_aPE :: GHC.Prim.Int#) ->
                         @ GHC.Types.Int
                         (GHC.Types.I# (GHC.Prim.indexIntArray# ds2_aPk x1_aPE))
                         (case GHC.Prim.==# x1_aPE y_aPA of _ {
                            GHC.Bool.False -> go_s1cX (GHC.Prim.+# x1_aPE 1);
                            GHC.Bool.True -> GHC.Types.[] @ GHC.Types.Int
                          }); } in
                 case $wsum'_r1eg (go_s1cX 0) 0 of ww1_s1c0 { __DEFAULT ->
                 case Main.$waccumulate t_acd (GHC.Prim.+# ww_s1c7 ww1_s1c0)
                 of ww2_s1cb { __DEFAULT ->
                 GHC.Types.I# ww2_s1cb
               GHC.Bool.True ->
                 case $wsum'_r1eg (GHC.Types.[] @ GHC.Types.Int) 0
                 of ww1_s1c0 { __DEFAULT ->
                 case Main.$waccumulate t_acd (GHC.Prim.+# ww_s1c7 ww1_s1c0)
                 of ww2_s1cb { __DEFAULT ->
                 GHC.Types.I# ww2_s1cb
    of _ { GHC.Types.I# ww2_s1cb ->
end Rec }

comment:2 Changed 6 years ago by simonpj

Profiling disables all sorts of optimisations. Think of it as compiling without -O. So I'm absolutely unsurprised by unnecessary boxing and unboxing.

However I can see that changing the asymptotic stack behaviour is undesirable, esp since you are using an explicit strict-apply. How much does this matter to you?

comment:3 Changed 6 years ago by phercek

Not much. It confused me quite a bit so I reported it. This is the first time ghc given me wrong result. I'll just take care not to add -O when profiling. Well, that is if there is not a similar bug for some other programs which would change asymptotic behavior even when optimizations are not used. I did not see such a bug yet. That would defeat the goal of profiling. I profile to solve space/time problems. I often solve these by changing strictness explicitly. If profiling ignored explicit strictness specifications then it would be kind of useless.

So I think fixing this can be postponed till a similar bug is not found which would not depend on -O flag too.

comment:4 Changed 6 years ago by simonmar

Component: CompilerProfiling
Milestone: 7.4.1

It looks like the simplifier has left behind behind the unboxing and reboxing created by the worker-wrapper transformation, and as a result accumulate is no longer tail-recursive, which explains the stack overflow. Presumably the problem is that the scc annotation got in the way.

I think we should defer this for now (along with lots of other profiling tickets) until we have time to do a thorough overhaul of profiling. The first step would be to figure out the desired semantics of Core with scc, so that we know how to do optimisation in its presence.

comment:5 Changed 5 years ago by simonmar

Owner: set to simonmar

comment:6 Changed 5 years ago by marlowsd@…

commit eea40328004e3cad1fdd31004337e10e6ae5fc52

Author: Simon Marlow <>
Date:   Wed Dec 7 15:23:28 2011 +0000

    Improve optimisation in the presence of SCCs (fixes #5363)
    We had some special cases to handle things like
      case (scc c (case E of alts)) of alts'
    but it only worked when there was a single scc in the way.  This
    generalises the optimisation to handle multiple sccs and ticks, so
    that we can catch most case-of-case optimisations that would normally
    apply in the absence of profiling.
    This fixes the example in #5363, and nofib results (with -prof
    -fprof-auto) show that allocation universally goes down or stays the

 compiler/simplCore/Simplify.lhs |   35 +++++++++++++++++++++++++----------
 1 files changed, 25 insertions(+), 10 deletions(-)

comment:7 Changed 5 years ago by simonmar

difficulty: Unknown
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.