Opened 9 years ago

Last modified 17 months ago

#2273 new bug

inlining defeats seq

Reported by: igloo Owned by:
Priority: lowest Milestone:
Component: Compiler Version: 6.9
Keywords: Cc: ezyang@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Consider this module:

module Q (tcExtendIdEnv2) where

-- Interesting code:

tcExtendIdEnv2 :: M a
tcExtendIdEnv2 = do env <- getEnv
                    let level :: Int
                        level = thLevel (tcl_th_ctxt env)
                    level `seq` tc_extend_local_id_env level

{-# NOINLINE tc_extend_local_id_env #-}
tc_extend_local_id_env :: Int -> M a
tc_extend_local_id_env th_lvl = if read "foo"
                                then th_lvl `seq` return undefined
                                else return undefined

thLevel :: ThStage -> Int
thLevel Comp       = 0
thLevel (Splice l) = l
thLevel (Brack l)  = l

-- Dull code:

type M a = IOEnv TcLclEnv a

data TcLclEnv = TcLclEnv { tcl_th_ctxt :: !ThStage }

data ThStage = Comp | Splice Int | Brack  Int

getEnv :: IOEnv env env
getEnv = IOEnv (\ env -> return env)

newtype IOEnv env a = IOEnv { unIOEnv :: env -> IO a }

instance Monad (IOEnv m) where
    IOEnv m >>= f = IOEnv (\ env -> do r <- m env
                                       unIOEnv (f r) env )
    return a = IOEnv (\ _ -> return a)
    fail = error

Compiling with

ghc -O -ddump-simpl -ddump-stg -c Q.hs

we get, in the STG,

Q.$wa =
    \r srt:(0,*bitmap*) [ww_sDx w_sDO]
        case
            case ww_sDx of wild_sEs {
              Q.Comp -> Q.lvl;
              Q.Splice l_sDA -> l_sDA;
              Q.Brack l_sDC -> l_sDC;
            }
        of
        tpl_sEt
        { GHC.Base.I# ipv_sEu ->
              let { sat_sDN = NO_CCS Q.TcLclEnv! [ww_sDx]; } in
              let {
                sat_sDL =
                    \u []
                        case ww_sDx of wild_sEv {
                          Q.Comp -> Q.lvl;
                          Q.Splice l_sDH -> l_sDH;
                          Q.Brack l_sDJ -> l_sDJ;
                        };
              } in  Q.tc_extend_local_id_env sat_sDL sat_sDN w_sDO;
        };

GHC seems to have inlined level, forced it (due to the seq), but passed along a second, inlined, unevaluated copy to tc_extend_local_id_env. So the whole environment is retained, defeating the purpose of the seq!

If I mark level as NOINLINE then the STG looks like this:

Q.a5 =
    \r srt:(0,*bitmap*) [env_sD1 eta_sDh]
        case env_sD1 of tpl_sDg {
          Q.TcLclEnv ipv_sD5 ->
              case
                  case ipv_sD5 of wild_sDN {
                    Q.Comp -> Q.lvl;
                    Q.Splice l_sD8 -> l_sD8;
                    Q.Brack l_sDa -> l_sDa;
                  }
              of
              level_sDc
              { __DEFAULT ->
                    case level_sDc of tpl1_sDf {
                      GHC.Base.I# ipv1_sDO -> Q.tc_extend_local_id_env tpl1_sDf tpl_sDg eta_sDh;
                    };
              };
        };

which fixes the env-retained problem, although I don't understand why two cases are done.

It would be nice not to have to resort to this level of trickery, though!

Attachments (1)

T2273.hs (1.0 KB) - added by morabbin 4 years ago.

Download all attachments as: .zip

Change History (31)

comment:1 Changed 9 years ago by simonpj

Thanks Ian -- nice report. What is happening is this. After inlining seq we get something like this:

  let level = case x of { A -> 1; B -> 2 }
  in case level of { _ -> ...level... }

Now GHC thinks that the case-expression for level is cheap (which it is), and therefore ok to duplicate, so it inlines it

  let level = case x of { A -> 1; B -> 2 }
  in case (case x of { A -> 1; B -> 2 }) of { _ -> ...level... }

Now there's only one remaining occurrence of level so that gets inlined too. Disaster.

This example has made me realise that what is really wrong here is that seq should really have a type more like

  fseq :: a -> (a -> b) -> b

Then we'd have

  level `fseq` (\level -> ...level...)

Now the inner level is nothing to do with the outer level and all will be well. So here we are saying what we mean: "evaluate level and use the evaluated version inside here".

This version of seq is just reversed strict function application, of course. Which is very like strict let. So with bang patterns we could also write

  let !level2 = level in ...level2...

This is satisfactorily explicit, but we must use a new name (level2). Perhaps that's not unreasonable.

Both fseq and a strict let could desugar to

   case level of level2 { _ -> ...level2... }

Meanwhile I think a good fix will be to change the desugarer to desugar saturated applications of seq to this same form. Not very robust to abstraction, but better than what we have now.

seq is subtler than it looks.

comment:2 Changed 9 years ago by simonpj

Owner: set to igloo
Type: bugmerge

Fixed by

Fri May 16 09:51:49 GMT Daylight Time 2008  simonpj@microsoft.com
  * Improve the treatment of 'seq' (Trac #2273)
  
  Trac #2273 showed a case in which 'seq' didn't cure the space leak
  it was supposed to.  This patch does two things to help
  
  a) It removes a now-redundant special case in Simplify, which
     switched off the case-binder-swap in the early stages.  This
     isn't necessary any more because FloatOut has improved since
     the Simplify code was written.  And switching off the binder-swap
     is harmful for seq.
  
  However fix (a) is a bit fragile, so I did (b) too:
  
  b) Desugar 'seq' specially.  See Note [Desugaring seq (2)] in DsUtils
     This isn't very robust either, since it's defeated by abstraction, 
     but that's not something GHC can fix; the programmer should use
     a let! instead.
  

    M ./compiler/basicTypes/MkId.lhs -6 +14
    M ./compiler/deSugar/DsUtils.lhs -9 +44
    M ./compiler/simplCore/Simplify.lhs -12 +15

Most of the new lines are comments!

I believe this could usefully be moved to the branch too.

I wonder if it'd be worth a test in eyeball?

Simon

comment:3 Changed 9 years ago by Isaac Dupree

are the following equivalent?

a `fseq` \a' -> b `fseq` \b' -> (a',b')
b `fseq` \b' -> a `fseq` \a' -> (a',b')

(I guess this is the usual question where the order of strictness doesn't matter because "imprecise" exceptions? but the order might have non-obvious performance implications? It's simpler than let because it's not mutually/recursive by default...)

comment:4 Changed 9 years ago by simonpj

They are semantically equivalent, but could perhaps have different space behaviour.

Simon

comment:5 Changed 9 years ago by igloo

Resolution: fixed
Status: newclosed

Merged

comment:6 Changed 9 years ago by simonmar

Architecture: UnknownUnknown/Multiple

comment:7 Changed 9 years ago by simonmar

Operating System: UnknownUnknown/Multiple

comment:8 Changed 8 years ago by simonpj

Resolution: fixed
Status: closedreopened

See also this thread, http://www.haskell.org/pipermail/cvs-ghc/2009-September/050164.html, which describes another instance of the very same thing. Furthermore, this instance is NOT FIXED by the above patch. So I'm re-opening.

Simon

comment:9 Changed 8 years ago by igloo

Owner: igloo deleted
Status: reopenednew

comment:10 Changed 8 years ago by simonpj

Note to self. I think the Right Solution is simply not to inline things that are "cheap", but only things that are "values". This is easy to achieve:

hunk ./compiler/coreSyn/CoreUnfold.lhs 637
- 	yes_or_no = active_inline && is_cheap && consider_safe
+ 	yes_or_no = active_inline && is_value && consider_safe

But it's not quite so easy: this change makes allocation go up in a couple of programs, and runtime goes up quite a bit. (We'd need to double-check that the runtime figures are right.)

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
         fulsom          -2.8%    +17.2%    +39.1%    +39.8%
         puzzle          +1.0%     +8.1%    +15.2%    +18.6%
           atom          +0.7%     +7.3%    +55.7%    +64.7%
        circsim          +0.5%     -0.0%    +35.2%    +37.1%
      compress2          +0.7%     -0.0%    +25.8%    +29.9%
           lcss          +0.7%     -0.0%    +42.8%    +48.1%
--------------------------------------------------------------------------------
            Min          -2.8%     -4.3%     -4.8%    -10.0%
            Max          +1.0%    +17.2%    +55.7%    +64.7%
 Geometric Mean          +0.5%     +0.5%    +18.2%    +21.4%

So, annoyingly, more investigation required. I can't do it today, so I'm recording the breadcrumbs here for when I get back to it.

Simon

comment:11 Changed 8 years ago by igloo

Type: mergebug

comment:12 Changed 8 years ago by igloo

Milestone: 6.12 branch

comment:13 Changed 7 years ago by igloo

Milestone: 6.12 branch6.12.3

comment:14 Changed 7 years ago by igloo

Milestone: 6.12.36.14.1
Priority: normallow

comment:15 Changed 7 years ago by igloo

Milestone: 7.0.17.0.2

comment:16 Changed 6 years ago by igloo

Milestone: 7.0.27.2.1

comment:17 Changed 6 years ago by ezyang

Cc: ezyang@… added
Type of failure: None/Unknown

comment:18 Changed 6 years ago by igloo

Milestone: 7.2.17.4.1

comment:19 Changed 5 years ago by igloo

Milestone: 7.4.17.6.1
Priority: lowlowest

comment:20 Changed 5 years ago by igloo

Milestone: 7.6.17.6.2

Changed 4 years ago by morabbin

Attachment: T2273.hs added

comment:21 Changed 4 years ago by morabbin

Seems to work as desired with 7.6.1:

ghc -O -ddump-simpl -ddump-stg -c T2273.hs

yields (in the STG code):

[GblId, Arity=2, Str=DmdType S(S)L, Unf=OtherCon []] =
    \r srt:(0,*bitmap*) [env_sxC s_sxP]
        case env_sxC of wild_sxO {
          Q.TcLclEnv ds_sxF ->
              case
                  case ds_sxF of _ {
                    Q.Comp -> Q.tcExtendIdEnv3;
                    Q.Splice l_sxI -> l_sxI;
                    Q.Brack l_sxK -> l_sxK;
                  }
              of
              level_sxN
              { GHC.Types.I# ipv_syf ->
                    Q.tc_extend_local_id_env level_sxN wild_sxO s_sxP;
              };
        };

comment:22 Changed 4 years ago by igloo

The ticket was reopened for a different example: http://hackage.haskell.org/trac/ghc/ticket/2273#comment:8

comment:23 Changed 3 years ago by thoughtpolice

Milestone: 7.6.27.10.1

Moving to 7.10.1.

comment:24 Changed 3 years ago by thomie

commit 970816ac0028f2f42ac4140d29e2f0dfe0e9af3e

Author: Simon Marlow <>
Date:   Wed Sep 16 14:04:54 2009 +0000

    Use let !y = x in .. x .. instead of seq in $! and evaluate (#2273)

Last edited 3 years ago by thomie (previous) (diff)

comment:25 Changed 3 years ago by thomie

Type of failure: None/UnknownRuntime performance bug

SPJ: what is the status of this ticket? You reopened it in comment:8 and comment:10, but this ticket is not on Status/SLPJ-Tickets.

The link in comment:8 is unfortunately broken.

The following tickets all link here: #2457, #3263, #5129, #9127, #9353.

comment:26 Changed 3 years ago by carter

I've seen some remarks in the ghc source tree about having the desugaring use seq#, but I didnt see any usage of it in ghc proper as far as I could tell.

comment:27 Changed 3 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:28 Changed 3 years ago by thoughtpolice

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:29 Changed 22 months ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:30 Changed 17 months ago by thomie

Milestone: 8.0.1
Note: See TracTickets for help on using tickets.