Opened 3 years ago

Closed 3 years ago

Last modified 14 months ago

#5113 closed bug (fixed)

Huge performance regression of 7.0.2, 7.0.3 and HEAD over 7.0.1 and 6.12 (MonoLocalBinds)

Reported by: daniel.is.fischer Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.0.3
Keywords: performance, MonoLocalBinds Cc:
Operating System: Linux Architecture: x86
Type of failure: Runtime performance bug Difficulty: Unknown
Test Case: perf/should_run/T5113 Blocked By:
Blocking: Related Tickets:

Description

Exploring ways to make show for Double and Float faster, I encountered a terrible performance regression. Time increased about 20-fold, allocation about 75-fold.

With 6.12.3 and 7.0.1, the new code is nearly twice as fast as the old and allocates less than a third. With 7.0.2, 7.0.3 and the HEAD (7.1.20110329), the new code is over ten times slower than the old and allocates more than twenty times as much.

In 7.0.3's core at least some local bindings appear as polymorphic functions which are inlined by 7.0.1. And indeed, compiling with -XMonoLocalBinds rectifies matters. Nevertheless, that shouldn't happen.

Attachments (2)

Test.hs (779 bytes) - added by daniel.is.fischer 3 years ago.
small example of related behaviour of 7.0.3
display.tar.gz (4.6 KB) - added by daniel.is.fischer 3 years ago.
bundle exhibiting example of undesirable behaviour

Download all attachments as: .zip

Change History (8)

comment:1 Changed 3 years ago by daniel.is.fischer

Namely, the culprit is

inc i = unsafeRead darr i >>= \k -> unsafeWrite darr i (k+1)

darr is an STUArray s Int Int, yet without MonoLocalBinds? or a type signature, 7.0.3 produces

let {
   $winc_s21z
     :: forall (m_a1mK :: * -> *).
        GHC.Base.Monad m_a1mK =>
        (forall i_a1my.
         GHC.Arr.Ix i_a1my =>
         Data.Array.Base.STUArray
           s_a1ln i_a1my GHC.Types.Int
         -> GHC.Types.Int
         -> m_a1mK GHC.Types.Int)
        -> (forall i_a1mk.
            GHC.Arr.Ix i_a1mk =>
            Data.Array.Base.STUArray
              s_a1ln i_a1mk GHC.Types.Int
            -> GHC.Types.Int
            -> GHC.Types.Int
            -> m_a1mK ())
        -> GHC.Types.Int
        -> m_a1mK ()
   [LclId, Arity=4, Str=DmdType SLLL]
   $winc_s21z =
     \ (@ m_a1mK::* -> *)
       (ww3_s1YM :: GHC.Base.Monad m_a1mK)
       (ww4_s1YS
          :: forall i_a1my.
             GHC.Arr.Ix i_a1my =>
             Data.Array.Base.STUArray
               s_a1ln i_a1my GHC.Types.Int
             -> GHC.Types.Int
             -> m_a1mK GHC.Types.Int)
       (ww5_s1YT
          :: forall i_a1mk.
             GHC.Arr.Ix i_a1mk =>
             Data.Array.Base.STUArray
               s_a1ln i_a1mk GHC.Types.Int
             -> GHC.Types.Int
             -> GHC.Types.Int
             -> m_a1mK ())
       (w1_s1YV :: GHC.Types.Int) ->
       GHC.Base.>>=
         @ m_a1mK
         ww3_s1YM
         @ GHC.Types.Int
         @ ()
         (ww4_s1YS
            @ GHC.Types.Int
            GHC.Arr.$fIxInt
            marr_s21x
            w1_s1YV)
         (\ (k_a169 :: GHC.Types.Int) ->
            ww5_s1YT
              @ GHC.Types.Int
              GHC.Arr.$fIxInt
              marr_s21x
              w1_s1YV
              (case k_a169 of _ { GHC.Types.I# x1_X1G6 ->
               GHC.Types.I# (GHC.Prim.+# x1_X1G6 1)
               })) }

Give that a type signature inc :: Int -> ST s () (requires moving stuff outside the runST) and you're good again.

comment:2 Changed 3 years ago by simonpj

I'm lost. The inc you give above is indeed polymorphic in both the monad and the index type, and (unless specialised or inlined) will probably perform much worse than the polymorphic vresion.

Can you give a concrete module that, when compiled, gives much worse code with HEAD than 6.12?

Thanks

Simon

comment:3 Changed 3 years ago by daniel.is.fischer

Yes, it is polymorphic, but it's a local binding inside a runST, only ever used at the one specific type coming from the enclosing ST-action. Up to and including 7.0.1, it was automatically inlined, resulting in

case GHC.Prim.readIntArray#
       @ s_aKv marr#_aLG sc1_sSa sc_sS9
of _ { (# s2#1_aPz, e#_aPA #) ->
case GHC.Prim.writeIntArray#
       @ s_aKv
       marr#_aLG
       sc1_sSa
       (GHC.Prim.+# e#_aPA 1)
       s2#1_aPz
of s2#2_aQ2 { __DEFAULT ->

at the appropriate places. 7.0.2, 7.0.3 and HEAD (29.3. - I've not yet built a newer HEAD to see whether it changed) produce the polymorphic worker that's then called.

As for specific modules, I've not yet managed to produce a small example exhibiting exactly that, but for the small Test.hs module, 7.0.3 produces bad (polymorphic) core while HEAD (and 7.0.1) do fine. However, for the display bundle (the interesting module is DispFloat), HEAD also produces the polymorphic $winc.

Changed 3 years ago by daniel.is.fischer

small example of related behaviour of 7.0.3

Changed 3 years ago by daniel.is.fischer

bundle exhibiting example of undesirable behaviour

comment:4 Changed 3 years ago by simonpj

  • Resolution set to fixed
  • Status changed from new to closed
  • Test Case set to perf/should_run/T5113

The HEAD seems fine with your Test file, which is good. I've turned it into a performance test so we'll see if it ever goes bad on us.

Simon

comment:5 Changed 14 months ago by simonpj@…

commit 33683ba9367e03b6b051f1c0c694fd8bf244a759

Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Mon Feb 11 08:38:12 2013 +0000

    Extra comment about the fix to Trac #5113

 compiler/specialise/Specialise.lhs |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

comment:6 Changed 14 months ago by simonpj

  • Difficulty set to Unknown

This patch is the crucial one that fixes the original problem

commit b5c18c91da911a7729563207c7b95f7e452cca7e
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Fri Feb 8 17:29:40 2013 +0000

    Fix an old and egregious specialisation bug (Trac #5113)
    
    The specialiser needs to know if a dictionay has some structure,
    so that it can decide whether to specialise a function. Eg
     (A)    let d = $dfblah d1
            in ....(f d)....
    
     (B)    \d. ....(f d)....
    
    In (A) it's probably worth specialising f; in (B) it isn't.
    Previously we were relying on d's unfolding, but the specialiser
    does cloning as it goes, which discards the unfolding. So we
    were simply discarding all specialisations for functions with
    local dictionary bindings!  This bug seems to have been there
    for a long time.
    
    This is what originally caused Trac #5113.  Then we went through a phase
    where local bindings were not generalised, and that meant there was
    no locally overloaded f to specialise; so the performance problem appeared
    to be fixed.  But now we are generalising local bindings again, so it
    re-appeared.
    
    This patch fixes the original problem.

 compiler/specialise/Specialise.lhs |  390 ++++++++++++++++++++----------------
 1 files changed, 214 insertions(+), 176 deletions(-)
Note: See TracTickets for help on using tickets.