Opened 6 years ago

Closed 6 years ago

Last modified 5 years ago

#2884 closed bug (fixed)

Compiled code performance worsens when module names are long enough

Reported by: jcpetruzza Owned by: igloo
Priority: normal Milestone: 6.12 branch
Component: Compiler Version: 6.10.1
Keywords: Cc: bos@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

Attached to this report is an example where by simply renaming a module, performance degrades 2.5 times.

#diff long-modname-ver.hs short-modname-ver.hs
2c2
< import VeryLongModuleName
---
> import ShortM

#diff VeryLongModuleName.hs ShortM.hs
1c1
< module VeryLongModuleName
---
> module ShortM

#ghc --make -O2 -Wall long-modname-ver.hs

#ghc --make -O2 -Wall short-modname-ver.hs

#time -p ./long-modname-ver > /dev/null
real 55.90
user 55.17
sys 0.51

#time -p ./short-modname-ver > /dev/null
real 22.23
user 21.97
sys 0.10

According to some measures by dons, the threshold seems to be at module length 10 (see attached graph).

Some further disussion on this thread http://thread.gmane.org/gmane.comp.lang.haskell.glasgow.user/16037.

Attachments (6)

long-modname-ver.hs (17.1 KB) - added by jcpetruzza 6 years ago.
VeryLongModuleName.hs (2.1 KB) - added by jcpetruzza 6 years ago.
short-modname-ver.hs (17.0 KB) - added by jcpetruzza 6 years ago.
ShortM.hs (2.0 KB) - added by jcpetruzza 6 years ago.
results.png (13.1 KB) - added by jcpetruzza 6 years ago.
Running time as a function of module name length
asm.diff (65.0 KB) - added by jcpetruzza 6 years ago.
diff of the assembly generated for both programs

Download all attachments as: .zip

Change History (12)

Changed 6 years ago by jcpetruzza

Changed 6 years ago by jcpetruzza

Changed 6 years ago by jcpetruzza

Changed 6 years ago by jcpetruzza

Changed 6 years ago by jcpetruzza

Running time as a function of module name length

Changed 6 years ago by jcpetruzza

diff of the assembly generated for both programs

comment:1 Changed 6 years ago by simonmar

  • difficulty set to Unknown
  • Milestone set to 6.12 branch
  • Type changed from bug to run-time performance bug

I think this is another example that we can trace back to optimisation problems with record selectors. If we look at the Core for nodeFor:

VeryLongModuleName.nodeFor :: GHC.Types.Int
                              -> VeryLongModuleName.T
                              -> VeryLongModuleName.T
[GlobalId]
[Arity 2
 Str: DmdType LS]
VeryLongModuleName.nodeFor =
  \ (ds_dhX :: GHC.Types.Int) (ds1_dhY :: VeryLongModuleName.T) ->
    case ds1_dhY of wild_B1 {
      VeryLongModuleName.Nil -> VeryLongModuleName.Nil;
      VeryLongModuleName.Node ipv_ski ipv1_skj ipv2_skk ipv3_skl ->
        case ds_dhX of wild1_alg { GHC.Types.I# x#_ali ->
        case VeryLongModuleName.val wild_B1
        of wild11_alk { GHC.Types.I# y#_alm ->
        case GHC.Prim.<# x#_ali y#_alm of wild2_alq {
          GHC.Bool.False ->
            case GHC.Prim.==# x#_ali y#_alm of wild12_alt {
              GHC.Bool.False -> VeryLongModuleName.right wild_B1;
              GHC.Bool.True -> wild_B1
            };
          GHC.Bool.True -> VeryLongModuleName.left wild_B1
        }
        }
        }
    }

The record selector VeryLongModuleName.val has not been inlined, whereas in the ShortM version it has:

ShortM.nodeFor :: GHC.Types.Int -> ShortM.T -> ShortM.T
[GlobalId]
[Arity 2
 NoCafRefs
 Str: DmdType LS]
ShortM.nodeFor =
  \ (ds_di1 :: GHC.Types.Int) (ds1_di2 :: ShortM.T) ->
    case ds1_di2 of wild_B1 {
      ShortM.Nil -> ShortM.Nil;
      ShortM.Node ipv_skm ipv1_skn ipv2_sko ipv3_skp ->
        case ds_di1 of wild1_alk { GHC.Types.I# x#_alm ->
        case GHC.Prim.<# x#_alm ipv_skm of wild2_alu {
          GHC.Bool.False ->
            case GHC.Prim.==# x#_alm ipv_skm of wild11_alx {
              GHC.Bool.False -> ipv2_sko; GHC.Bool.True -> wild_B1
            };
          GHC.Bool.True -> ipv1_skn
        }
        }
    }

This is almost certainly because the length of the error string pushes the definition of val just over the size limit for inlining (in this case the error string will disappear after inlining, but the inliner isn't clever enough to know this).

What's more, this wouldn't be a problem if the inliner was using the optimised definition of the record selector, in which the error string had been pulled out to the top level.

We've encountered a few cases like this recently - I sent one to Simon a few weeks ago (I don't think I ticketed it, though).

comment:2 Changed 6 years ago by simonmar

Oh, and I should say that the effect is so extreme in this case because the same problem applies to all instances of all 4 record selectors for the T type. The program will be doing a lot of repeated evaluations at runtime. You could probably work around it by defining your own record selectors instead of using the automatically generated ones.

comment:3 Changed 6 years ago by bos

  • Cc bos@… added

comment:4 Changed 6 years ago by simonpj

  • Owner set to igloo

OK I claim I've fixed this: see #2670. Give it a whirl.

Reassigning to Ian, just for independent verification; then close.

Simon

comment:5 Changed 6 years ago by igloo

  • Resolution set to fixed
  • Status changed from new to closed

Looks good:

./long-modname-ver > /dev/null  25.63s user 0.02s system 99% cpu 25.664 total
./short-modname-ver > /dev/null  25.49s user 0.01s system 100% cpu 25.488 total
./long-modname-ver > /dev/null  25.33s user 0.02s system 100% cpu 25.349 total
./short-modname-ver > /dev/null  25.29s user 0.02s system 99% cpu 25.323 total

comment:6 Changed 5 years ago by simonmar

  • Type of failure set to Runtime performance bug
Note: See TracTickets for help on using tickets.