Opened 2 years ago

Last modified 4 months ago

#5954 new bug

Performance regression 7.0 -> 7.2 (still in 7.4)

Reported by: simonmar
Owned by: simonpj
Priority: high
Milestone: 7.6.2
Component: Compiler
Version: 7.4.1
Keywords:
Cc: buecking@…, conrad@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

The program in nofib/parallel/blackscholes regressed quite badly in performance between 7.0.x and 7.2.1. This is purely sequential performance; no parallelism is involved.

With 7.0:

   3,084,786,008 bytes allocated in the heap
       5,150,592 bytes copied during GC
      33,741,048 bytes maximum residency (7 sample(s))
       1,541,904 bytes maximum slop
              64 MB total memory in use (2 MB lost due to fragmentation)

  Generation 0:  5760 collections,     0 parallel,  0.08s,  0.08s elapsed
  Generation 1:     7 collections,     0 parallel,  0.01s,  0.01s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   17.43s  ( 17.47s elapsed)
  GC    time    0.09s  (  0.09s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   17.53s  ( 17.56s elapsed)

With 7.2.2:

   3,062,127,752 bytes allocated in the heap
       4,714,784 bytes copied during GC
      34,370,232 bytes maximum residency (7 sample(s))
       1,553,968 bytes maximum slop
              64 MB total memory in use (2 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5781 colls,     0 par    0.08s    0.08s     0.0000s    0.0006s
  Gen  1         7 colls,     0 par    0.01s    0.01s     0.0014s    0.0017s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   23.93s  ( 23.93s elapsed)
  GC      time    0.09s  (  0.09s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   24.02s  ( 24.03s elapsed)

and with 7.4.1:

   3,061,924,144 bytes allocated in the heap
       4,733,760 bytes copied during GC
      34,210,896 bytes maximum residency (7 sample(s))
       1,552,640 bytes maximum slop
              64 MB total memory in use (2 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5781 colls,     0 par    0.08s    0.08s     0.0000s    0.0007s
  Gen  1         7 colls,     0 par    0.01s    0.01s     0.0015s    0.0017s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   23.90s  ( 23.91s elapsed)
  GC      time    0.09s  (  0.09s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   24.00s  ( 24.00s elapsed)

Change History (11)

comment:1 Changed 2 years ago by simonpj

Strange. No GC, no change in allocation. So where is the time going?
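
To put numbers on that observation, here is a small sketch that computes the relative change between the 7.0 and 7.2.2 figures quoted in the description. The record type and helper names (RtsStats, ghc70, ghc722, ratio, summary) are illustrative assumptions and not part of any GHC tooling; only the numeric values come from this ticket.

```haskell
-- Illustrative only: compares the RTS figures quoted in this ticket.
-- RtsStats and the helper names are hypothetical, not a GHC API.
data RtsStats = RtsStats
  { allocBytes :: Double  -- "bytes allocated in the heap"
  , mutSeconds :: Double  -- "MUT time"
  , gcSeconds  :: Double  -- "GC time"
  }

-- Values copied from the "With 7.0" and "With 7.2.2" outputs above.
ghc70, ghc722 :: RtsStats
ghc70  = RtsStats 3084786008 17.43 0.09
ghc722 = RtsStats 3062127752 23.93 0.09

-- Ratio of one field between the newer and the older run.
ratio :: (RtsStats -> Double) -> Double
ratio field = field ghc722 / field ghc70

-- Human-readable summary of the three ratios.
summary :: String
summary = unlines
  [ "allocation ratio: " ++ show (ratio allocBytes)
  , "mutator ratio:    " ++ show (ratio mutSeconds)
  , "GC ratio:         " ++ show (ratio gcSeconds)
  ]
```

Evaluating `putStr summary` in GHCi shows allocation within 1% of the 7.0 figure and mutator time about 37% higher, so the extra time is spent in the mutator rather than in allocation or GC.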

comment:2 Changed 2 years ago by simonmar

It looks like the simplifier has duplicated some primops. The program is making more calls to exp() at runtime than it was with 7.0. I've spent some time peering at the Core but it's quite complicated, and I haven't been able to narrow down the exact bit of problematic code yet. I suggest we look at it together when you have a chance.

(strong possibility that this is the same as #5623)

comment:3 Changed 2 years ago by simonmar

  • Owner set to simonpj

Assigning to simonpj as this is probably linked to #5623, which Simon is working on.

comment:4 follow-up: ↓ 5 Changed 2 years ago by simonmar

Further on this: we discovered that there are two things going on, neither of which is #5623.

  • Some fusion isn't happening on expressions of the form map f [x,y..z], because the list is being floated out before the RULE can fire. Simon is looking into declaring enumFromTo and friends as CONLIKE, which might fix this problem. (This regression first appeared in 7.0; the fusion did happen in 6.12.3.)
  • There is an opportunity for CSE which 7.0 spots but later versions don't. It is probably a bug in blackscholes.hs itself:
   nofXd1 = cndf xD1 
   nofXd2 = cndf xD1    

Regardless, we don't know why CSE is not catching this any more, and we plan to look into it.
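
The two effects described in this comment can be sketched as follows. This is a minimal illustration, not the code from blackscholes.hs: the body of cndf is a hypothetical stand-in, the element type is Int for simplicity, and sumMapLoop assumes a positive step. It shows what a fused map f [x,y..z] would compute without an intermediate list, and what sharing the repeated cndf xD1 call by hand looks like.

```haskell
-- Sketch only. Names other than cndf, xD1, nofXd1 and nofXd2 are
-- hypothetical and do not appear in the ticket.

-- [x, y .. z] is sugar for enumFromThenTo x y z, so this allocates a
-- list unless a fusion RULE eliminates it.
sumMapList :: (Int -> Int) -> Int -> Int -> Int -> Int
sumMapList f x y z = sum (map f [x, y .. z])

-- Hand-written equivalent with no intermediate list.
-- Assumption: y > x, i.e. the step is positive.
sumMapLoop :: (Int -> Int) -> Int -> Int -> Int -> Int
sumMapLoop f x y z = go x 0
  where
    step = y - x
    go i acc
      | i > z     = acc
      | otherwise = let acc' = acc + f i
                    in acc' `seq` go (i + step) acc'

-- Stand-in for the benchmark's cndf (NOT the nofib definition).
cndf :: Double -> Double
cndf x = 1 / (1 + exp (negate x))

-- The shape quoted in the ticket: cndf xD1 appears twice, so it is
-- computed twice unless CSE merges the two bindings.
duplicated :: Double -> (Double, Double)
duplicated xD1 = (nofXd1, nofXd2)
  where
    nofXd1 = cndf xD1
    nofXd2 = cndf xD1

-- The same result with the common subexpression shared by hand.
shared :: Double -> (Double, Double)
shared xD1 = (n, n)
  where
    n = cndf xD1
```

One way to check what the compiler actually did with definitions like these is to compile with -O and -ddump-simpl and compare the Core for duplicated against shared, and for sumMapList against sumMapLoop.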

comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 2 years ago by michalt

Replying to simonmar:

There is an opportunity for CSE which 7.0 spots but later versions don't. It is probably a bug in blackscholes.hs itself:

   nofXd1 = cndf xD1 
   nofXd2 = cndf xD1    

Regardless, we don't know why CSE is not catching this any more, and we plan to look into it.

I don't have much time right now to look into details, but can this be related to #5996?

comment:6 in reply to: ↑ 5 Changed 2 years ago by simonmar

Replying to michalt:

I don't have much time right now to look into details, but can this be related to #5996?

It seems plausible, but I don't know for sure.

comment:7 Changed 23 months ago by igloo

  • Milestone changed from 7.4.2 to 7.4.3

comment:8 Changed 20 months ago by buecking

  • Cc buecking@… added

comment:9 Changed 20 months ago by igloo

  • Milestone changed from 7.4.3 to 7.6.2

comment:10 Changed 19 months ago by conrad

  • Cc conrad@… added

comment:11 Changed 4 months ago by carter

Is this bug still present in HEAD?
