Opened 5 months ago

Closed 3 months ago

#15445 closed bug (fixed)

SPECIALIZE one of two identical functions does not fire well

Reported by: nobrakal Owned by:
Priority: normal Milestone: 8.6.1
Component: Compiler Version: 8.4.3
Keywords: Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: None/Unknown Test Case: simplCore/should_compile/T15445
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Hi,

I am playing with SPECIALIZE pragma:

module Todo where

{-# SPECIALIZE plusTwoRec :: [Int] -> [Int] #-}
plusTwoRec :: Num a => [a] -> [a]
plusTwoRec [] = []
plusTwoRec (x:xs) = x+2:plusTwoRec xs

plusTwoRec' :: Num a => [a] -> [a]
plusTwoRec' [] = []
plusTwoRec' (x:xs) = x+2:plusTwoRec' xs

And wanted to benchmark it with (in Main.hs):

import Todo
import Criterion.Main

aListOfInt :: [Int]
aListOfInt = [1..10000]

main :: IO ()
main = defaultMain
  [ bench "plusTwoRec" $ nf plusTwoRec aListOfInt
  , bench "plusTwoRec'" $  nf plusTwoRec' aListOfInt
  ]

Sadly, the rule of specialization of plusTwoRec does not fire in Main.hs (I compiled with: ghc Main.hs -O -dynamic -ddump-rule-firings (the -dynamic part is due to my ArchLinux installaltion)).

The result is:

[1 of 2] Compiling Todo             ( Todo.hs, Todo.o )
Rule fired: Class op + (BUILTIN)
Rule fired: Class op fromInteger (BUILTIN)
Rule fired: integerToInt (BUILTIN)
Rule fired: SPEC plusTwoRec (Todo)
[2 of 2] Compiling Main             ( Main.hs, Main.o )
Rule fired: Class op enumFromTo (BUILTIN)
Rule fired: unpack (GHC.Base)
Rule fired: unpack (GHC.Base)
Rule fired: eftIntList (GHC.Enum)
Rule fired: unpack-list (GHC.Base)
Rule fired: unpack-list (GHC.Base)
Linking Main ...

I have inspected a bit the code produced after the simplifications passes (with -ddump-simpl) and here is the suspicious part:

plusTwoRec :: forall a. Num a => [a] -> [a]
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=<L,U(C(C1(U)),A,A,A,A,A,C(U))><S,1*U>,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=True)}]
plusTwoRec = plusTwoRec'

I believe that plusTwoRec is inlined before the specialization has a chance to fire, but I am not sure at all !

Separating the two functions definitions in two different files works.

So I don't know if this is a GHC bug, myself that does not read the right part of the GHC manual, if it is only a lack of documentation, or anything else.

Change History (8)

comment:1 Changed 5 months ago by nobrakal

Summary: SPCIALIZE one of two identical functions does not fire wellSPECIALIZE one of two identical functions does not fire well

comment:2 Changed 5 months ago by simonpj

Aha. The problem is that we get

RULES: "SPEC plusTwoRec" forall ($dNum :: Num Int).
                         plusTwoRec @ Int $dNum = plusTwoRec_$splusTwoRec

but, as you say, Common Subexpression Elimination has decided to replace plusTwoRec's RHS with just plusTwoRec'. This is basically a good thing to do (saves code duplication), but if plusTwoRec is inlined before the rule has a chance to fire, we'll miss the specialisation.

I thought of disabling CSE for functions with RULES; but that seems wrong. Generally, we add a NOINLINE pragma to a function with RULES to ensure that the function does not inline before the rule has a chance to fire. I think we should do the same thing with these auto-generated RULES from specialisations.

Patch coming.

comment:3 Changed 5 months ago by Simon Peyton Jones <simonpj@…>

In 2110738b/ghc:

Don't inline functions with RULES too early

Trac #15445 showed that a function with an automatically
generated specialisation RULE coudl be inlined before the
RULE had a chance to fire.

This patch attaches a NOINLINE[2] activation to the Id, to
stop this happening.

comment:4 Changed 5 months ago by simonpj

Resolution: fixed
Status: newclosed
Test Case: simplCore/should_compile/T15445

Fixed! Thank you for the bug report.

comment:5 Changed 3 months ago by simonpj

Resolution: fixed
Status: closednew

comment:6 Changed 3 months ago by simonpj

Ben had to revert this patch

Revert "Don't inline functions with RULES too early"
author	Ben Gamari <ben@smart-cactus.org>	
	Wed, 1 Aug 2018 11:42:19 +0100 (06:42 -0400)
committer	Ben Gamari <ben@smart-cactus.org>	
	Wed, 1 Aug 2018 11:54:23 +0100 (06:54 -0400)
This commit causes significant performance regressions:
```
bytes allocated value is too high:
    Expected    T9872d(normal) bytes allocated: 578498120 +/-5%
    Lower bound T9872d(normal) bytes allocated: 549573214
    Upper bound T9872d(normal) bytes allocated: 607423026
    Actual      T9872d(normal) bytes allocated: 677179968
    Deviation   T9872d(normal) bytes allocated:      17.1 %
bytes allocated value is too high:
    Expected    T9872c(normal) bytes allocated: 3096670112 +/-5%
    Lower bound T9872c(normal) bytes allocated: 2941836606
    Upper bound T9872c(normal) bytes allocated: 3251503618
    Actual      T9872c(normal) bytes allocated: 3601872536
    Deviation   T9872c(normal) bytes allocated:       16.3 %
bytes allocated value is too high:
    Expected    T9872b(normal) bytes allocated: 3730686224 +/-5%
    Lower bound T9872b(normal) bytes allocated: 3544151912
    Upper bound T9872b(normal) bytes allocated: 3917220536
    Actual      T9872b(normal) bytes allocated: 4374298272
    Deviation   T9872b(normal) bytes allocated:       17.3 %
bytes allocated value is too high:
    Expected    T9872a(normal) bytes allocated: 2729927408 +/-5%
    Lower bound T9872a(normal) bytes allocated: 2593431037
    Upper bound T9872a(normal) bytes allocated: 2866423779
    Actual      T9872a(normal) bytes allocated: 3225788896
    Deviation   T9872a(normal) bytes allocated:       18.2 %
```
It's not clear that this was intentional so I'm going to revert for now.

This reverts commit 2110738b280543698407924a16ac92b6d804dc36.

comment:7 Changed 3 months ago by Simon Peyton Jones <simonpj@…>

In 3addf72/ghc:

Preserve specialisations despite CSE

Trac #15445 showed that, as a result of CSE, a function with an
automatically generated specialisation RULE could be inlined
before the RULE had a chance to fire.

This patch attaches a NOINLINE[2] activation to the Id, during
CSE, to stop this happening.

See Note [Delay inlining after CSE]

---- Historical note ---

This patch is simpler and more direct than an earlier
version:

  commit 2110738b280543698407924a16ac92b6d804dc36
  Author: Simon Peyton Jones <simonpj@microsoft.com>
  Date:   Mon Jul 30 13:43:56 2018 +0100

  Don't inline functions with RULES too early

We had to revert this patch because it made GHC itself slower.

Why? It delayed inlining of /all/ functions with RULES, and that was
very bad in TcFlatten.flatten_ty_con_app

* It delayed inlining of liftM
* That delayed the unravelling of the recursion in some dictionary
  bindings.
* That delayed some eta expansion, leaving
     flatten_ty_con_app = \x y. let <stuff> in \z. blah
* That allowed the float-out pass to put sguff between
  the \y and \z.
* And that permanently stopped eta expasion of the function,
  even once <stuff> was simplified.

-- End of historical note ---

comment:8 Changed 3 months ago by simonpj

Resolution: fixed
Status: newclosed

I managed to re-instate this.

Note: See TracTickets for help on using tickets.