Opened 2 years ago

Last modified 3 months ago

#5928 new bug

INLINABLE fails to specialize in presence of simple wrapper

Reported by: tibbe
Owned by:
Priority: normal
Milestone: 7.6.2
Component: Compiler
Version: 7.4.1
Keywords:
Cc: pho@…, fox@…, hackage.haskell.org@…, tkn.akio@…, ecrockett0@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

If a function marked INLINABLE is called indirectly through a simple wrapper defined in a different module, specialization never happens (i.e. none of the dictionaries are removed).

Here's an example where it fails. First, the simple wrapper module:

module Repro where

import Data.Hashable
import Data.HashMap.Strict as M

infixl 9  !
(!) :: (Eq a, Hashable a) => M.HashMap a b -> a -> b
m ! x = case M.lookup x m of  -- lookup is INLINABLE
    Just y -> y
    Nothing -> error "Repro.!"

and then the call site:

module Test (test) where

import Data.HashMap.Strict as M

import Repro

test :: M.HashMap Int Int -> Int
test m = m ! 42

To compile the code you need to cabal install unordered-containers. The relevant function (which is not getting specialized) from unordered-containers is:

lookup :: (Eq k, Hashable k) => k -> HashMap k v -> Maybe v
lookup k0 = go h0 k0 0
  where
    h0 = hash k0
    go !_ !_ !_ Empty = Nothing
    go h k _ (Leaf hx (L kx x))
        | h == hx && k == kx = Just x
        | otherwise          = Nothing
    go h k s (BitmapIndexed b v)
        | b .&. m == 0 = Nothing
        | otherwise    = go h k (s+bitsPerSubkey) (A.index v (sparseIndex b m))
      where m = mask h s
    go h k s (Full v) = go h k (s+bitsPerSubkey) (A.index v (index h s))
    go h k _ (Collision hx v)
        | h == hx   = lookupInArray k v
        | otherwise = Nothing
#if __GLASGOW_HASKELL__ >= 700
{-# INLINABLE lookup #-}
#endif

If test calls lookup directly, without using the (!) wrapper, things get specialized. Manually marking (!) as INLINABLE works, but users shouldn't have to do that.
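For reference, the workaround amounts to one extra pragma in the wrapper module; the Core dump below is for the original, unannotated version. A minimal sketch of the annotated wrapper, assuming the same imports as above:

module Repro where

import Data.Hashable
import Data.HashMap.Strict as M

infixl 9  !
(!) :: (Eq a, Hashable a) => M.HashMap a b -> a -> b
m ! x = case M.lookup x m of  -- lookup is INLINABLE
    Just y  -> y
    Nothing -> error "Repro.!"
-- The extra line: per the ticket, exposing (!)'s unfolding this way makes
-- specialisation happen at the call site.
{-# INLINABLE (!) #-}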

The core for Repro and Test is:

$ ghc -O2 Test.hs -fforce-recomp -ddump-simpl 
[1 of 2] Compiling Repro            ( Repro.hs, Repro.o )

==================== Tidy Core ====================
Result size = 28

lvl_rNZ :: [GHC.Types.Char]
[GblId]
lvl_rNZ = GHC.CString.unpackCString# "Repro.!"

Repro.!1 :: forall b_aBU. b_aBU
[GblId, Str=DmdType b]
Repro.!1 = \ (@ b_aBU) -> GHC.Err.error @ b_aBU lvl_rNZ

Repro.!
  :: forall a_atJ b_atK.
     (GHC.Classes.Eq a_atJ, Data.Hashable.Hashable a_atJ) =>
     Data.HashMap.Base.HashMap a_atJ b_atK -> a_atJ -> b_atK
[GblId,
 Arity=4,
 Str=DmdType LLLL,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=4, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0 0 0 0] 70 0}]
Repro.! =
  \ (@ a_aBT)
    (@ b_aBU)
    ($dEq_aBV :: GHC.Classes.Eq a_aBT)
    ($dHashable_aBW :: Data.Hashable.Hashable a_aBT)
    (m_atL :: Data.HashMap.Base.HashMap a_aBT b_aBU)
    (x_atM :: a_aBT) ->
    case Data.HashMap.Base.lookup
           @ a_aBT @ b_aBU $dEq_aBV $dHashable_aBW x_atM m_atL
    of _ {
      Data.Maybe.Nothing -> Repro.!1 @ b_aBU;
      Data.Maybe.Just y_atN -> y_atN
    }



[2 of 2] Compiling Test             ( Test.hs, Test.o )

==================== Tidy Core ====================
Result size = 20

Test.test2 :: GHC.Types.Int
[GblId,
 Caf=NoCafRefs,
 Str=DmdType m,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [] 10 110}]
Test.test2 = GHC.Types.I# 42

Test.test1
  :: Data.HashMap.Base.HashMap GHC.Types.Int GHC.Types.Int
     -> Data.Maybe.Maybe GHC.Types.Int
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
Test.test1 =
  Data.HashMap.Base.lookup
    @ GHC.Types.Int
    @ GHC.Types.Int
    GHC.Classes.$fEqInt
    Data.Hashable.$fHashableInt
    Test.test2

Test.test
  :: Data.HashMap.Base.HashMap GHC.Types.Int GHC.Types.Int
     -> GHC.Types.Int
[GblId,
 Arity=1,
 Str=DmdType L,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0] 40 0}]
Test.test =
  \ (m_aPx
       :: Data.HashMap.Base.HashMap GHC.Types.Int GHC.Types.Int) ->
    case Test.test1 m_aPx of _ {
      Data.Maybe.Nothing -> Repro.!1 @ GHC.Types.Int;
      Data.Maybe.Just y_atN -> y_atN
    }

Attachments (3)

Repro.hs (215 bytes) - added by tibbe 2 years ago.
Test.hs (122 bytes) - added by tibbe 2 years ago.
0001-Add-extra-specialize-pass.patch (2.3 KB) - added by tibbe 19 months ago.


Change History (25)

Changed 2 years ago by tibbe

Changed 2 years ago by tibbe

comment:1 Changed 2 years ago by tibbe

This commit to the containers library by Milan Straka seems relevant.

Relevant part of commit message:

* In GHC, INLINing the INLINABLE method as in
    update f = updateWithKey (\_ x -> f x)
    {-# INLINE update #-}
    updateWithKey = ...
    {-# INLINABLE updateWithKey #-}
  does not produce wanted result -- the updateWithKey does not get
  specialized in the call-site module.

  The solution is to mark both functions INLINABLE.
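A self-contained illustration of that advice, with the bodies replaced by calls into Data.Map (the names follow the commit message, but this is a hedged sketch, not the actual containers code):

module Lib (update, updateWithKey) where

import qualified Data.Map as M

-- The overloaded worker; INLINABLE records its unfolding in the interface
-- file so that call-site modules can specialise it.
updateWithKey :: Ord k => (k -> a -> Maybe a) -> k -> M.Map k a -> M.Map k a
updateWithKey = M.updateWithKey   -- placeholder body for the sketch
{-# INLINABLE updateWithKey #-}

-- The thin wrapper; per the commit message it is marked INLINABLE as well,
-- not just INLINE, so that specialisation still reaches updateWithKey.
update :: Ord k => (a -> Maybe a) -> k -> M.Map k a -> M.Map k a
update f = updateWithKey (\_ x -> f x)
{-# INLINABLE update #-}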

comment:2 Changed 2 years ago by PHO

  • Cc pho@… added

comment:3 follow-up: Changed 2 years ago by simonpj

  • Difficulty set to Unknown

Manually marking (!) as INLINABLE works, but users shouldn't have to do that.

Why do you say that "users should not have to do that"? Why are you happy to annotate lookup but not (!)?

Maybe your position is:

  • every overloaded function should be INLINABLE.

If that was the rule then all overloading would be specialised, at the cost of some code duplication. But it's not the rule at the moment. I guess we could have a flag to give that behaviour -- but then you'd have to remember to use it.

comment:4 in reply to: ↑ 3 Changed 2 years ago by tibbe

Replying to simonpj:

Manually marking (!) as INLINABLE works, but users shouldn't have to do that.

Why do you say that "users should not have to do that"? Why are you happy to annotate lookup but not (!)?

I should first clarify what I meant by "user." It's fine for me, the author of a high performance library, to have to understand the finer details of GHC performance tuning, as long as that performance tuning can be hidden inside the library.

Now, should a user of my library have to understand how I use INLINABLE inside my library, or risk large performance degradations? I'd hope not, because if users need the same level of understanding as those of us who write these core libraries, then everyone needs to be an expert to use the language.

I suspect my choice of name for the wrapper, (!), was a poor one, as it made it seem that the wrapper was part of the library, when it was intended to represent a function defined outside the library. Here's what really happened:

  • In the unordered-containers library I defined lookup, but not (!).
  • Lennart, who was porting some code from the containers library to the unordered-containers library, defined his own (!) function, in terms of my lookup function.
  • Lennart didn't put an INLINABLE pragma on his function, not knowing that this was needed to preserve the effect of the INLINABLE pragma I had put on my lookup function.
  • Things didn't specialize; performance was poor.

Maybe your position is:

  • every overloaded function should be INLINABLE.

If that was the rule then all overloading would be specialised, at the cost of some code duplication. But it's not the rule at the moment. I guess we could have a flag to give that behaviour -- but then you'd have to remember to use it.

I don't think the lack of such a rule is the problem here. Look at this core:

Test.test1 =
  Data.HashMap.Base.lookup
    @ GHC.Types.Int
    @ GHC.Types.Int
    GHC.Classes.$fEqInt
    Data.Hashable.$fHashableInt
    Test.test2

Why is lookup being passed two dictionaries when it's marked as INLINABLE? This ought to have triggered a specialization.

comment:5 Changed 2 years ago by simonpj

I think it's because (!) just happens, by coincidence, to be small enough to inline all by itself. But this inlining only occurs after the specialisation pass.

And it would not happen at all if (!) had been a bigger function, which is the typical case. So if (!) is big (or even, as we have seen, if it's small) you have to mark it INLINABLE. Or perhaps, as I say, you want every overloaded function to be INLINABLE, at least when compiling with -O2?

comment:6 Changed 2 years ago by gregorycollins

simonpj: would it make sense to run a specialization pass after every inline pass?

comment:7 Changed 2 years ago by tibbe

I agree with Greg: unless specialization is very expensive, we should do it more often.

comment:8 follow-up: Changed 2 years ago by simonpj

Maybe so, but it still depends on the fragile property that (!) is inlined.

comment:9 in reply to: ↑ 8 Changed 2 years ago by tibbe

Replying to simonpj:

Maybe so, but it still depends on the fragile property that (!) is inlined.

True, but I don't think that's such a big deal. The surprising thing here is that lookup doesn't get specialized even though (!) is inlined.

I think it's reasonable to expect users to understand that if a function is large it might not get inlined and thus some optimization might not be exposed. It might even be reasonable to expect some optimizations to not fire if you manually delay inlining to some later phase. What is a bit unreasonable is that optimizations don't happen when things are expected to (and do) inline as early as they possibly could.

comment:10 Changed 2 years ago by milan

  • Cc fox@… added

comment:11 Changed 23 months ago by tibbe

Could we at least have the option of running the specialization pass more often (e.g. as often as inlining)? I'd like to have repa use INLINABLE instead of INLINE where possible, to decrease the amount of code bloat it generates, but I'm worried that, because the specialization pass only runs once, it might fail to trigger in important situations and the performance of the code would be unpredictable.

comment:12 Changed 22 months ago by pcapriotti

  • Milestone set to 7.6.1

comment:13 Changed 19 months ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:14 Changed 19 months ago by tibbe

  • Status changed from new to patch

I've added an optional specialize pass after the simplifier phases and just before spec-constr. This solves the issue in this ticket. To test compilation times I compiled the unordered-containers package with the new flag off (the default) and on. I see no change in compilation times:

tibbe@redwood:ghc (always-specialize)$ time cabal install -w /Users/tibbe/src/ghc/inplace/bin/ghc-stage2 unordered-containers --reinstall
Resolving dependencies...
In order, the following will be installed:
unordered-containers-0.2.2.1 (reinstall)
Warning: Note that reinstalls are always dangerous. Continuing anyway...
Configuring unordered-containers-0.2.2.1...
Building unordered-containers-0.2.2.1...
Preprocessing library unordered-containers-0.2.2.1...
[1 of 8] Compiling Data.HashMap.UnsafeShift ( Data/HashMap/UnsafeShift.hs, dist/build/Data/HashMap/UnsafeShift.o )
[2 of 8] Compiling Data.HashMap.PopCount ( Data/HashMap/PopCount.hs, dist/build/Data/HashMap/PopCount.o )
[3 of 8] Compiling Data.HashMap.Unsafe ( Data/HashMap/Unsafe.hs, dist/build/Data/HashMap/Unsafe.o )
[4 of 8] Compiling Data.HashMap.Array ( Data/HashMap/Array.hs, dist/build/Data/HashMap/Array.o )
[5 of 8] Compiling Data.HashMap.Base ( Data/HashMap/Base.hs, dist/build/Data/HashMap/Base.o )
[6 of 8] Compiling Data.HashMap.Strict ( Data/HashMap/Strict.hs, dist/build/Data/HashMap/Strict.o )
[7 of 8] Compiling Data.HashMap.Lazy ( Data/HashMap/Lazy.hs, dist/build/Data/HashMap/Lazy.o )
[8 of 8] Compiling Data.HashSet     ( Data/HashSet.hs, dist/build/Data/HashSet.o )
In-place registering unordered-containers-0.2.2.1...
Running Haddock for unordered-containers-0.2.2.1...
cabal: Haddock's internal GHC version must match the configured GHC version.
The GHC version is 7.7.20120912 but haddock is using GHC version 7.4.1
Installing library in
/Users/tibbe/.cabal/lib/unordered-containers-0.2.2.1/ghc-7.7.20120912
Registering unordered-containers-0.2.2.1...
Installed unordered-containers-0.2.2.1

real	0m15.150s
user	0m14.602s
sys	0m0.497s
tibbe@redwood:ghc (always-specialize)$ time cabal install -w /Users/tibbe/src/ghc/inplace/bin/ghc-stage2 unordered-containers --ghc-option=-fspecialise-after --reinstall
Resolving dependencies...
In order, the following will be installed:
unordered-containers-0.2.2.1 (reinstall)
Warning: Note that reinstalls are always dangerous. Continuing anyway...
Configuring unordered-containers-0.2.2.1...
Building unordered-containers-0.2.2.1...
Preprocessing library unordered-containers-0.2.2.1...
[1 of 8] Compiling Data.HashMap.UnsafeShift ( Data/HashMap/UnsafeShift.hs, dist/build/Data/HashMap/UnsafeShift.o )
[2 of 8] Compiling Data.HashMap.PopCount ( Data/HashMap/PopCount.hs, dist/build/Data/HashMap/PopCount.o )
[3 of 8] Compiling Data.HashMap.Unsafe ( Data/HashMap/Unsafe.hs, dist/build/Data/HashMap/Unsafe.o )
[4 of 8] Compiling Data.HashMap.Array ( Data/HashMap/Array.hs, dist/build/Data/HashMap/Array.o )
[5 of 8] Compiling Data.HashMap.Base ( Data/HashMap/Base.hs, dist/build/Data/HashMap/Base.o )
[6 of 8] Compiling Data.HashMap.Strict ( Data/HashMap/Strict.hs, dist/build/Data/HashMap/Strict.o )
[7 of 8] Compiling Data.HashMap.Lazy ( Data/HashMap/Lazy.hs, dist/build/Data/HashMap/Lazy.o )
[8 of 8] Compiling Data.HashSet     ( Data/HashSet.hs, dist/build/Data/HashSet.o )
In-place registering unordered-containers-0.2.2.1...
Running Haddock for unordered-containers-0.2.2.1...
cabal: Haddock's internal GHC version must match the configured GHC version.
The GHC version is 7.7.20120912 but haddock is using GHC version 7.4.1
Installing library in
/Users/tibbe/.cabal/lib/unordered-containers-0.2.2.1/ghc-7.7.20120912
Registering unordered-containers-0.2.2.1...
Installed unordered-containers-0.2.2.1

real	0m15.164s
user	0m14.635s
sys	0m0.495s

No measurable change in this case.

Changed 19 months ago by tibbe

comment:15 Changed 19 months ago by tibbe

After discussing with simonpj, I tried to instead move the existing specialise pass to after the simplifier pass. This hurt performance a bit (runtime +2.6%) and resulted in way fewer specialisations (down to 136 from 230 for the whole nofib suite). Adding a second specialise pass (for a total of two) just after the main simplifier core pass increased the number of specialisations to 242, but showed no improvement in performance on nofib.

The next step is probably to check why so many fewer specialisations were created when the single pass was moved later in the pipeline.

comment:16 Changed 17 months ago by igloo

  • Status changed from patch to new

It sounds like more investigation is needed on this.

comment:17 Changed 15 months ago by simonpj

  • Owner set to tibbe

Thanks for running with this, Johan.

comment:18 Changed 15 months ago by tibbe

  • Owner tibbe deleted

I did some work here a while ago. From what I remember, these were the results:

I tried to move the specialise pass later, after the main simplifier phases. That had a negative effect on the benchmarks I ran and also resulted in fewer specialisations, as measured by some instrumentation I added. I also tried to add a new specialise pass, again after the main simplifier phases, while leaving the original one (which runs after simpl_gently) untouched. This fixed the specific case I had, but from what I remember it didn't get some other cases. I remember thinking that that particular approach worked well enough.

I think this needs more investigation, such as trying to specialise as part of the simplifier, but I'm not working on it at the moment. I'll unassign the ticket for now to reflect that, and reassign it to myself if I ever pick this up again.

comment:19 Changed 7 months ago by simonpj

On an email thread Johan points out that calls to f do not lead to specialised code for g, even though f has an INLINE pragma:

f = g
{-# INLINE f #-}

g :: Hashable a => ...
g = ...
{-# INLINABLE g #-}

This was a surprise to me. And yes, it's because specialisation currently occurs before any inlining.
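A self-contained version of that sketch (module name, types and the body of g are hypothetical, chosen only so the fragment compiles):

module Wrap (f, g) where

import Data.Hashable (Hashable, hash)

-- Thin wrapper with only an INLINE pragma.
f :: Hashable a => a -> Int
f = g
{-# INLINE f #-}

-- Overloaded worker marked INLINABLE; the surprise reported above is that a
-- call to f from another module does not yield a specialised copy of g.
g :: Hashable a => a -> Int
g x = hash x + 1
{-# INLINABLE g #-}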

Currently the first "gentle" pass of simplification does no inlining. But I think that is for out-of-date reasons. If it honoured INLINE pragmas, then the above code would work just fine. But I got diverted, and the code is lying around on my disk waiting for attention. Reviving this ticket is a good incentive to get back to it.

Simon

comment:20 Changed 7 months ago by liyang

  • Cc hackage.haskell.org@… added

comment:21 Changed 5 months ago by akio

  • Cc tkn.akio@… added

comment:22 Changed 3 months ago by crockeea

  • Cc ecrockett0@… added