Opened 3 months ago

Last modified 2 months ago

#8668 new bug

SPECIALIZE silently fails to apply

Reported by: crockeea
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 7.6.2
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: x86_64 (amd64)
Type of failure: None/Unknown
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

I have a small example where GHC refuses to specialize a call to (+), compiling with -O2.
The two files are Foo.hs (http://lpaste.net/98464) and Main.hs (http://lpaste.net/98342).

There seem to be two problems:

  1. The active SPECIALIZE pragma should be applied, but isn't. This can be seen by comparing the core and runtimes of fcTest (slow) vs vtTest (fast). I need this version of the pragma in my real code as the phantom type m is reified, so I need to specialize the vector code without specifying the phantom type.
  2. I can get fcTest to run fast if I use the commented-out SPECIALIZE pragma instead. However, that pragma seems very straightforward to me (all types are concrete). The docs indicate that GHC should automatically specialize in most cases, so why does it not specialize to the commented-out signature automatically?

This problem is also posted here: http://stackoverflow.com/questions/21071706/specialization-with-constraints

Change History (21)

comment:1 Changed 3 months ago by crockeea

To be clear: GHC generates specialized code and a rule to apply it, but the rule does not fire.

Last edited 3 months ago by crockeea

comment:2 Changed 3 months ago by simonpj

Where are runRand and liftRand defined? Hoogle can't find either. Google finds the MonadRandom package on Hackage, but I can't see liftRand there.

Would it be possible to rejig the example so that it doesn't use this extra package? Makes it much easier to reproduce. You don't, presumably, actually need random numbers to demonstrate the bug.

Simon

comment:3 Changed 3 months ago by crockeea

Right you are. The links are fixed now.

comment:4 follow-up: Changed 3 months ago by simonpj

Thanks, that's helpful. I can now at least compile it.

One complicating factor is that you have a Num instance for FastCyc. You could simplify the setup by calling plusFastCyc in cyclotomicTimeTest, and defining

plusFastCyc :: Num (t r) => FastCyc t r -> FastCyc t r -> FastCyc t r
plusFastCyc (PowBasis v1) (PowBasis v2) = PowBasis $ v1 + v2
plusFastCyc p1@(DecBasis _) p2@(PowBasis _) = (g p1) + p2
plusFastCyc p1@(PowBasis _) p2@(DecBasis _) = p1 + (g p2)
plusFastCyc p1 p2 = (g p1) + (g p2)

Now you can write your specialise pragmas (or not) for that. Do you get the same behaviour? That would eliminate one source of complexity.
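For readers without access to the pastes, here is a minimal self-contained sketch of that setup. Everything except the names plusFastCyc, FastCyc, PowBasis, DecBasis, VT and g is an assumption: VT wraps a plain list instead of U.Vector and drops the vector parameter, M is a made-up phantom index, g is a trivial stand-in for whatever basis conversion the real code does, and coeffs is a helper added only so the result can be inspected. The recursive clauses call plusFastCyc directly rather than going through a Num instance for FastCyc.

```haskell
{-# LANGUAGE FlexibleContexts #-}

-- Hypothetical stand-in: the ticket's VT is indexed by a vector type as well
-- (VT U.Vector m); here it simply wraps a list.
newtype VT m r = VT [r]

instance Num r => Num (VT m r) where
  VT xs + VT ys  = VT (zipWith (+) xs ys)
  VT xs * VT ys  = VT (zipWith (*) xs ys)
  negate (VT xs) = VT (map negate xs)
  abs (VT xs)    = VT (map abs xs)
  signum (VT xs) = VT (map signum xs)
  fromInteger n  = VT (repeat (fromInteger n))

-- Hypothetical concrete phantom index.
data M

data FastCyc t r = PowBasis (t r) | DecBasis (t r)

-- Hypothetical conversion to the power basis; the real g presumably does more.
g :: FastCyc t r -> FastCyc t r
g (DecBasis v) = PowBasis v
g p            = p

plusFastCyc :: Num (t r) => FastCyc t r -> FastCyc t r -> FastCyc t r
plusFastCyc (PowBasis v1) (PowBasis v2) = PowBasis (v1 + v2)
plusFastCyc p1 p2                       = plusFastCyc (g p1) (g p2)
-- Fully concrete specialisation: every type variable is instantiated.
{-# SPECIALIZE plusFastCyc
      :: FastCyc (VT M) Int -> FastCyc (VT M) Int -> FastCyc (VT M) Int #-}

-- Helper (not in the ticket) to read the coefficients back out.
coeffs :: FastCyc (VT m) r -> [r]
coeffs (PowBasis (VT xs)) = xs
coeffs (DecBasis (VT xs)) = xs
```

Compiling a module like this with -O2 -ddump-rule-firings lists the rewrite rules that actually fired, which is one way to check whether the rule generated by the pragma is being applied at a given call site.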

Can you say in more detail what you expect to happen? The slow version (could we call it slow rather than cyclotomicTimeTest?) iterates plusFastCyc which necessarily does a lot more work unpacking and packing those PowBasis constructors. Are you ok with that? But you aren't ok with something.

In short, it's a bit complicated for me to understand the problem.

Maybe you can show some -ddump-simpl core and say "this call here should be specialised, why doesn't the rule fire?"

Incidentally, if you want GHC to auto-specialise an imported function, to types that may not even be in scope in the defining module, you should mark that function as INLINABLE.
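As a rough illustration of the INLINABLE advice, here is a sketch with a made-up overloaded function. The names sumSquares and sumSquaresInt are hypothetical, and the two definitions are shown in one block only for brevity; the point of the pragma is that the first would live in a library module and the second in a module that imports it.

```haskell
-- Imagine this definition lives in the defining module (Foo.hs in the ticket).
-- INLINABLE keeps the unfolding in the interface file, so an importing module
-- can create a specialised copy at whatever type it uses.
sumSquares :: Num a => [a] -> a
sumSquares xs = foldl (\acc x -> acc + x * x) 0 xs
{-# INLINABLE sumSquares #-}

-- Imagine this call site lives in an importing module (Main.hs in the ticket).
-- With optimisation on, GHC may specialise sumSquares to Int here.
sumSquaresInt :: [Int] -> Int
sumSquaresInt = sumSquares
```

Whether the specialisation actually happens still depends on the optimisation level and on how the call site is written, so it is worth confirming in the -ddump-simpl output.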

comment:5 Changed 3 months ago by crockeea

I updated the files again per your suggestion, and get identical behavior.

I expect the vtTest (currently fast) and fcTest (currently slow) to have the same runtime, e.g. < 2 seconds runtime difference. I do not want to inline plusFastCyc: the function is too large and used in many places so inlining everywhere would create code bloat. Thus I want GHC to specialize the call to plusFastCyc rather than inlining it.
The two functions are doing the same work, but fcTest has one more level of indirection (one more wrapper on the type, and one more function call per addition).

As far as "doing a lot of work unpacking Pow constructors", plusFastCyc is iterated 100 times, but the runtime difference is 1 minute 18 seconds. I'm willing to pay for 100 function calls, but I think we can agree that something more is going on than just unpacking constructors.

I put some core snippets here http://lpaste.net/98593
This was compiled with -O3 using the forall'd SPECIALIZE pragma.
On line 10, you can see that GHC does write a specialized version of plusFastCyc, compared to the generic version on line 167.
The rule for the specialization is on line 225. I believe this rule should fire on line 270. (main6 calls iterate main8 y, so main8 is where plusFastCyc should be specialized.)

In regards to GHC auto-specializing: if I do not explicitly specialize plusFastCyc at all (but still mark it as INLINABLE), fcTest is slow. If instead I specialize plusFastCyc with concrete types as in the comment, fcTest is fast. Thus it appears GHC is *not* auto-specializing plusFastCyc, despite it being marked as INLINABLE.

Last edited 3 months ago by crockeea

comment:6 follow-up: Changed 3 months ago by carter

Inlinable is known to prevent specialization from firing. (unless my recollection is wrong, has something to do with the order of the relevant passes in ghc afaik)

comment:7 in reply to: ↑ 6 Changed 3 months ago by crockeea

Replying to carter:

Inlinable is known to prevent specialization from firing. (unless my recollection is wrong, has something to do with the order of the relevant passes in ghc afaik)

I tried removing the INLINABLEs, but that didn't help. I was under the impression that at least for auto-specialization across modules, INLINABLE is required. I've also tried playing around with phase control (just a little) to avoid inlining/specialization order problems, but I couldn't get that to work either.

comment:8 Changed 3 months ago by carter

If you write the SPECIALIZE pragma in the defining module for the operation (assuming the associated class instances specialize too; maybe check that they can specialize?), you don't need the INLINABLE (unless you want other instances to be specialized).

what happens when you go one step in the other direction and use INLINE?

comment:9 Changed 3 months ago by crockeea

For the small example above, GHC simply inlines everything and both main functions are equally fast.

However, I tried this in my real code and the equivalent of plusFastCyc is too large for GHC to inline (even with INLINE pragmas), and used in too many places for that to be a good solution even if it did work. That's why I'm trying to make GHC call a specialized function instead.

Last edited 3 months ago by crockeea

comment:10 follow-up: Changed 3 months ago by carter

did you try doing a specialize on

(FastCyc (VT U.Vector m) Int) -> (FastCyc (VT U.Vector m) Int) -> (FastCyc (VT U.Vector m) Int)

without the class constraint?

comment:11 in reply to: ↑ 10 Changed 3 months ago by crockeea

Replying to carter:

did you try doing a specialize on

(FastCyc (VT U.Vector m) Int) -> (FastCyc (VT U.Vector m) Int) -> (FastCyc (VT U.Vector m) Int)

without the class constraint?

If I remove the constraint, the specialization occurs.

comment:12 follow-up: Changed 3 months ago by carter

do you get the desired specialization behavior now?

If so, i suppose that means maybe there's a need for clearer support tooling around understanding specialization pragmas?

comment:13 in reply to: ↑ 12 Changed 3 months ago by crockeea

Replying to carter:

do you get the desired specialization behavior now?

If so, i suppose that means maybe there's a need for clearer support tooling around understanding specialization pragmas?

No, I need the specialization to fire with the constraint.

comment:14 Changed 3 months ago by carter

why?

comment:15 follow-up: Changed 3 months ago by carter

could you walk me through why

{-# SPECIALIZE plusFastCyc :: (FastCyc (VT U.Vector m) Int) -> (FastCyc (VT U.Vector m) Int) -> (FastCyc (VT U.Vector m) Int) #-}

isn't satisfactory?

comment:16 Changed 3 months ago by crockeea

In real code, plusFastCyc will call a function in class Factored. The function in Factored and the call to it in plusFastCyc were removed because they weren't needed to demonstrate the specialization problem.
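To make the need for the constraint concrete, here is a small sketch of the general shape, not the real code. The class body, the method modulus, the index M7, the type ZqVec and the function plusMod are all hypothetical; only the class name Factored and the idea of a reified phantom index m come from this ticket. Because the function body uses a class method at the phantom index, a pragma that fixes the element type to Int but leaves m polymorphic has to keep the Factored constraint.

```haskell
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables   #-}

import Data.Proxy (Proxy (..))

-- Hypothetical class: recovers a value from the phantom index m.
class Factored m r where
  modulus :: Proxy m -> r

-- Hypothetical phantom index with modulus 7.
data M7

instance Factored M7 Int where
  modulus _ = 7

-- Hypothetical vector type tagged with the phantom index.
newtype ZqVec m r = ZqVec [r] deriving (Eq, Show)

-- Addition that consults the reified index, so it genuinely needs Factored.
plusMod :: forall m r. (Factored m r, Integral r)
        => ZqVec m r -> ZqVec m r -> ZqVec m r
plusMod (ZqVec xs) (ZqVec ys) = ZqVec (zipWith add xs ys)
  where
    q       = modulus (Proxy :: Proxy m) :: r
    add x y = (x + y) `mod` q
-- Partial specialisation: r is fixed to Int, m stays polymorphic, and the
-- Factored constraint therefore remains in the specialised type.
{-# SPECIALIZE plusMod
      :: Factored m Int => ZqVec m Int -> ZqVec m Int -> ZqVec m Int #-}
```

For example, plusMod (ZqVec [5, 6] :: ZqVec M7 Int) (ZqVec [4, 1]) evaluates to ZqVec [2,0]. Dropping the constraint from the pragma would not typecheck in this sketch, since nothing else provides Factored m Int for an arbitrary m.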

comment:17 in reply to: ↑ 15 Changed 3 months ago by crockeea

Replying to carter:

could you walk me through why

{-# SPECIALIZE plusFastCyc :: (FastCyc (VT U.Vector m) Int) -> (FastCyc (VT U.Vector m) Int) -> (FastCyc (VT U.Vector m) Int) #-}

isn't satisfactory?

I added some more code to demonstrate something more like a real use case for the program: http://lpaste.net/98840. It has the same issue as Foo.hs above, but hopefully you can see why the constraint is needed. I wanted to remove it from the example to show that the problem wasn't type families, constraint kinds, etc.

comment:18 Changed 3 months ago by crockeea

*bump* Any ideas as to why GHC isn't applying the rule in main8?

comment:19 in reply to: ↑ 4 Changed 2 months ago by crockeea

Replying to simonpj:

Incidentally, if you want GHC to auto-specialise an imported function, to types that may not even be in scope in the defining module, you should mark that function as INLINABLE

I have been playing around with this some more, and found something interesting. As I mentioned, despite the fact that I have everything marked INLINABLE, GHC was *not* auto-specializing plusFastCyc. However, if I partially apply the call to plusFastCyc in Main to iterate (plusFastCyc y) ... instead of iterate (\x -> plusFastCyc y x) ..., GHC *does* automatically specialize the call to plusFastCyc. However, the code is still slow because (+) is still not specialized. I might expect (+) to be specialized for two reasons:

  1. It is called at the top level in Main.hs in the foldl.
  2. The docs say that *when* there is an explicit pragma, specialization is transitive. I could hope that auto-specialization is also transitive. Is this the case?

Is the problem of GHC specializing the partially applied function but not the fully applied version related to ticket:8099?
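For reference, the two call shapes being compared look roughly like this. plusV is a hypothetical overloaded stand-in for plusFastCyc, and viaPartial and viaLambda are made-up names. The two functions compute the same thing, so any difference would only show up in the generated core; this sketch is not claimed to reproduce the behaviour reported above.

```haskell
-- Hypothetical overloaded function standing in for plusFastCyc.
plusV :: Num a => [a] -> [a] -> [a]
plusV xs ys = zipWith (+) xs ys
{-# INLINABLE plusV #-}

-- Partially applied call: the shape reported above to be auto-specialised.
viaPartial :: Int -> [Int] -> [Int]
viaPartial n y = iterate (plusV y) y !! n

-- Eta-expanded call: the shape reported above not to be auto-specialised.
viaLambda :: Int -> [Int] -> [Int]
viaLambda n y = iterate (\x -> plusV y x) y !! n
```

Comparing the -ddump-simpl output for two definitions like these is a quick way to see whether each one calls a specialised copy or the dictionary-passing original.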

comment:20 follow-up: Changed 2 months ago by carter

any function marked INLINABLE or INLINE won't get specialized, or at least i seem to recall there are some phase ordering issues related to that.

Last edited 2 months ago by carter

comment:21 in reply to: ↑ 20 Changed 2 months ago by crockeea

Replying to carter:

any function marked INLINABLE or INLINE won't get specialized, or at least i seem to recall there are some phase ordering issues related to that.

You've mentioned that before, but the docs suggest otherwise ("However if a function f is given an INLINABLE pragma at its definition site, then it can subsequently be specialised by importing modules"), as does Simon's comment above. Furthermore, removing "INLINABLE" doesn't help matters.

Sure, there can be phase issues, but the point is that inlining is not occurring at all, so in particular it is not happening in place of specialization.

Last edited 2 months ago by crockeea