Opened 11 years ago

Last modified 6 months ago

#855 new task

Improvements to SpecConstr

Reported by: simonpj Owned by:
Priority: normal Milestone:
Component: Compiler Version: 6.4.2
Keywords: SpecConstr Cc:…, ekmett@…, mpickering
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case: N/A
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


There are a series of possible improvemnts to SpecConstr, described in the source code. compiler/specialise/SpecConstr

  • Specialising for constant parameters
  • Specialising for lambda parameters
  • Two ideas to do with strictness that look more tricky

Some of them look quite straightforward.

Change History (9)

comment:1 Changed 11 years ago by igloo

Milestone: 6.8
Test Case: N/A

comment:2 Changed 10 years ago by simonpj

Milestone: 6.8 branch_|_

I'll look at this again when someone finds that SpecConstr really isn't doing the Right Thing for them. My guess is that this will happen either in array fusion for nested data parallelism, or in the use of stream fusion for lists (Coutts et al ICFP'07).

Meanwhile, here is the last message I had from Roman on the subject, which summarises what he did to make SpecConstr work reasonably well for their ICFP paper.

Specialising on constants

Consider the following program (admittedly contrived, but we get this kind of code with stream fusion):

lvl = Just True

foo :: Maybe Bool -> Int -> Int
foo (Just True) i = i
foo _           i = foo lvl i

SpecConstr doesn't optimise this even though it is supposed to specialise on global variables. If I understand correctly, the reason for this is the following test in argToPat:

   -- Check if the argument is a variable that
   -- is in scope at the function definition site
   -- It's worth specialising on this if
   --   (a) it's used in an interesting way in the body
   --   (b) we know what its value is
argToPat in_scope con_env (Var v) arg_occ
   | not (isLocalId v) || v `elemInScopeSet` in_scope,
     case arg_occ of { UnkOcc -> False; other -> True },        -- (a)
     isValueUnfolding (idUnfolding v)                   -- (b)
   = return (True, Var v)

The problem with this is that v might have been cloned (via extendBndr*) and in that case, it won't have an unfolding. That is precisely what happens in the above example, I believe.

I solved this by explicitly recording which binders have a known value but you might have a better idea.

The threshold

I don't like the spec-threshold idea at all. In fact, it was the first thing I had to disable for the ICFP paper since it just doesn't make sense for the kind of code stream fusion generates. It can produce some very large loops which *must* be specialised; indeed, SpecConstr will usually decrease the program size.

Specialising on lambdas

Specialising on lambdas is absolutely crucial for stream fusion (for nested lists). I completely agree that the direct approach was too fragile. What I did was a rather bad hack but it worked.

  • I extended FloatOut (or, rather, SetLevels) such that it would float out lambdas in recursive arguments, i.e., given
           foo x = ... foo (C (\x. e)) ...
    it would produce
           f a1 ... an x = e
           foo x = ... foo (C (f a1 ... an)) ...
    where a1,...,an are the free variables of e.
  • This floating would happen before every SpecConstr pass (see below).
  • I extended SpecConstr (very hackishly) to specialise on partial applications of global variables. For the above, it would produce
           foo' a1 ... an = ...
    and the rewrite rule
           forall a1 ... an. foo (C (f a1 ... an)) = foo' a1 ... an
    Now, f is used directly in foo'. We have a bad staging problem here, however: we definitely want to inline f in foo' (i.e., it should get an INLINE pragma) but not in foo (at least, not before the rule has fired). I solved this by a terrible hack involving core notes but clearly, something better is needed here. Note that we might generate several specialisations which use the same f and we want to inline the latter in every specialisation.
  • Finally, inlining f in foo' and simplifying might expose further opportunities for SpecConstr. For the ICFP paper, I implemented FloatOut/SpecConstr/Simplify loop. Stream fusion with nested lists requires as many iterations as there are nesting levels (each iteration would peel away one nesting level). Note that the current specLoop doesn't help here because we really have to do full simplification (including inlining).

That was basically all I did. We didn't really use the static argument transformation because specialising on global variables essentially has the same effect.

comment:3 Changed 9 years ago by simonmar

Architecture: UnknownUnknown/Multiple

comment:4 Changed 9 years ago by simonmar

Operating System: UnknownUnknown/Multiple

comment:5 Changed 5 years ago by morabbin

Type of failure: None/Unknown

Do we have a place for the kind of knowledge buried in simonpj's comment above? Or a ticket type? (My apologies if I could have found this info elsewhere.)

comment:6 Changed 4 years ago by liyang

Cc:… added

comment:7 Changed 4 years ago by ekmett

Cc: ekmett@… added

comment:8 Changed 20 months ago by thomie

Type of failure: None/UnknownRuntime performance bug

comment:9 Changed 6 months ago by mpickering

Cc: mpickering added
Keywords: SpecConstr added
Note: See TracTickets for help on using tickets.