Opened 3 years ago

Last modified 13 months ago

#9374 new task

Investigate Static Argument Transformation

Reported by: jstolarek Owned by:
Priority: lowest Milestone:
Component: Compiler Version: 7.9
Keywords: Cc: nicolas.frisby@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description (last modified by jstolarek)

At the moment the Static Argument Transformation (SAT) optimisation is not run unless explicitly enabled with -fstatic-argument-transformation. There was a comment by Max Bolingbroke in DynFlags that said this (that comment is now removed and replaced with a reference to this ticket):

Max writes: I think it's probably best not to enable SAT with -O2 for the
6.10 release. The version of SAT in HEAD at the moment doesn't incorporate
several improvements to the heuristics, and I'm concerned that without
those changes SAT will interfere with some attempts to write "high
performance Haskell", as we saw in some posts on Haskell-Cafe earlier
this year. In particular, the version in HEAD lacks the tail call
criterion, so many things that look like reasonable loops will be
turned into functions with extra (unneccesary) thunk creation.

6.10 was a long time ago. Has anything changed since then? Does it make sense to enable that optimisation now? What are the mentioned heuristics and were they finally implemented? Does anyone know what Haskell-cafe posts does Max refer to?

Change History (15)

comment:1 Changed 3 years ago by simonpj

Cc: nicolas.frisby@… added

That would be great. Quick thoughts

  • Also worth reading: Danvy et al's paper about "lambda-dropping"
  • The places it got worse might well be fixed by Nick Frisby's work on "late lambda lifting". Sadly, this has stalled a bit because Nick has got interested in other things. I'm copying him in the hope of an update.

It would be great to re-instate SAT. But it would need someone to do some careful benchmarking to make sure that it yielded a net benefit. What usually happens is that you find a couple of cases where it makes things worse, investigate, find why, and modify the transformation to avoid them.


comment:2 Changed 3 years ago by simonpj

I found an email summarising a conversation Max and I had about SAT, back in 2009. Here it is verbatim, FWIW:

SAT can lose; eg

  • few static args (can spot)
  • few loop iterations (hard to spot)
  • SAT changes SpecConstr opportunity into CaseLib opportunity; and the latter is not good at the moment
  • SATing exposes new loop-invariant sub-expressions e.g.
    f x y = if y==0 then 0 else g x + f x (y-1)
    That is potentially good. But float-out can be bad for cold branches, and this can be bad.

Hence want to focus on SAT in situations where it’s most likely to benefit.

SAT has benefits of two kinds:

  1. More efficient loops, when there are a lot of static args. This is a pretty small effect.
  2. Specialisation from (a) SAT makes recursive function non-rec; (b) then inlining can happen. This is the big gain.

Evidence for (2): most benefits happen even if you SAT *only* if at least one SAT-d argument has function type.

Four exceptions, where SATing non-function-typed args was worthwhile:

  • Atom. f x y = (x,y) : f x y. Satting this makes an loopy list; result 97% reduction in runtime.
  • SAT may make a function small enough to fit inside inlining threshold.
  • Puzzle. Recursive SAT-able function called just once outside. After SAT, it can be inlined, and hence specialised for the (single) call site.

One program got worse when SATing non-function-typed args

  • Listcompr. The cold-branch problem happens.


  • No point in SAT if function is big. Maybe SAT only if EITHER small OR single call site outside.
  • Max’s idea:
    • Lambda lift everything
    • SpecConstr
    • Lambda-drop
    • Export unfoldings
    • Selectively lambda-lift (lambda-lift functions of only one argument) [SLPJ: this is Nick Frisby's work]
    • Codegen

comment:3 Changed 3 years ago by nfrisby

Hi Simon, Jan. Thanks for including my on these recent ticket comments. I apologize for the stagnant branch.

I've merged a recent master into my local copy of the late-lam-lift branch.

1) I believe I've successfully completed the merge. There was one subtle bug that I recently caught though, so I'm concerned.

2) My push to origin/late-lam-lift is being rejected due to some submodule stuff I don't grok. I need to find the correct ghc-devs email with the right conjury in it. (I have also emailed Austin and hvr asking them to move the branch to the wip/ folder so it's less restricted.)

1 and 2 are done. The rest are my current plan, including longer-term things.

3) In light of the host of changes that got merged in, I'm developing the status notes from scratch. These are destined for a web page. The meat will focus on each flag (there are several), what it does, and why. Most of the flags are for developers/power-users only.

4) I will simplify/clean-up the code itself.

5) I will add tests for when the late lambda float is enabled. (Is the test-suite configurable for this sort of predicate?)

After 3--5 are complete, someone else could pick-up the work. I hesitate to estimate a timeline because I'm moving across the country in the next week. My goal is by mid-August.

6) My partial paper draft is still on a borked laptop's hard drive. Once the code and notes are in a state where someone can pick them up, I'm going to move the paper back on to my todo list, but with less immediacy. If someone beats me to the write-up, I'll happily hand over what I have.

comment:4 Changed 3 years ago by nfrisby

There are several concepts in your quote from Max that are not familiar to me; I can't immediately predict much accurately about the late lambda float's interaction with SAT.

I'd be happy to collaborate with anyone who would like to investigate.

comment:5 Changed 3 years ago by simonpj

Nick: mid August would be great! The surrounding documentation and explanation (even paper) is important. The job turned out to be trickier than I expected when you started. It took weeks to work out why a handful of programs were getting worse and to devise analysis/heuristics to avoid them. I don't want someone else to have to re-invent that knowledge, and it's very hard to reverse-engineer it from the code.

Moreover, I remain hopeful that, once written down, someone may see a way to achieve the goal with less trickiness.



comment:6 Changed 3 years ago by nfrisby

I merged my work with a recent master. See the wip/llf branch and the (growing) notes at LateLamLift.

Please email me with any questions you have.

comment:7 Changed 3 years ago by simonpj

Excellent work, thank you.

This ticket is about SAT, so I've started a new ticket, #9476, to track progress on late lambda lifting. Let's move further discussion about LLF to there.


comment:8 Changed 2 years ago by simonpj

When fixing this ticket, do #9561 at the same time.

comment:9 Changed 2 years ago by dfeuer

This issue (if it's an issue) isn't with SAT, but may affect it: Sometimes Rec groups include things they don't (seem to) need to, so the the current SAT will see "mutual recursion" when there's only self recursion, and SAT won't be applied. Should I open a separate ticket to investigate/fix this, or is it desirable for some reason?

comment:10 Changed 2 years ago by simonpj

I can't tell without a concrete example. Can you offer one? Thanks.


comment:11 in reply to:  10 Changed 2 years ago by dfeuer

Replying to simonpj:

I can't tell without a concrete example. Can you offer one? Thanks.


Not yet. Dan Doel's work on Takano Akio's foldrW/buildW fusion stuff (with some help from me) shows that it often creates static arguments. It also sometimes leads to apparently-splittable Rec groups. I have not yet seen it create a splittable Rec group in which it introduces a static argument. We still don't know if this fusion framework is viable as a replacement for classic foldr/build, but it seems likely that giving it a really fair performance test would require a late SAT run of some kind. It may be that there won't be splittable-Rec issues, but I figured I'd mention that possibility.

comment:12 Changed 2 years ago by dfeuer

As I mentioned elsewhere, one situation where SAT seems always good is when the static arg is actually available in an outer scope or even statically known. I have seen things that look approximately like this after simplification:

fixedError = error "Bad things!"

wfoo x y z = case blah of
   p1 -> e1
   p2 -> wfoo e2 e3 z
   _  -> z

foo x y = wfoo x y fixedError

It would seem in this case that instead of having a static argument in place of a closure variable, there's a static argument that just doesn't need to be there at all.

comment:13 Changed 2 years ago by simonpj

Here's an example where SAT would be a big win. See email thread.

scanr =
  \ @ a_akP @ a1_akQ f_ah8 b_ah9 eta_B1 ->
    letrec {
      go_amb =
        \ ds_amc eta1_Xa eta2_B2 eta3_Xc ->
          case ds_amc of _ {
            [] -> eta1_Xa eta2_B2 eta3_Xc;
            : y_amh ys_ami ->
                (\ x_an9 eta4_Xj ->
                   let {
                     b'_ana = f_ah8 y_amh x_an9 } in
                   eta1_Xa b'_ana (: b'_ana eta4_Xj))
          }; } in
    go_amb eta_B1 scanr1 b_ah9 (: b_ah9 ([]))

The 3rd and 4th arg to go are static, and since there is only one call to go, doing SAT would yield much better code by specialising go for the particular parameters:

scanr =
  \ @ a_akP @ a1_akQ f_ah8 b_ah9 eta_B1 ->
    let {
      listend = : b_ah9 ([])} in
    letrec {
      go_amb =
        \ ds_amc eta1_Xa  ->
          case ds_amc of _ {
            [] -> eta1_Xa b_ah9 listend;
            : y_amh ys_ami ->
                (\ x_an9 eta4_Xj ->
                   let {
                     b'_ana = f_ah8 y_amh x_an9 } in
                   eta1_Xa b'_ana (: b'_ana eta4_Xj))
          }; } in
    go_amb eta_B1 scanr1

comment:14 Changed 2 years ago by jstolarek

Description: modified (diff)

comment:15 Changed 13 months ago by thomie

Type of failure: None/UnknownRuntime performance bug
Note: See TracTickets for help on using tickets.