Opened 18 months ago

Closed 4 months ago

# Float exit paths out of recursive functions

Reported by: Owned by: nomeata normal Compiler 8.2.1 JoinPoints Unknown/Multiple Unknown/Multiple None/Unknown #14137 #10918

This is a spin off of a discussion from #14137.

## The problem

We generally avoid inlining or floating a binding into a recursive function, because we do not want to duplicat work/allocations.

But sometimes the binding is only used inside the recursive function on the “exit path”. In which case it would be good to inline. Example:

```  let x = f x0
in let go 10 = h x
go i = go (i+1)
in go (0::Int) + 100
```

It would be beneficial to inline `x`.

The problem is that it is not very obvious that this occurence of `x` is ok for inlining. In particular, it is not syntactically visible.

## Proposed solution

If we apply loopification (#14068), this would turn into

```  let x = f x0
in let go n =
joinrec jgo 10 = h x
jgo i = call jgo (i+1)
in jump jgo n
in go (0::Int) + 100
```

I’d like to call this first joinrec normal form, defined as “every recursive function where all recursive calls are tail-recursive is a recursive join point”.

This ticket proposes to transform this even further and float out all case alternatives that do not mention `jgo` as non-recursive join-points, as so:

```  let x = f x0
in let go n =
join jexit = h x
joinrec jgo 10 = call jexit
jgo i = call jgo (i+1)
in jump jgo n
in go (0::Int) + 100
```

I’d like to call this second `joinrec` normal form, defined as “in first `joinrec` normal form, and all subexpressions of a recursive join point `j` that are in tail-call position and do not mention `j` are join calls”.

If the floated expression has free variables that are bound inside the `joinrec`, they turn into parameters of the newly created joinpoint.

At this point, GHC can tell that `go` is called at most once, and will therefore happily inline `x` into the right hand side of `jexit.

## Alternative solutions

Ticket #10918 uses Call Arity results to learn that `x` is one-Shot, and inline it even in the original code. This works, but the problem is that float-out will undo this. See 10918#comment:5.

## Limitation

It only works for recursive functions that are join points, or can be turned into join points by loopification (#14068). It does not work forexample for

```let x = f x0
let go 0 = h x
go n = (go (n-1) + 1
in go 10
```

although it would be equally desirable to float `h x` out of `go` so that `x` can be inlined.

## Preservation

A remaining tricky point is that we need to stop one of these carefully-constructed non-recursive join points being inlined into a recursive join point, even if it is invoked at just one place. That should not be hard. And in a final run of the simplifer (or in CorePrep) we could switch off that restriction and let it inline. (Ticket #14137 is about inlining more join points into recursive join points, so it is the antithesis to the present ticket.)

### comment:2 Changed 18 months ago by nomeata

I am wondering about the best place to perform this transformation. I could do it relatively easily as part of loopification (#14068), which I currently inserted between the occurrence analyser and the simplifier. But we also want to do it to `joinrec`s that are already `joinrec`s?

The float out phase might also be a suitable place, since it – well – floats out expressions.

Or maybe the simplifier itself (which has a notions of floating expressions, doesn’t it) can do it?

Or should it be a pass of its own?

Simon, what is your intuition here?

### comment:3 Changed 18 months ago by simonpj

Definitely not in the simplifier.

FloatOut is already jolly complicated and I want to make it do NO FLOATING for join points. (There is currently a mess.)

I'd suggest doing it as part of loopification.

And I'd suggest NOT doing loopification with every occurrence analysis; that feels far too often to me.

### comment:4 Changed 18 months ago by nomeata

And I'd suggest NOT doing loopification with every occurrence analysis; that feels far too often to me.

The check of whether loopification can be done is pretty cheap, and falls out of Occurence Analysis anyways: I just check whether the occurrence info indicates that this is a recursive function that admits loopification:

```    RecursiveTailCalled join_arity <- tailCallInfo occ
```

So don’t see a reason to check this every time. Once a binding is loopified, it is a recursive join point, and loopification does not look at it any more.

Note to myself:

## Common up identical exit paths

I expect much code to look like this:

```let x = f x0
joinrec go 0 = h x
go 10 = h x
go 30 = if something then h x     else call go (n-1)
go 40 = if something then h (x+1) else call go (n-1)
go n = call go (n-1)
in call go 10
```

where a recursive function checks a bunch of conditions and has the same exit code multiple times. Ih would be silly to create three joint points with `h x` as the right hand side. So when floating out these exit expressions, I plan to check if the same expression is already being floated out and use that join point, to get

```let x = f x0
join exit1 = h x
join exit2 = h (x+1)
joinrec go 0 = call exit1
go 10 = call exit1
go 30 = if something then call exit1 else call go (n-1)
go 40 = if something then call exit2 else call go (n-1)
go n = call go (n-1)
in call go 10
```

### comment:6 Changed 18 months ago by simonpj

same exit code multiple times

Or you could just rely on CSE to do this; it's set up to do precisely that. I think it works correctly for join points. (Can you check the latter point?)

### comment:7 Changed 18 months ago by simonpj

The check of whether loopification can be done is pretty cheap, and falls out of Occurence Analysis anyways

Hmm. I just want to avoid an extra traversal on every single iteration. I suppose you could just do it as part of occurrence analysis? The occurrence analyser doesn't generally change the structure of the code; but it does do so in one situation; it drops dead code to improve the occurrence info on other things.

### comment:8 Changed 18 months ago by nomeata

I implemented this transformation; pushed to `wip/T14152` for storage. Don’t review the code too harshly yet, in particular feeding a list of fresh uniques to `maybeLoopify` is a temporary crutch.

Observations so far:

• Right now I am running this only after loopification. But that is unsatisfactory:

If we want this, we certainly also want it for functions that are `joinrec` to start with… but where?

Note that we need access to fresh uniques; this rules out the occurrence analyzer which is currently pure.
• Currently, we get the desired transformation, but the next iteration of the simplifier gets rid of it again, in three different ways:
• `preInlineUnconditionally` inlines them because they are values, so `canInlineInLam` is true.
• `postInlineConditionally` inlines them because they are marked as “work free”.
• `completeCall` does inlining at the call-site, again because they are marked as “work free”.
It is not clear to me how to fix it. It is not false that they are work-free, and we certainly want to inline these join-points into non-recursive uses (should they for some reason appear).

I agree with Simon that floating these things out and keeping them out feels “wrong” in the sense that it fights against the usual working of the simplifier.

So maybe back to the drawing board and thinking harder how we can tell the simplifier that it is safe to inline something into the exit path inside the recursive function, and keep it there? Could we somehow syntactically mark exit paths using a new form of ticks?

### comment:9 Changed 18 months ago by simonpj

I agree with Simon that floating these things out and keeping them out feels “wrong” in the sense that it fights against the usual working of the simplifier.

Well, now I understand better, I do quite like this approach. I'm not ready to give it up yet!

• `preInlineUnconditionally`. Yes `caonInlineInLam` is true, but `int_cxt` (short for "interesting context") is probably false.
• `completeCall`: I think we may be saying "never inline a join point that appears on the RHS of a recursive join point". We can make that happen by giving its unfolding an `UnfoldingGuidance` of `UnfNever`. We can spot that it apperas on the RHS of a recursive join point because its occ-info will have `occ_in_lam` = True.
• `postInlineUnconditionally`. As the comments say, the main goal here is to solve this
```        --      let x = f y in
--      case v of
--         True  -> case x of ...
--         False -> case x of ...
```
Here we want to inline `x`, otherwise it allocates a thunk. But join points do not allocate thunks; perhaps we can simply not do `postInlineUnconditionally` on the `OneOcc` branch?

And in fact giving it an `UnfoldingGuidance` of `UnfNever` will also stop it inlining.

NB: right at the end (in `CorePrep` perhaps) we may want to inline them after all, just to reduce clutter and jumps.

### comment:10 Changed 18 months ago by nomeata

I do quite like this approach. I'm not ready to give it up yet!

I like this game of bad-cop/good-cop with switching roles. Certainly a productive way of investigating an issue :-)

• `preInlineUnconditionally`: Well, `int_cxt` certainly is `True` – otherwise `int_cxt && canInlineInLam rhs` would be `False` and inlining would not happen. Are you saying that it should be made `False`?
• `completeCall`: Let me try to make sure I understand the reasoning: Normally a join point will not have `occ_in_lam`, because if it would occur under a “normal” lambda, it wouldn’t be a tail-call. The exception are the lambdas on the RHS of a join points, as these are ignored when calculating `occ_in_lam`. But there is an exception to that: Inside a recursive join point we do set `occ_in_lam`.

So the occurrence analyzer should detect this? I’ll try.

NB: right at the end (in CorePrep perhaps) we may want to inline them after all, just to reduce clutter and jumps.

Right, I keep that in the back of my mind, but first I need it to actually _not_ inline :-)

### comment:11 Changed 18 months ago by simonpj

Are you saying that it should be made False?

Ah... no, for functions `int_cxt` is true if it is applied.

Perhaps for join-points `preInlineUnconditionally` should never be True if `in_lam` is true. That validates what we are trying here.

But there is an exception to that: Inside a recursive join point we do set occ_in_lam.

Yes: `occ_in_lam` is true for a join point precisely when that join point is the exit of a recursive loop! That's just what we want.

### comment:12 Changed 18 months ago by nomeata

Thanks for the guidance, that nailed it:

• The code `isJoinId bndr, isOneOcc (idOccInfo bndr), occ_in_lam (idOccInfo bndr)` reliably detects an exit join point now, it seems. (I guess this deserves its own function `isExitJoinPoint`)
• When I create the join point, I add the correct `OccInfo`, so that this is there even before OccurAnal runs the next time.
• In `simplLetUnfolding`, when I detect an exit join point, I do not creating an unfolding. This prevents unfolding in `completeCall` and `postInlineUnconditionally`.
• In `preInlineUnconditionally` I check for exit join points explicityly.

Maybe it would be cleaner to leave the unfolding alone and simply check for `isExitJoinPoint` in all three places… but no, the simplifier seems to throw away the `idOccInfo`. So better stick to not creating an unfolding.

### comment:13 Changed 18 months ago by simonpj

the simplifier seems to throw away the idOccInfo.

That's by design. The occ-info after simplification isn't necessarily valid. E.g.

```x = e
f = \y. ...x...
y = ...(f 1)...(f 2)...
```

Currently x occurs once, but after inlining `f` twice, `x` occurs twice. So the simplifier carefullly deletes it. (Except for robust info like "this lambda binder is dead".)

### comment:14 Changed 18 months ago by nomeata

Ok, so `wip/T14152` has an implementation that works (judging from looking at the Core output). It’s not polished, but good enough to play around to see what happens. It’s also inefficient (it should probably work on an fv-annotated syntax tree instead of recomputing the free variables repeatedly.

But it still applies “exitification” only together with “loopification”. This makes it hard to evaluate the merits of this ticket independently of loopification. Also, if we want “exitification”, then we want it for all joinrecs.

Is there a better place to do it? Should it simply be a pass on its own that we run maybe after the first simplifier (which introduces most joinrecs?)?

### comment:15 Changed 18 months ago by simonpj

You could make two separate passes. Or one that (controlled by a flag) does either or both of loopification and exitification.

Of course if it does both, it'd better do exitification to prior joinrecs and to the ones introduced by loopification

My guess is that we'll want to do both at once, and do so once or at most twice in the pipeline.

### comment:16 Changed 18 months ago by nomeata

Note that loopification is not really a pass: The information of whether we can do it falls out of the simplifier, and the change to the AST is a simple local change and O(1).

I will turn exitification into a pass of its own.

### comment:17 Changed 18 months ago by nomeata

Wow, the results are great!

```Nofib allocations
Benchmark name 			previous 	change 		now
nofib/allocs/fannkuch-redux 	 870976696 	- 99.99% 	   64728 	bytes
nofib/allocs/k-nucleotide 	1089567552 	- 91.74% 	89963760 	bytes

Nofib runtimes
Benchmark name 			previous 	change 		now
nofib/time/CSD 			0.536 		- 6.72% 	0.5 		seconds
nofib/time/FS 			0.413 		- 3.15% 	0.4 		seconds
nofib/time/cryptarithm1 	0.529 		- 4.91% 	0.503 		seconds
nofib/time/integer 		1.562 		+ 3.01% 	1.609 		seconds
nofib/time/last-piece 		0.425 		- 3.06% 	0.412 		seconds
```

Almost too good to be true. I wonder where the bug is… :-)

### comment:18 Changed 18 months ago by nomeata

Inlining the exit join points in the `"final"` simplifier iteration is not a clear win:

```Nofib runtimes
Benchmark name 			previous 	change 		now
nofib/time/VS		 	0.372	 	+ 16.13% 	0.432 	seconds
nofib/time/cryptarithm1 	0.503 		+ 4.17% 	0.524 	seconds
nofib/time/k-nucleotide 	5.558 		- 7.72% 	5.129 	seconds
```

my theory is that floating exit paths out of tight loops makes tight loops very small which is beneficial for *mumble mumble* weird hardware reasons.

(I know, someone needs to look at the code instead of speculating…)

### comment:19 Changed 18 months ago by simonpj

Wow that's great. I would like to understand what's going on in comment:18. Inlining should simply remove a jump -- and the Cmm passes may well, I believe, do that anyway. As you say, actual insight is needed here!

### comment:20 Changed 18 months ago by nomeata

As you say, actual insight is needed here!

Sigh. It seems that `VS` has identical core. So the difference must be in the libraries, or be completely spurious. But tracking down that requires two new freshly built checkouts and more time, and it might just be spurious…

### comment:21 Changed 18 months ago by nomeata

Some of these measurements seem to be unreliably, in particular `time/VS`.

Right now the whole branch, including inlining them back in the end, yields this:

```Nofib allocations
Benchmark name 			previous 	change 		now
nofib/allocs/fannkuch-redux 	870976696 	- 99.99% 	64700		bytes
nofib/allocs/k-nucleotide 	1089567552 	- 91.78% 	89603872 	bytes

Nofib runtimes
Benchmark name 			previous 	change 		now
nofib/time/CSD 			0.536 		- 8.02% 	0.493 	seconds
nofib/time/cryptarithm1 	0.529 		- 4.73% 	0.504 	seconds
nofib/time/digits-of-e1 	0.704 		- 3.27% 	0.681 	seconds
nofib/time/fannkuch-redux 	4.401 		+ 3.84% 	4.57 	seconds
nofib/time/integer 		1.562 		+ 3.59% 	1.618 	seconds
nofib/time/k-nucleotide 	5.426 		- 6.28% 	5.085 	seconds
```

The effect of just the inlining patch is

```Nofib runtimes
Benchmark name 			previous 	change 		now
nofib/time/VS 			0.442 		- 17.19% 	0.366 	seconds
nofib/time/digits-of-e1 	0.704 		- 3.27% 	0.681 	seconds
nofib/time/k-nucleotide 	5.585 		- 8.95% 	5.085 	seconds
```

That the inlining patch has not changed over the version in comment:18, it was just rebased onto a slightly improved exitification patch, yet the runtime of `VS` now improves by 17% rather than regress by 16%. I draw this conclusion: There is some sensitivity to layout in `SD` that seems to be tickled by my changes in a non-systematic way. I might have a closer look at the regressions (and gains), but not sure if ICFP is the right time to do that.

### comment:22 Changed 18 months ago by nomeata

Looking at some of the performance changes observed. The two benchmarks with allocation changes show differences in Core, as expected. (Thunks previously bound outside a `joinrec` are now inlined.)

About those runtime changes: All of these have unchanged Core! So the change must be in the libraries… Did not dig deeper yet.

### comment:23 Changed 18 months ago by nomeata

Digging deeper: For `CSD`, `-ticky`, even with all of base instrumted, does not not show anything. Same for `cryptarithm1`.

Judging from the module sizes, in base only `Data.Semigroup`, `GHC.Arr` and `GHC.Float` are affected (but maybe floating a let binding can happen with equally sized output binary.). A quick look at the core difference shows that some thunks are floated inside a `joinrec`, or even inlined completely, e.g. in `Arr`’s `(//)` and `accum`.

None of these look as if they’d affect a benchmark like `CSD`.

It must be the weather, or the constellation of the stars.

### comment:24 Changed 18 months ago by nomeata

Status: new → patch

Anyways, this is ready for review, either on https://phabricator.haskell.org/D3903 or in git, branch `wip/T14152`. The latter has the advantage that the basic patch can be inspected independent of the on-the-fly-CSE addition (which maybe be nice due to smaller Core, but may not be worth the slight complication of the code).

### comment:25 Changed 16 months ago by nomeata

Description: modified (diff)

### comment:26 Changed 16 months ago by Joachim Breitner <mail@…>

In 0e953da1/ghc:

```Implement a dedicated exitfication pass #14152

The idea is described in #14152, and can be summarized: Float the exit
path out of a joinrec, so that the simplifier can do more with it.
See the test case for a nice example.

The floating goes against what the simplifier usually does, hence we
need to be careful not inline them back.

The position of exitification in the pipeline was chosen after a small
amount of experimentation, but may need to be improved. For example,
exitification can allow rewrite rules to fire, but for that it would
have to happen before the `simpl_phases`.

Perf.haskell.org reports these nice performance wins:

Nofib allocations
fannkuch-redux    78446640  - 99.92%      64560
k-nucleotide     109466384  - 91.32%    9502040
simple            72424696  -  5.96%   68109560

Nofib instruction counts
fannkuch-redux  1744331636  -  3.86% 1676999519
k-nucleotide    2318221965  -  6.30% 2172067260
scs             1978470869  -  3.35% 1912263779
simple           669858104  -  3.38%  647206739
spectral-norm    186423292  -  5.37%  176411536

### comment:27 Changed 16 months ago by simonpj

Great! Is the final "inline them back in" pass there, or not?

### comment:28 Changed 16 months ago by nomeata

I left it out for now, because instruction counts increase. I tried to analyse it, and here is what I found:

I read through a lot of core, comparing `-dverbose-core2core` between the version without inlining exit join points at the end, and the version with. In `compress2`, I don’t spot a difference in Core. In `fem` (a big program), there are a fair number of exit join points, but I don’t see anything fishy going on…

What I do, how ever, see is exit join points that are called from two positions in the code, where inlining duplicates the code. (Common block elimination might help with that, but #14226 seems to get in the way.)

I also see calls to `stg_gc_noregs()` turn into calls to `stg_gc_unpt_r1(R1)`, but I am not sure what that means, or if that is a good thing or a bad thing.

I expect that the “useless” unconditional jump to a non-inlined exit join point, which was the motivation for the final inlining, will be eliminated on the Cmm stage without much ado.

So the performance measurement says “don't inline”, and looking at the Core, not inlining seems to be fine, and the Cmm also looks better. So in the light of that I think I’ll conclude that the final inlining is neither necessary nor useful.

Anyways, the final inline patch still lives in `wip/T14152` for anyone to play with.

### comment:29 follow-up:  30 Changed 16 months ago by simonpj

I left it out for now, because instruction counts increase

So, just to be sure: allocation counts did not change?

What I do, how ever, see is exit join points that are called from two positions in the code, where inlining duplicates the code.

What if you inline only join points that are called once? So that no code duplication is involved.

I'm having a hard time seeing why that should do anything bad. And it can do something good... if you have

```join j x = ...y...
in
...jump j v...
```

then `y` will get pushed onto the stack at the `join`, so that its RHS knows where to find it. If it's inlined that may not happen. So may be more than just eliminating a jump.

Still, I accept that it's not top priority.

### comment:30 in reply to:  29 Changed 16 months ago by nomeata

I left it out for now, because instruction counts increase

So, just to be sure: allocation counts did not change?

Nope. Byte-for-byte identical.

What I do, how ever, see is exit join points that are called from two positions in the code, where inlining duplicates the code.

What if you inline only join points that are called once? So that no code duplication is involved.

I doubt that the code duplication is the real culprit here. More likely things like different order of blocks leading to different decision later in the code generation, e.g. in the register allocator, leading to less or more register pressure and different registers being saved? Do you know the difference between `stg_gc_noregs()` and `stg_gc_unpt_r1(R1)`?

I'm having a hard time seeing why that should do anything bad. And it can do something good... if you have

```join j x = ...y...
in
...jump j v...
```

then `y` will get pushed onto the stack at the `join`, so that its RHS knows where to find it. If it's inlined that may not happen. So may be more than just eliminating a jump.

Well, according to the numbers it's not happening.

Remember that the `jump` to an exit point is almost always going to the right-hand-side of a case alternative. Which is, as far I as I can tell, already a jump target, so the `y` in your example would have to be pushed on the stack just the same, woudn’t it?

I understand that this is all a bit unsatisfying intellectually, but I don’t want to sink more time into this, at least not without concrete examples or other evidence that show that what we are doing now is indeed bad.

### comment:31 Changed 16 months ago by simonpj

I understand that this is all a bit unsatisfying intellectually, but I don’t want to sink more time into this, at least not without concrete examples or other evidence that show that what we are doing now is indeed bad.

Yes that's fine. If we have 100 Joachim cycles I'd rather they were spent on the fattest target, and this doesn't look like one.

Back to loopification next?

### comment:32 Changed 16 months ago by nomeata

Back to loopification next?

Aye!

### comment:33 Changed 11 months ago by simonpj

Joachim, I've just been looking at `Exitify.hs` in pursuit of #15005. It's not easy going.

Could you add some careful Notes? For example, here are the things I stumbled over immediately:

• In the top-level `exitifyProgram` I think we make no attempt to exitify top level recursive functions. That's fine, but say so.
• Then in `go` of `exitifyProgram` I think we are looking at the RHS of a top-level function, and applying `goBind` to every binding within it. (NB: `goBind` calls `go` to deal with the RHS of the bindings; I'd be inlined do that in `go` so that `goBind` gets the post-exitify bindings.)
• Mysteriously we do not exitify the RHS of a case alternative (`goAlts`). This exception seems highly mysterious, possibly a bug?
• Then for recursive join-point bindings, we apply `exitify` to them (after having recursively processed the RHSs).
• There is some magic in `go` in `exitify` to pick out the ones we want to exitify. Explaining these conditions in a note would be good

### comment:34 Changed 11 months ago by simonpj

```isExitJoinId :: Var -> Bool
isExitJoinId id = isJoinId id && isOneOcc (idOccInfo id) && occ_in_lam (idOccInfo id)
```

Isn't the `isOneOcc` redundant? All join-point should have `OneOcc`. Maybe it'd be clearer to say

```isExitJoinId id
| isJoinId id
= case idOccInfo id of
IAmALoopBreaker {} -> False  -- A joinrec is never an exit join-ids
OneOcc { occ_in_lam = in_lam } -> in_lam
ManyOccs {} -> pprPanic "isExitJoinId" id
| otherwise
= False
```

Also I'd put this function in `SimplUtils` or something. Occurrence info is only valid on `InIds`, so it's crucial that `isExitJoinId` is only called on freshly occ-analysed code. It's not a generic function you can call anywhere.

### comment:35 Changed 11 months ago by simonpj

What's this (in `Exitify`):

```  where
exit_id_tmpl = mkSysLocal (fsLit "exit") initExitJoinUnique ty
`asJoinId` join_arity
`setIdOccInfo` exit_occ_info

-- See Note [Do not inline exit join points]
exit_occ_info =
OneOcc { occ_in_lam = True
, occ_one_br = True
, occ_int_cxt = False
, occ_tail = AlwaysTailCalled join_arity }
```

The occurrence analyser sets occ-info. No one else should. What happens if you leave this out?

### comment:36 Changed 11 months ago by simonpj

Also in `Exitify`:

```        -- We cannot abstract over join points
captures_join_points = any isJoinId args
```

Isn't this vacuously false? Join points are never arguments -- or they would not be join points!

Ah, no... `args` is (confusingly) not the argument of the call (although you use `args` for this purpose in the `collectArgs` call...aargh. Can we rename `args` to `abs_vars`, the variables over which we are going to abstract?

Ah! You probably need to topologically sort those `abs_vars`; see the `sortQuantVars` call in `SetLevels.abstractVars`. A bug waiting to happen.

Returning to `captures_join_points` this can only be true of a call of form `j e1 .. en`, which you deal with in an earlier case... so the test would better be done there.

### comment:37 Changed 11 months ago by simonpj

Consider

```joinrec j x y z = case x of
True -> <big>
False -> ...j...
```

where `<big>` does not mention `j`. It looks to me that you don't even attempt to turn the whole of `<big>` into an exit point. Instead you recurse into it. But why don't you? The bigger the better!

```join exit y = <big> in
joinrec j x y z = case x of
True -> exit y
False -> ...j...
```

### comment:38 Changed 11 months ago by nomeata

Wow, thanks for the thorough code review. I am not sure how I go that `goAlts` thing wrong. That ought to cripple the analysis significantly! Maybe part of some late refactoring before the merge? Anyways, I will go through all your comments in detail and implement what I can, maybe this afternoon.

### comment:39 Changed 10 months ago by nomeata

In the top-level exitifyProgram I think we make no attempt to exitify top level recursive functions. That's fine, but say so.

There are no top-level join points. Note that exitification does not make new join points.

But I added a one-line comment to remind of this fact.

Then in go of exitifyProgram I think we are looking at the RHS of a top-level function, and applying goBind to every binding within it. (NB: goBind calls go to deal with the RHS of the bindings; I'd be inlined do that in go so that goBind gets the post-exitify bindings.)

Not sure I follow the NB. Isn’t it the common idiom to have a `goExpr` (here `go`), and a `goBind` that are mutually recursive?

Mysteriously we do not exitify the RHS of a case alternative (goAlts). This exception seems highly mysterious, possibly a bug?

Not mysterious at all: I exitify `joint points`, i.e. recursive binders. The RHS of a case alternative is just an expression – nothing to exitify there. And `goAlts` calls `go` just fine…

I am actually not sure what is confusing about `exitifProgram`. Its comment says

`Traverses the AST, simply to find all joinrecs and call 'exitify' on them.`

and it does precisely that.

There is some magic in go in exitify to pick out the ones we want to exitify. Explaining these conditions in a note would be good

Every of the three conditions has a comment, and the first two point to notes…

Maybe it'd be clearer to say … I'd put this function in `SimplUtils` or something.

done

The occurrence analyser sets occ-info. No one else should. What happens if you leave this out?

If I leave this out, then until the next run of the occ analyzer, `isExitJoinId` will not detect an exit join point as such. From your clarification about that function I conclude that it shall only be used by the simplifier, which runs the occ analyzer before, so it looks like I can leave this out. Will try.

Can we rename `args` to `abs_vars`, the variables over which we are going to abstract?

Of course, sorry for that.

You probably need to topologically sort those `abs_vars`

Thanks, done.

Returning to captures_join_points this can only be true of a call of form j e1 .. en

I don’t think that’s true. What if I exitify something like

```  case b of True -> jump j 1; False -> jump j 2
```

This is not of this form, and I need to check whether I can actually abstract over this?

It looks to me that you don't even attempt to turn the whole of <big> into an exit point. Instead you recurse into it. But why don't you?

Sure I do: The case

```        -- We have something to float out!
```

applies to all expressions. Only when we decide not to float `e` do we actually destruct it and recurse into it.

Last edited 10 months ago by nomeata (previous) (diff)

### comment:40 Changed 10 months ago by nomeata

All join-point should have `OneOcc`

Not true, it seems, as witnessed by `testsuite/tests/codeGen/should_compile/jmp_tbl.hs` (already with `-O`, so without exitification). Your proposed code for `isExitJoinPoint` panics, complaining about `\$j_s3v8`:

```./inplace/bin/ghc-stage2 -dcore-lint testsuite/**/jmp_tbl.hs -O -dverbose-core2core -ddump-simpl-iterations -fforce-recomp
…
join {
\$j_s3v8
:: (# GHC.Prim.State# GHC.Prim.RealWorld, (PipeState, b_a2YY) #)
[LclId[JoinId(0)]]
\$j_s3v8 = case lvl_s3te of wild_00 { } } in
join {
\$j_s3ve
:: (# GHC.Prim.State# GHC.Prim.RealWorld, (PipeState, b_a2YY) #)
[LclId[JoinId(0)]]
\$j_s3ve
= join {
\$j_s3vc
:: (# GHC.Prim.State# GHC.Prim.RealWorld, (PipeState, b_a2YY) #)
[LclId[JoinId(0)]]
\$j_s3vc
= join {
\$j_s3va
:: (# GHC.Prim.State# GHC.Prim.RealWorld, (PipeState, b_a2YY) #)
[LclId[JoinId(0)]]
\$j_s3va = case phase_X1s7 of { __DEFAULT -> jump \$j_s3v7 } } in
case phase_X1s7 of {
__DEFAULT -> jump \$j_s3va;
HCc -> jump \$j_s3v8
} } in
case phase_X1s7 of {
__DEFAULT -> jump \$j_s3vc;
Ccpp -> jump \$j_s3v8
} } in
case phase_X1s7 of {
__DEFAULT -> jump \$j_s3ve;
Cc -> jump \$j_s3v8
```
Last edited 10 months ago by nomeata (previous) (diff)

### comment:41 Changed 10 months ago by nomeata

Those changes that I were able to do so far are now in https://phabricator.haskell.org/D4576 I will push this as soon as Harbormaster has confirmed that it does not regress anything.

### comment:42 Changed 10 months ago by nomeata

Ah! You probably need to topologically sort those abs_vars; see the `sortQuantVars` call in SetLevels.abstractVars. A bug waiting to happen.

While sleeping over it, I was wondering: Did I really get that wrong? And I checked: No, I did not! The `abs_vars` are derived from `captured`, and that is always in dependency order by construction. So no need for `sortQuantVars`.

So I reverted the (unnecessary and hence misleading) use of `sortQuantVars` in 270e3e9bbaabad3d9a1348cea9e46a9ecf1e5ec2, but left a comment that should prevent me from forgetting this again.

### comment:43 Changed 10 months ago by simonpj

All join-point should have OneOcc

Aha. They should, but indeed they don't. I've created Trac #15091 to fix this

### comment:44 Changed 9 months ago by Simon Peyton Jones <simonpj@…>

In 61b245a0/ghc:

```Small refactoring in Exitify

This refactoring was provoked by our conversation on
Trac #14152.  No change in behaviour.```

### comment:45 Changed 4 months ago by simonpj

Resolution: → fixed patch → closed

Joachim and I think we are all done here... closing.

Note: See TracTickets for help on using tickets.