Opened 16 months ago

Last modified 8 weeks ago

#14375 new bug

Implement with# primop

Reported by: simonpj Owned by: bgamari
Priority: highest Milestone: 8.8.1
Component: Compiler Version: 8.2.1
Keywords: JoinPoints Cc: simonmar, dfeuer, osa1
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #14346, #16098 Differential Rev(s): Phab:D4110
Wiki Page:

Description (last modified by bgamari)

In Trac #14346 we proposed the new with# primop

with# :: a -> (State# s -> (# State# s, b #)) -> State# s -> (# State# s, b #)

This ticket is to track progress.

See ​Phab:D4110.

Change History (26)

comment:1 Changed 16 months ago by simonpj

In Phab:D4110 Simon M says

The one downside of this is that we have to build a function closure to pass to with#, which entails more allocation than we were doing previously. But there's an alternative approach that would avoid this: expand with# during CoreToStg (or CorePrep perhaps) into the original case expression + touch#. There should be no extra allocation, no new primops needed, all it does is prevent the simplifier from eliminating the continuation.

That's a good point. We implement runST in this way too.

But that seems very ad hoc. I've realised that we have quite a bunch of primops that take continuations. For example

maskAsyncExceptions# :: (State# RealWorld -> (# State# RealWorld, a #))
                     -> (State# RealWorld -> (# State# RealWorld, a #))

We don't really want to allocate a continuation here, only to immediately enter it. But in fact we do!

I've also realised that it's quite easy to avoid:

  • When converting to STG, instead of insisting that the argument to maskAsyncExceptions# is a variable, insiste that it is a lambda (\s.e)
  • When doing code-gen for maskAsyncExceptions# (\s.e) s2, emit the mask code (as now) and continue code-gen for e.

That would mean altering STG a bit to allow non-variable arguments.

An alternative would be to convert maskAsyncExceptions# e s to this STG:

join j s = e
in maskAsyncExceptions# j s

This isn't quite as good because it flushes the live variables of e to the stack, and then takes a jump to it (the latter will be elminated in Cmm land); but it's much better than what we do now. NB: this would not be valid Core because j is not saturated; but here it's just an intermediate step in codegen.

Moreover, we don't want to make it too much like join points; in particular not in the simplifier. For example

  case (maskAsyncExceptions# (\s. e) s2) of
     (# s3, r #) -> blah

----> ????

  maskAsyncExceptions (\s. case e of (# s3, r #) -> blah) s2

Probably not! Because that would broaden the scope of the mask. But it's fine to treat it in a very join-point-like way at codegen time.

We can apply similar thinking to catch#.

catch# :: (State# RealWorld -> (# State# RealWorld, a #) )
       -> (b -> State# RealWorld -> (# State# RealWorld, a #) )
       -> State# RealWorld
       -> (# State# RealWorld, a #)

Here we allocate two continuations. But we'd really prefer to allocate none! Just push a catch frame (which we do anyway).

Perhaps we can generate this STG:

join jnormal s = e1 s
in join jexception b s = e2 b s
in catch# jnormal jexception s

Again we compile those join point just as we normally do (live variables on the stack), so that invoking one is just "adjust SP and jump". Again this would not be valid Core, just a codegen intermediate.

I like this.

Conclusion: let's not do any special codegen stuff for with# until we've worked this out.

comment:2 Changed 16 months ago by simonpj

Keywords: JoinPoints added

comment:3 Changed 16 months ago by bgamari

I'm not entirely sure I see what this means for the join point saturation invariant. Indeed we under no obligation to obey the same invariants surrounding join points in STG that we obey in Core. However, we are surely guaranteeing some set of invariants; what are they? Specifically, when is an unsaturated use of a join point binder allowed?

Last edited 16 months ago by bgamari (previous) (diff)

comment:4 Changed 16 months ago by simonpj

Like I say, I'm thinking of this as STG only; an intermediate on the way to codegen.

Yes there are still invariants. To be a bit more precise maskAsyncExceptions# j s, j must be an arity-1 join point. Always.

But I am only thinking aloud here.

comment:5 Changed 16 months ago by bgamari

To be a bit more precise maskAsyncExceptions# j s, j must be an arity-1 join point. Always.

Sure, so in effect we just lift the saturation requirement on arguments of some primops. That makes sense.

I just wanted to make sure that we wouldn't be sacrificing even more STG-linting with this change.

I can have a swing at implementing this once we feel the design has converged, if you would like.

Last edited 16 months ago by bgamari (previous) (diff)

comment:6 Changed 16 months ago by simonmar

But if we had

join j s = e
in maskAsyncExceptions# j s

How can we compile this? maskAsyncExceptions# entails pushing a stack frame, but jumping to the join point entails truncating the stack back to the join. We can't do both! Am I missing something?

I suggest that we treat with# in the same way as runST for the purposes of fixing the current bug. I think it'll be simpler to do it this way than what we have in Phab:D4110, because we don't need any support in the RTS at all.

I agree it would be nicer to find a unified way of handing all these primops that take IO continuations, and especially it'd be great to avoid the allocation for catch#.

comment:7 Changed 16 months ago by simonpj

I think I was going too fast. Let's try this instead:

  • maskAsyncExceptions# is allowed to have \s.e as its argument. Perhaps even required.
  • The codegen looks like
    cgExpr (maskAsyncExceptions (\s.e) s2)
      = do { emit (mask frame)
           ; bind s:=s2 (cgExpr e) }
    

Essentially just push the stack frame and carry on with e.

For the "exception handler" argument for catch#, a join might make more sense. But the "main event" argument for catch# should work as above.

Does that make more sense?

comment:8 Changed 16 months ago by simonmar

Cc: simonmar added

Yes, I think that would work. The mechanisms that we have in the code generator for pushing update frames should also work for pushing mask/unmask frames and catch frames.

comment:9 Changed 16 months ago by bgamari

Blocking: 14346 added
Owner: set to bgamari

comment:10 Changed 15 months ago by bgamari

Blocking: 14346 removed

comment:11 Changed 13 months ago by alexbiehl

Differential Rev(s): ​Phab:D4110​Phab:D4110,Phab:D4189

comment:12 Changed 7 months ago by osa1

Differential Rev(s): ​Phab:D4110,Phab:D4189​Phab:D4110, Phab:D4189

comment:13 Changed 7 months ago by simonpj

Priority: normalhighest

This ticket sounds urgent to me:

  • It caused #14346
  • And maybe #14329
  • It caused #15260 (which absorbed lots of people's time)

I'm not following the details, but it seems that alexbiehl has made a patch, and we should review and commit it. And check that it fixes the crashes above.

Last edited 7 months ago by simonpj (previous) (diff)

comment:14 Changed 7 months ago by Ben Gamari <ben@…>

In 56590db/ghc:

base: Make Foreign.Marshal.Alloc.allocBytes[Aligned] NOINLINE

As noted in #14346, touch# may be optimized away when the simplifier can see
that the continuation passed to allocaBytes will not return. Marking CPS-style
functions with NOINLINE ensures that the simplier can't draw any unsound
conclusions.

Ultimately the right solution here will be to do away with touch# and instead
introduce a scoped primitive as is suggested in #14375.

Note: This was present in 8.2 but was never merged to 8.4 in hopes that
we would have #14375 implemented in time. This meant that the issue
regressed again in 8.4. Thankfully we caught it in time to fix it for
8.6.

(cherry picked from commit 404bf05ed3193e918875cd2f6c95ae0da5989be2)

comment:15 Changed 7 months ago by bgamari

Merged to ghc-8.6 and master. Hopefully we will be able to revert this when with# is merged in (likely) 8.8.

comment:16 Changed 5 months ago by dfeuer

Cc: dfeuer added
Milestone: 8.8.1

comment:17 Changed 5 months ago by dfeuer

I don't understand why this has to wait for the continuation arguments machinery. Yes, that will make it more efficient, but shouldn't we get the correctness now and worry about efficiency later? If we write

with# a m s =
  case m s of
    (# s', r #) -> (# touch# a s', r #)
{-# NOINLINE with# #-}

won't that at least let users write reliable backwards-compatible code?

comment:18 Changed 5 months ago by bgamari

Yes, we could and probably should introduce such a thing.

comment:19 Changed 4 months ago by osa1

Cc: osa1 added

comment:20 Changed 3 months ago by bgamari

Description: modified (diff)

comment:21 Changed 3 months ago by bgamari

Description: modified (diff)

comment:22 Changed 3 months ago by tdammers

Updated Phab:D4110 with the original patch rebased onto master.

Also added two tests.

The first one, T14375, is a slightly modified version of the small test case from comment:14:ticket:14346; the original reproduction case is unsuitable for testsuite use due to the crucially important use of forever, but by throwing an exception, we can stop execution while still tricking the optimizer into considering the code after forever unreachable. I have verified that this test fails on GHC 8.2 and passes after this patch. However, due to other changes in the meantime, it also passes on GHC versions just before the patch, even when we disable the obvious suspect (NOINLINE pragmas on the alloca... functions, the previous workaround), so there must be yet something else going on.

The second test case, T14375-2, was suggested by bgamari in a private chat. The idea is to verify the correct operation of the new with# primop by using it on a binding that is also the key of a weak pointer, using its finalizer to attach observable behavior to its deallocation. This way, we can tell from the test output whether the finalizer runs before or after the end of the block wrapped in with#. Unfortunately, finalizers are somewhat unpredictable creatures, and so the test case is somewhat brittle - in order to actually see the finalizer running, the test has to be compiled unoptimized, and the threadDelay calls in strategic locations are needed to trigger GC. Obviously, since we're using the new with# primop directly in this second test, it is impossible to verify that it fixes anything; it just tells us that with# behaves as expected in this case.

comment:23 Changed 3 months ago by tdammers

As briefly touched upon in the HQ meeting: the T14375-2 test contains a function with :: a -> IO () -> IO () that's just a lightweight wrapper for with#; do we want to have this with function somewhere in the base libraries, similar to how other primops (e.g. mkWeak#) are exposed through more palatable wrappers (like mkWeak)?

comment:24 Changed 8 weeks ago by osa1

Differential Rev(s): ​Phab:D4110, Phab:D4189​Phab:D4110

This ticket is really confusing so I talked to Ben on this. Here's the current status:

  • #14346 is fixed but the touch# primop is essentially still broken. The suggestion of ticket:14346#comment:18 suggested implementing some hacks in simplifier to avoid removing unreachable continuation if the continuation calls touch#, but that's not implemented and as far as I can see there are no plans on implementing it (with no explicit reason to not to). Instead we want to use with# whenever possible.

To fix the problem with #14346 we introduced some NOINLINEs to functions that use touch#, so simplifier now can't see that the touch# is unreacable and remove it.

  • This ticket has two ideas:
    1. A new primop with#. This is being implemented in Phab:D4110.
    2. A more efficient implementation plan for primops that take continuations. This is being implemented in Phab:D4647, although it seems to be dormant now (last update from the author is in Jun).

To keep things more manageble (and avoid the communication problems we had with e.g. #15696) let's track the progress for (2) in another ticket and only worry about (1) here. If we really want (2) before (1) then we can consider the ticket for (1) as a blocker, and move on to this ticket after (2).

(I'm removing Phab:D4189 from the diffs list)

comment:25 Changed 8 weeks ago by osa1

I filed #16098 for (2).

comment:26 Changed 8 weeks ago by osa1

I just saw the last comment in Phab:D4110 which implements the new with# primop without (2) in comment:24. Do we want to focus on #16098 first? Pinging simonpj, simonmar, bgamari.

Last edited 8 weeks ago by osa1 (previous) (diff)
Note: See TracTickets for help on using tickets.