Opened 6 months ago

Last modified 6 days ago

#12620 new feature request

Allow the user to prevent floating and CSE

Reported by: nomeata Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.0.1
Keywords: CSE Cc: kosmikus, edsko, MikolajKonarski, michalt
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #9520, #8457 Differential Rev(s):
Wiki Page:

Description (last modified by bgamari)

This is a write-up of a rough idea that Andres Löh and me had at ICFP 2016 in order to address some Real World problems Andres noticed and that are currently hard to avoid.

The goal is to give the user more control about expressions that the compiler would like to float out (or CSE), but the programmer knows better. Example (assume no list fusion exists):

enum xs = zip [1..] xs

This leads to a horrible space leak, as GHC will float out [1..] to the top.

Our idea is to have a magic function nofloat :: a -> a (magic in the same sense as inline and lazy) that the programmer would use here:

enum xs = zip (nofloat [1..]) xs

With these effects:

  • Sub expressions are not floated out of a nofloat.
  • An expression of the form nofloat e would not be floated beyond the innermost enclosing lambda.
  • Two expressions of the form nofloat e would not be commoned up by CSE.

This way, unwanted sharing is prevented.

In contrast to a hypothetical veryCheap function, it does not mean that the compiler should float it into lambda (no unwanted duplication either).

Two open questions (among many others, I am sure:)

  • Likely, rule matching should look through nofloat. At least in this example (and similar ones like map (nofloat [1..]), the rules in question will avoid the spaceleaks).
  • Possibly, nothing should be floated (inlined) into a nofloat. Rationale: Assume the library is changed so that
    [n..] = nofloat (realEnumFrom n)
    {-# INLINE [n..] #-}
    
    Then zip [fib 1000..] would be rewritten by the inliner to zip (let x = fib 1000 in (nofloat [x..])). Moving the fib 1000 into the nofloat would change the behaviour in a possibly surprising way.

Attachments (1)

awaithack.hs (1.4 KB) - added by michaelt 6 months ago.
use oneShot and a rule for await to fight 'full laziness'

Download all attachments as: .zip

Change History (33)

comment:1 Changed 6 months ago by edsko

Cc: edsko added

comment:2 Changed 6 months ago by MikolajKonarski

Cc: MikolajKonarski added

comment:3 Changed 6 months ago by bgamari

Cc: kosmikus added; kopernikus removed
Description: modified (diff)

comment:4 Changed 6 months ago by simonpj

That all looks possible. Since nofloat does several things, it may not be long before people start asking for variants that do some combination of its properties. But I guess we can jump that bridge if we come to it.

It would be useful to give some compelling use-cases.

comment:5 Changed 6 months ago by duncan

Can I suggest a closely related idea, and also related to #9520

data Pipe i o r = Yield o {-# NOUPDATE #-} (Pipe i o r)

This says we'll never do thunk updates on that field in that constructor. So similar idea (I believe) to oneShot lambdas.

Indeed we might need both no update on fields and oneShot, I'm not sure, e.g.:

data Pipe i o r = Yield o {-# NOUPDATE #-} (Pipe i o r)
                | Await   {-# NOUPDATE #-} (Either r i -> Pipe i o r)

-- smart constructor:
await f = Await (GHC.Magic.oneShot f)

What's all this for? For avoiding treating these control structures as data structures (which is what #9520 is all about).

comment:6 Changed 6 months ago by edsko

Right, so a lot of the thinking that led to this ticket came from trying to understand memory leaks in conduit code. See my recent blog post http://www.well-typed.com/blog/2016/09/sharing-conduit/ where these issues are described in great detail; this should also serve, I hope, as one "compelling use case".

That said, I like the idea of a "noupdate" much better than a "nofloat". It would seem to me that its semantics would be easier to specify; and if it means I don't have to think so hard about what exactly the optimizer is doing to my code in order to understand why I do or do not have a memory leak, that would very welcome.

I really like @duncan 's suggestion of having a type annotation on a type; though we might also want some adhoc way of saying "make this thunk not-updateable". An easyish experiment perhaps might be to declare a magic datatype

data DontUpdate a = DontUpdate a

with the property that any code that looks at the thunk in the payload of DontUpdate doesn't cause that thunk to be updated. Then in @duncan 's example we could define

data Pipe i o r = Yield o (DontUpdate (Pipe i o r))

That said, I'm not sure exactly what DontUpdate should do for the lambda; but this is a question about @duncan's proposal too. I think what we want to happen is that the thunks in the function closure never get updated (this, in a nutshell, is what is causing memory leaks in conduit code; see the blog post); but that's already more magical than just saying "don't update this thunk".

comment:7 Changed 6 months ago by simonpj

I think that "noupdate" would require some careful thought. What if I say

f x = if ... then Yield blah x
             else ...

Then the "noupdate" second field of Yield is just the parameter to f. Does the caller have to know not to build an updatable thunk. And why is updating so bad?

(Confession: I have not yet read Edsko's post. But I it should be possible to give a crisp explanation of what any language feature does in a standalone way.)

comment:8 Changed 6 months ago by duncan

Right, this is an initial idea and hasn't been fleshed out. Thanks for the probing example :-)

So the intention is that it's a purely local thing. So in that example, the answer is no, we do not expect a caller far away to have to know anything. The idea is that evaluating "via" the noupdate field should not perform thunk updates, but I appreciate that may not match how thunk construction and update works.

So how about something like this...

Suppose the primitive is not on fields, but on let. This is by analogy with strict let !_ = versus strict constructor fields. The primitive with strictness is at use sites and a convenience for systematic use we can push it to constructor fields, which is defined in terms of constructor wrappers.

So suppose the primitive is let {-# NOUPDATE #-} x = ..., and so then the Yield constructor above could perhaps be defined with a wrapper like

data Pipe i o r = Yield o {-# NOUPDATE #-} (Pipe i o r)

yield o x = let {-# NOUPDATE #-} x' = x in Yield o x'

So in your f x example above then this would do very little (and indeed we'd want it to do precisely nothing different to the usual, by shorting out the extra let indirection). But if things are defined with Yield (expr) or locally ghc decides to float/push things in, then the expression would end up in the let {-# NOUPDATE #-} x' = ... and so there would be an effect.

comment:9 Changed 6 months ago by tomjaguarpaw

I'm very glad to see full laziness getting some attention. I've been aware of its deleterious effects for some time and have tried to spread awareness of it:

I have even asked whether it is an optimization worth performing at all, though I conclude that it is:

The full laziness transformation causes a lot of headaches and something really needs to be done about it. However I do not think this suggestion is the right approach. Why not tweak the transformation so that it only fires in cases that are guaranteed not to lead to memory leaks? That could be as simple as only hoisting bindings of monomorphic non-recursive datatypes. The proposed nofloat keyword is just adding additional complexity over a transformation which itself is introducing too much complexity. I'm very concerned about the idea.

comment:10 in reply to:  7 Changed 6 months ago by nomeata

Replying to simonpj:

I think that "noupdate" would require some careful thought. What if I say

f x = if ... then Yield blah x
             else ...

Then the "noupdate" second field of Yield is just the parameter to f. Does the caller have to know not to build an updatable thunk.

I guess we would instruct the demand analysis to believe that Yield has strictness signature <L,U><L,1*U> and thus this once-used information will be propagated, at least to the extent possible.

comment:11 in reply to:  9 Changed 6 months ago by edsko

Replying to tomjaguarpaw:

I'm very glad to see full laziness getting some attention (...) I have even asked whether it is an optimization worth performing at all, though I conclude that it is:

Yup, I cite this in the blog post :)

However I do not think this suggestion is the right approach. (...) The proposed nofloat keyword is just adding additional complexity over a transformation which itself is introducing too much complexity. I'm very concerned about the idea.

I agree that it would be preferable not to "program the optimizer" when writing Haskell code. That's another reason in fact why I prefer noupdate over nofloat, beacuse actually noupdate goes beyond full laziness. Consider this example from the blog post:

retry :: IO a -> IO a
retry io = catch io (\(_ :: SomeException) -> retry io)

main :: IO ()
main = retry $ ni_mapM_ print [1..1000000]

This program has a memory leak, but it's nothing to do with full laziness here. Now admittedly we could turn this into a full laziness issue by giving the argument to retry a dummy unit argument or something like that, so that we write

retry :: (() -> IO a) -> IO a
retry io = catch (io ()) (\(_ :: SomeException) -> retry io)

main :: IO ()
main = retry $ \() -> ni_mapM_ print [1..1000000]

or something like that, but then you would have to do that in every single function that duplicates IO actions (think forever, replicateM_, etc.) Instead, we could mark that list as noupdate and the memory leak would be gone.

comment:12 Changed 6 months ago by tomjaguarpaw

Edsko, it seems to me that the problem that you mention here is quite easy to avoid.

main :: IO ()
main = retry $ return () >>= \_ -> ni_mapM_ print [1..1000000]

is sufficient, unless I am very much mistaken. With such a construction the list is allocated afresh for each invocation of the IO action.

comment:13 Changed 6 months ago by edsko

Fair enough, that's an easier workaround. But the idea is to have something a little more compositional. For example, in the case of conduits, we probably never want to share a conduit value. So it would be great if we could annotate the conduit constructors with a noupdate annotation, and then users of the conduit library don't have to worry about this problem anymore. After all, in the list example, it's not obvious that

main :: IO ()
main = retry $ runConduit someConduit

has a space leak; even less so when that retry and the runConduit are in different places:

go :: IO ()
go = runConduit someConduit

main :: IO ()
main = retry go 

We'd need to have the foresight to write

main :: IO ()
main = retry $ return () >>= \_ -> go 

The situation really is very close to strictness; do we want to make sure every single function using a datatype has the right seqs in the right place, or we just put some strictness annotations on the datatype?

comment:14 Changed 6 months ago by WrenThornton

It looks like that fix only works for the default -O0. Passing either -O1 or -O2 reintroduces retry's space leak

Last edited 6 months ago by WrenThornton (previous) (diff)

comment:15 Changed 6 months ago by tomjaguarpaw

Edsko, I'm a bit puzzled. For the case of conduits, isn't it enough to hide things behind lambdas in the definition of the Pipe type?

Wren, sure, but Edsko's original claim is that this isn't a full laziness issue. My example brings it back to being a full laziness issue indeed. My contention is that even given Edsko's example it still makes more sense to fix the full laziness transformation than add a magic word.

comment:16 Changed 6 months ago by tomjaguarpaw

Edsko, I'm a bit puzzled. For the case of conduits, isn't it enough to hide things behind lambdas in the definition of the Pipe type?

That is, "enough modulo full laziness".

Last edited 6 months ago by tomjaguarpaw (previous) (diff)

comment:17 in reply to:  15 Changed 6 months ago by edsko

Replying to tomjaguarpaw:

Edsko, I'm a bit puzzled. For the case of conduits, isn't it enough to hide things behind lambdas in the definition of the Pipe type?

Hmmm, yes. I think it's true that if full laziness is disabled everywhere and for everyone (to be precise, in every module defining conduits), then it probably suffices. But I'm not sure quite how realistic that is.

comment:18 Changed 6 months ago by tomjaguarpaw

This is why my suggestion is exactly to tweak the conditions when full laziness fires!

comment:19 Changed 6 months ago by simonpj

I'm beginning to get glimmers of understanding about this no-update thing. Consider

  t1 = [1..n]
vs
  t2 = \_ -> [1..n]
vs
  t3 = let x = [1..n] in \_ -> x

Note that

  • If we use t1 in a shared context like sum t1 / length t1, we'll end up materialising the whole list.
  • For t2, we'd get sum (t2 ()) / length (t2 ()), and now the list is computed twice rather than duplicated. Note that t1 and t2 have different types of course.
  • Then t3 is the result of applying the full laziness transformation to t2, and its space behaviour is back to that of t1.

Reflections:

  • I think that this "noupdate" pragma is intended to achieve an effect like t2, but more conveniently, without changing types. Correct?
  • I think (but am not sure) that you intend to use this only for one-shot thunks, where (unlike the sum/count example) the thunk is evaluated only once. In which case it would often be discarded after being evaluated, in which case where does the leak come from. A small, concrete example would be jolly useful.
  • Notice how important it is that in t2 the lambda syntactically encloses the leaky computation. Otherwise you get t3. My conclusion from this is that if you want a pragma on a data constructor, that the pragma only guarantees to affect the syntactic argument. Thus
      let x = <expression> in Yield o x
    vs
      Yield o <expression>
    

The latter would "work" (i.e. <expression> would be wrapped in a non-updatable thunk); but the former might not.

I say "might not" rather "would not" because cardinality analysis might propagate the one-shot info to x. But that would be a "best-efforts" thing on which one might not like to rely.

Would syntactic enclosure be enough in your application?

comment:20 Changed 6 months ago by jpbernardy

As it turns out, we currently have a proposal on the table which is capable of expressing where sharing should not occur, in a principled way, by using types.

This page sums up how it may play out in Edsko's example.

https://ghc.haskell.org/trac/ghc/wiki/LinearTypes/Examples#Controllingsharingfulllaziness

comment:21 Changed 6 months ago by simonpj

Good point Jean-Phillipe. I had not quite made the connection before, thank you.

The linear-types discussion is at a fairly early stage, but it does suggest that we should not go far with this "noupdate" stuff just yet.

comment:22 Changed 6 months ago by edsko

The more I think about this, the less convinced I am that a nofloat or even a local noupdate annotation really helps. The problem is: where do we put the annotation? The whole point of having such an annotation, as opposed to just disabling full laziness in the whole module, is to have more fine grained control over where full laziness applies and where it doesn't. This was easy in the example that this ticket started with

enum xs = zip (nofloat [1..]) xs

but it's far less obvious in larger examples. For example, consider the definition of a conduit that implements the HTTP protocol (Michael Snoyman's http-conduit package), or a conduit that does constant space type inference for large JSON documents (an example from the code base I am working on). Now how do we know what in these definitions to mark as nofloat? If we get it wrong, then full laziness might float something else out that we weren't expecting, and we might once again end up with a difficult to debug space leak. The only really workable solution would be to mark the whole body as nofloat, but now we've lost the advantage of fine granularity. I'm guessing @tomjaguarpaw will say at this point "see! the problem is full laziness itself" and to be honest, I'm starting to get more and more convinced by that point of view.

However, I still believe that there is an alternative by means of the NOUPDATE annotation. But, having thought about it more, I don't think annotating the constructors is the right approach. @simonpj asks for a minimal example, so let's consider this one:

module Main (main) where

import System.IO.Error

data Sink = Await (Maybe Char -> Sink) | Done Int

countFrom :: Int -> Sink
countFrom n = Await $ \mi -> case mi of
                Nothing -> Done n
                Just _  -> countFrom $! n + 1

feedFrom :: Int -> Sink -> IO ()
feedFrom _ (Done n)  = print n
feedFrom 0 (Await f) = feedFrom 0       (f $ Nothing)
feedFrom n (Await f) = feedFrom (n - 1) (f $ Just 'A')

retry :: IO a -> IO a
retry io = catchIOError io (\_ -> retry io)

main :: IO ()
main = retry $ feedFrom 1000000 (countFrom 0)

A Sink (a special kind of "conduit") is some kind of automaton that accepts (Awaits) a bunch of inputs (in this case Chars) and at some point terminates (Done). Let's recap from the blog post why this has a space leak:

  1. feedFrom 1000000 (countFrom 0) is a PAP (waiting for its State# RealWorld argument)
  2. When retry executes the action, it maintains a reference to that PAP from the exception handler.
  3. In the environment of the PAP is a thunk corresponding to countFrom 0.
  4. Finally, and crucially, full laziness is turning the definition of countFrom to something more akin to
-- Full laziness turns countFrom into:
countFrom :: Int -> Sink
countFrom n = let k = countFrom $! n + 1
              in Await $ \mi -> case mi of
                   Nothing -> Done n
                   Just _  -> k

(The example with the original definition of countFrom has a space leak when compiled with -O but no space leak with -O -fno-full-laziness; if we use this version of countFrom, we have a space leak with or without full laziness enabled.)

(1)-(4) together means that there is a reference from the PAP's environment to the countFrom 0 thunk, and as feedFrom evaluates that thunk we build up a long chain

Await ---payload---> FUN ---environment---> Await ---payload---> ...

where every Await constructor has a function as its payload, and that function has a reference to the next Await constructor in its environment (closure) (section "Full laziness versus sinks" of http://www.well-typed.com/blog/2016/09/sharing-conduit/ has some pictures.).

So what's the solution here? Perhaps one might argue that full laziness is the culprit here; it should not have floated out that continuation in countFrom. Like I said, I'm starting to have a lot of sympathy for that point of view; I will soon need to publish an erratum to my blog post because I was once again underestimating full laziness.

BUT. We can ask a different question: do we really want to be thinking so hard about when and where things get allocated precisely? What if the user themselves wrote that alternative version of countFrom -- after all, it seems like an innocuous change. Should we really have to think so low-level when writing Haskell code? I would like to be able to answer "no" to that question.

Here's the thing: conduits (and other structures like it) are data structures designed to drive computation; we never expect them to be shared and built up in memory. (When we were discussing these matters at Well-Typed a comparison was drawn to data versus codata.) I think it would be great if we could express this, and NOUPDATE, I think, might allow us to do that.

However, I now think annotating the constructors is not the right approach. In addition to Simon's probing questions, above, let's consider the example countFrom. What is the thunk that we don't want to be updateable? Well, countFrom 0 really; and, if pressed for another one, the Sink in the environment of the continuation in countFrom (countFrom $! n + 1). Neither of those is the argument to a constructor.. I think that instead we should annotate the type:

{-# NOUPDATE Sink #-}
data Sink = Await (Maybe Char -> Sink) | Done Int

Now questions such as "who created this thing? do we need spooky action at a distance?" are no longer relevant. It's simple and type directed. Any thunk of type Sink never gets updated.

Some other minor bits and bobs:

Replying to simonpj:

I'm beginning to get glimmers of understanding about this no-update thing. Consider

  t1 = [1..n]
vs
  t2 = \_ -> [1..n]
vs
  t3 = let x = [1..n] in \_ -> x

Note that

  • If we use t1 in a shared context like sum t1 / length t1, we'll end up materialising the whole list.
  • For t2, we'd get sum (t2 ()) / length (t2 ()), and now the list is computed twice rather than duplicated. Note that t1 and t2 have different types of course.
  • Then t3 is the result of applying the full laziness transformation to t2, and its space behaviour is back to that of t1.

Reflections:

  • I think that this "noupdate" pragma is intended to achieve an effect like t2, but more conveniently, without changing types. Correct?

Exactly. If we had a list type that was marked as NOUPDATE, then sum t1 / length t1 would not have a space leak (though the list would be evaluated twice).

  • I think (but am not sure) that you intend to use this only for one-shot thunks, where (unlike the sum/count example) the thunk is evaluated only once. In which case it would often be discarded after being evaluated, in which case where does the leak come from. A small, concrete example would be jolly useful.

No, I don't think that's necessarily the case. Marking something as NOUPDATE would imply that you're okay with it being evaluated more than once; indeed, that's what you want. In the minimal example I've been discussing in this comment, we want that conduit (sink) to be re-evaluated should the exception handler be run.

  • Notice how important it is that in t2 the lambda syntactically encloses the leaky computation. Otherwise you get t3.

I think this is another reason to move to a type directed approach instead. Syntactic enclosure is too brittle and too prone to be affected by the optimizer.

Replying to jpbernardy:

As it turns out, we currently have a proposal on the table which is capable of expressing where sharing should not occur, in a principled way, by using types.

This page sums up how it may play out in Edsko's example.

https://ghc.haskell.org/trac/ghc/wiki/LinearTypes/Examples#Controllingsharingfulllaziness

Hmmm, I had not realized the connection to linearity, and I wasn't aware of this work. Must take a closer look (my PhD is on uniqueness typing :). I'm not sure however that linearity is what we want here. Do we want to reject a definition such as

someConduit = do
    x <- await
    case x of
      True  -> do foo ; someConduit 
      False -> do bar ; someConduit 

If conduits can never be shared, this would be type incorrect. This would seem too restrictive. NOUPDATE in a way is kind of opposite to linearity: it's fine to share, just make sure that every time we access this value we recompute it.

Last edited 6 months ago by edsko (previous) (diff)

comment:23 Changed 6 months ago by jpbernardy

edsko: the above definition would be rejected only if the data inside the conduit would be linear (x), instead of the conduit itself. It is fine to make the conduit type linear and its contents shared.

Incidentally, I have written a stream library based on this idea, and it's described here: https://jyp.github.io/pdf/Organ.pdf The paper goes into the implication that linearity has in this case, in quite depth.

comment:24 Changed 6 months ago by edsko

Related work: Joachim Breitner's (unpublished) paper "dup – Explicit un-sharing in Haskell" (https://arxiv.org/pdf/1207.2017v1.pdf).

comment:25 Changed 6 months ago by simonpj

Any thunk of type Sink never gets updated.

That's extremely dodgy isn't it? What about

let s1 :: Sink = ...
    s2 :: Sink = ...
    x  :: Sink = if <expensive> then s1 else s2

If x is not updated, but is evaluated more than once, we'll evaluate <expensive> more than once.

Perhaps you mean something more like this:

data Sink = Await (Maybe Char -o Sink) | Done Int

Notice the "-o", meaning a "one-shot function". The idea is that one-shot functions are called at most once. (Maybe exactly once, but I think at-most once is better.)

So in your countFrom example, the continuation k would not be floated outside the lambda; and if it was written outside it'd be floated inside the lambda.

GHC already has the notion of a one-shot lambda; it's just not dignified as part of the type system.

Would that serve? I think that you do intend that the argujment of Await is called at most once, don't you?

comment:26 in reply to:  25 Changed 6 months ago by edsko

Replying to simonpj:

Any thunk of type Sink never gets updated.

Notice the "-o", meaning a "one-shot function". The idea is that one-shot functions are called at most once. (Maybe exactly once, but I think at-most once is better.)

So in your countFrom example, the continuation k would not be floated outside the lambda; and if it was written outside it'd be floated inside the lambda.

GHC already has the notion of a one-shot lambda; it's just not dignified as part of the type system.

Would that serve? I think that you do intend that the argujment of Await is called at most once, don't you?

Typically, yes, but not necessarily. After all, in the minimal example above, if the exception handler gets executed then the whole process starts over. Ideally it would start over with a newly constructed conduit, but if we cannot prevent sharing, it would start over with the same conduit.

comment:27 Changed 6 months ago by edsko

I take your point re <expensive> though. After it, it's common enough to have something like

x <- someConduit
if <expensive> 
  then thisConduit
  else thatConduit

However, I still think we need something more compositional than oneShot. As michaelt_ points out on Reddit,

module Main (main) where
import GHC.Magic

data Sink = Await (Maybe Char -> Sink) | Done Int

countFrom :: Int -> Sink
countFrom n = let k = countFrom $! n + 1
              in Await $ oneShot $ \mi -> case mi of
                  Nothing ->  Done n
                  Just _  -> k

feedFrom :: Int -> Sink -> IO ()
feedFrom _ (Done n)  = print n
feedFrom 0 (Await f) = feedFrom 0       (case f $ Nothing of  a -> a)
feedFrom n (Await f) = feedFrom (n - 1) (case f $ Just 'A' of a -> a)

main :: IO ()
main = let a = feedFrom 10000000 (countFrom 0) in a >> a

doesn't have a space leak. If oneShot was compositional, that would be awesome; we could put the oneShot in the library and then forget about it. Sadly, though perhaps not unexpectedly, this variation _does_ have a space leak again:

module Main (main) where
import GHC.Magic

data Sink = Await (Maybe Char -> Sink) | Done Int

await :: (Maybe Char -> Sink) -> Sink
{-# NOINLINE await #-}
await f = Await (oneShot f)

countFrom :: Int -> Sink
countFrom n = let k = countFrom $! n + 1
              in await $ \mi -> case mi of
                  Nothing ->  Done n
                  Just _  -> k

feedFrom :: Int -> Sink -> IO ()
feedFrom _ (Done n)  = print n
feedFrom 0 (Await f) = feedFrom 0       (case f $ Nothing of  a -> a)
feedFrom n (Await f) = feedFrom (n - 1) (case f $ Just 'A' of a -> a)

main :: IO ()
main = let a = feedFrom 10000000 (countFrom 0) in a >> a

Insisting that users write

await >>= oneShot (\mi -> ...)

instead of

do mi <- await ; ...

doesn't seem like a good solution.

comment:28 Changed 6 months ago by edsko

But perhaps I misunderstood: yes, marking the function type as "oneshot" Maybe Char -o Sink would also do the trick. Although I'm not a fan of calling this "oneshot", or indeed the -o notation (borrowed from linearity). The whole point is that these functions _might_ be executed more than once; after all, if we were guaranteed that that wouldn't happen, we wouldn't be hanging on to them and there would be no memory leak.

But marking that function as "noupdate" or whatever would seem to make sense; of course, now we can ask the same question as you asked above: what about

foo :: Maybe Char -> Sink
{-# NOUPDATE foo #-}
foo = if <expensive> then f1 else f2

but I guess this is far less likely to be a problem in practice; how often do we write something like

await >>= if <expensive> then f1 else f2

I think almost never; this is far more likely

await >>= \mi -> if <expensive> then c1 else c2

and that would be just fine.

Changed 6 months ago by michaelt

Attachment: awaithack.hs added

use oneShot and a rule for await to fight 'full laziness'

comment:29 Changed 6 months ago by michaelt

It seems oneShot can be used at the level of the Sink library to stop this kind of sharing, so that users wouldn't have to worry about it. If I add a monad instance, then I can give the customary definition await = Await Done. If I delay inlining, then the standard use of await, namely

mysink = do 
    mi <- await
    case mi of 
      Nothing -> ...
      Just i  -> ...

works fine in the presence of this rule

{-# RULES "await hack"
     forall f . await >>= f = Await (oneShot f)
      #-}

See the attached https://ghc.haskell.org/trac/ghc/attachment/ticket/12620/awaithack.hs

Last edited 6 months ago by michaelt (previous) (diff)

comment:30 Changed 6 months ago by michaelt

This hack seems to work fine, by the way, if I separate out the general material as a 'library', and then define the counting and feeding function in a separate 'user' module.

comment:31 Changed 6 months ago by michalt

Cc: michalt added

comment:32 Changed 6 days ago by ezyang

Keywords: CSE added
Note: See TracTickets for help on using tickets.