Opened 3 years ago

Closed 3 years ago

#9390 closed bug (fixed)

Inlining prevents evaluation of ignored parts of unboxed tuples

Reported by: snoyberg Owned by:
Priority: normal Milestone: 7.8.4
Component: Compiler Version: 7.8.3
Keywords: Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime Test Case: simplCore/should_run/T9390
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Consider the following code:

{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.IO (IO (..))
import GHC.Prim

writeB :: MutableArray# RealWorld Char -> IO ()
writeB arr# =
    IO $ \s0# ->
        (# writeArray# arr# 0# 'B' s0#, () #)

inlineWriteB :: MutableArray# RealWorld Char -> ()
inlineWriteB arr# =
    case f realWorld# of
        (# _, x #) -> x
  where
    IO f = writeB arr#

test :: IO Char
test = IO $ \s0# ->
  case newArray# 1# 'A' s0# of
    (# s1#, arr# #) ->
      case seq# (inlineWriteB arr#) s1# of
        (# s2#, () #) ->
          readArray# arr# 0# s2#

main :: IO ()
main = test >>= print

I would expect this code to output the letter 'B'. When compiled without optimizations, that's exactly what it does. However, with optimizations turned on, it seems that it decides that, in inlineWriteB, the state value does not need to be evaluated, which results in the writeArray# call never occurring.

This affected me when working with the vector and primitive packages. I believe I have a workaround in place (see https://github.com/haskell/primitive/pull/11), but this should probably be fixed in GHC as well.

Change History (38)

comment:1 Changed 3 years ago by snoyberg

Note that the following similar code does not reproduce this issue:

import GHC.IO (IO (..))
import GHC.Prim

writeB :: MutableArray# RealWorld Char -> ()
writeB arr# =
    case writeArray# arr# 0# 'B' realWorld# of
        _ -> ()

test :: IO Char
test = IO $ \s0# ->
  case newArray# 1# 'A' s0# of
    (# s1#, arr# #) ->
      case seq# (writeB arr#) s1# of
        (# s2#, () #) ->
          readArray# arr# 0# s2#

main :: IO ()
main = test >>= print

comment:2 Changed 3 years ago by carter

{-# LANGUAGE BangPatterns,  UnboxedTuples,MagicHash #-}

import Data.ByteString.Internal (inlinePerformIO)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as VM

import System.IO.Unsafe

main :: IO ()
main = do
    vm <- VM.new 1
    VM.write vm 0 'A'
    !b<- return $! 'B'
    let !x = unsafePerformIO $! VM.write vm 0 b
    x `seq` (V.freeze vm >>= print)

will output fromList "B" at O0 and and fromList "A" at O1 and O2, so Its not related to using InlinePerformIO (which made me a bit more skeptical, but then i then saw that I can hit this problem without accurseevilperformIO)

comment:3 Changed 3 years ago by carter

its important to also note from the docs in GHC.Prim

data State# s Source

State# is the primitive, unlifted type of states. It has one type parameter, thus State# RealWorld, or State# s, where s is a type variable. The only purpose of the type parameter is to keep different state threads separate. It is represented by nothing at all.

data RealWorld Source

RealWorld is deeply magical. It is primitive, but it is not unlifted (hence ptrArg). We never manipulate values of type RealWorld; it's only used in the type system, to parameterise State#.

i'm not sure if your expectations on the evaluation are correct from this perspective. I could /likely am incorrect though.

comment:4 Changed 3 years ago by snoyberg

I can buy that interpretation, in which case it would seem that my pull request to primitive isn't a workaround, but the correct way to do things. It would be nice to get an authoritative answer on this, though.

comment:5 Changed 3 years ago by simonpj

Well this is very interesting. It turns out that

  • The desugarer was generating Core that did not satisfy the let/app invariant (in CoreSyn)
  • Core Lint was failing to check that the invariant was satisfied
  • The simplifier, quite reasonably, assumed that the invariant holds, and thereby discarded an expression that has side effects.

Patch coming

Great bug report, btw. Thanks!

Simon

comment:6 Changed 3 years ago by thoughtpolice

Milestone: 7.8.4

comment:7 Changed 3 years ago by simonpj

I have this 95% done, but keep getting distracted. I hope to commit my fix on Thursday. It can go in 7.8.4 if we ever make such a release.

Meanwhile, notice that you are in effect using an inlined version of unsafePerformIO: your function inlineWriteB conjures up a realWorld# and then discards it again. If you use the real unsafePerformIO (which is not inlined until much later if at all) I think the problem will go away. That might be a workaround.

Simon

comment:8 Changed 3 years ago by snoyberg

Yes, using unsafePerformIO or unsafeDupablePerformIO works as expected in my example. However, Carter's example above fails even with unsafePerformIO, so something fishy is still going on. And if I apply my patch to the primitive package to work around the issue here, Carter's example no longer gives an incorrect result. So even though the avoidance of inlining in unsafePerformIO does work around the simple example I gave above, there are still places where the current behavior can manifest.

comment:9 Changed 3 years ago by carter

On the flip side, the version in my example is also VERY weird code to write also, I can't imagine anything like my example occurring in the wild. Most uses of unsafe*PerformIO in the wild "return" a value thats used by the rest of the program, BUT both mine and snoymans examples are per se "dead code" wrt the alleged purity!

I think my example might just simply be legal CSE wrt alleged purity.

comment:10 Changed 3 years ago by simonpj

The example in comment:2 is different in kind to the main one. Considre

  (unsafePerformIO (do { write arr 0 'A'; return 3 })
  +
  (unsafePerformIO (do { write arr 0 'B'; return 4 })

After you evaluate this, does arr[0] contain 'A' or 'B'? Obviously it depends on the evaluation order of (+). GHC is, quite specifically, at liberty to change evaluation order -- notably the strictness analyser makes quite radical changes in evaluation order -- so you absolutely cannot rely on it. Even if you say

   a `seq` b `seq` d

you cannot rely on a being evaluated before b.

In short:

  • GHC is tries never to discard exceptions, divergence, or write effects
  • but it is free to re-order them

(See our paper "A semantics for imprecise exceptions" for lots of detail on this.)

The only way to enforce sequencing of write effects is to us the data dependency of the state token. By using unsafePerformIO you are specifically saying "I don't mind in what order these effects are performed relative to everything else.

unsafeInterleaveIO lets you ensure that it occurs after the effects thus far in the do-block have occurred. So this:

main = do
    vm <- VM.new 1
    VM.write vm 0 'A'
    !b<- return $! 'B'
    x <- unsafeInterleaveIO $! VM.write vm 0 b
    x `seq` (V.freeze vm >>= print)

always prints "B".

These things are not obvious. We have a good place for "collaborative documentation" of GHC, here. Would any of you like to write an explanatory page there?

Simon

comment:11 Changed 3 years ago by snoyberg

I'm confused about the end of your comment, in particular:

unsafeInterleaveIO lets you ensure that it occurs after the effects thus far in the do-block have occurred

But your example then implies that the x thunk will reliably be evaluated before V.freeze vm >>= print, which does not seem to follow from your comment. Can you clarify?

I'd be happy to take a crack at writing up some documentation. For a while, I've wanted to have a clear set of rules for when unsafePerformIO, unsafeDupablePerformIO, and inlinePerformIO (aka unsafeAccursedPerformIO) are safe to use. Once I get some more clarity from our discussion here, I'd like to take a pass at such a Wiki page, and would greatly appreciate your review.

comment:12 Changed 3 years ago by simonpj

(Retrospectively edited to remove incorrect stuff!)

My comment meant that the x thunk will reliably be evaluated only after performing the write A side effect.

The question of whether it is performed before the freeze is quite a separate one. Yes it will, because the seq ensures that (in effect) there is a data dependency between x and (V.freeze vm >>= print), so we can't perform the latter action until we have evaluated x.

To put it another way, if you write

a `seq` blah

then GHC won't guarantee the order in which a and blah will be evaluated. But we do guarantee that both a and blah will have been evaluated before the value of blah is returned. (But we do not guaranteed that a is evaluated before the evaluation of blah is begun.)

So if you have (a `seq` b `seq` blah), then you know that a, b, and blah will all evaluated before the value is returned.

Does that make it clearer?

Simon

Last edited 3 years ago by simonpj (previous) (diff)

comment:13 Changed 3 years ago by snoyberg

I think so, let me see if I've got a grasp of this. In Carter's example above, there are essentially the following steps:

  1. Create a new vector.
  2. Write 'A' into the vector.
  3. Write 'B' into the vector.
  4. Freeze and print the vector.

The ordering dependencies in the code are as follows:

  • Steps 2, 3, and 4 all depend on step 1 occurring.
  • Step 4 depends on step 2 (due to ordering in the IO monad) and step 3 (due to the usage of seq).

However, there is no clearly expressed ordering between 2 and 3. Even those we only force evaluation of step 3's thunk "after" step 2 is performed, there is nothing in the code to express this as a strict ordering requirement, and therefore GHC is at full liberty to perform step 3 before step 2.

In your code, however, by using unsafeInterleaveIO, we have in fact stated a strict ordering requirement that step 3 occur after step 2, which solves the problem.

And none of this has anything to do with my original example, which has to do with some primops being optimized away depending on how their resulting State# is handled (whether inside a case or not).

comment:14 Changed 3 years ago by simonpj

Correct. And I edited comments:12, where I had previously said something wrong.

I should also add that (a `pseq` blah) does guarantee that a is evaluated before the evaluation of blah is begun. That's the difference between seq and pseq.

Simon

comment:15 Changed 3 years ago by snoyberg

I think you were implying this, but just for complete clarity (and since I'd want to include it in an explanatory document), when we have:

x `seq` (y :: IO a)

We have no control of whether x or y will be evaluated first, and therefore there is no ordering of side effects from the evaluation of x and y. However, the IO action contained by y *will* be guaranteed to be run after both x and y are evaluated. In other words, with the code:

let x = unsafePerformIO $ putStrLn "x evaluated"
    y = unsafePerformIO $ do
            putStrLn "y evaluated"
            return $ "y run"
x `seq` y

The ordering of "x evaluated" and "y evaluated" is undefined, but we are guaranteed that both of them will be print before "y run".

Last edited 3 years ago by snoyberg (previous) (diff)

comment:16 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In 1fc60ea1f1fd89b90c2992d060aecb5b5a65f8c0/ghc:

When desugaring Use the smart mkCoreConApps and friends

This is actually the bug that triggered Trac #9390.  We had
an unboxed tuple (# writeArray# ..., () #), and that writeArray#
argument isn't ok-for-speculation, so disobeys the invariant.

The desugaring of unboxed tuples was to blame; the fix is easy.

comment:17 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

In ab6480b8d8ea45ae6958558245266153df071aa5/ghc:

Extensive Notes on can_fail and has_side_effects

In fixing Trac #9390 I discovered that I *still* didn't really understand
what the can_fail and has_side_effects properties of a PrimOp mean, precisely.

The big new things I learned are

* has_side_effects needs to be true only of *write* effects,
  Reads (which are, strictly speaking, effects) don't matter here.

* has_side_effects must be true of primops that can throw a synchronous
  Haskell exception (eg raiseIO#)

* can_fail is true only of primops that can cause an *unchecked* (not
  Haskell) system exception, like divide by zero, or accessing memory
  out of range through an array read or write.

I've documented all this now.  The changes in this patch are only
in comments.

comment:18 Changed 3 years ago by Simon Peyton Jones <simonpj@…>

comment:19 Changed 3 years ago by simonpj

Test Case: simplCore/should_run/T9390

If we ever release 7.8.4, the patch in comment:16 should go in.

I'm just leaving this open pending Michael's explanatory document

Simon

comment:20 Changed 3 years ago by simonpj

Status: newmerge

Actually I'll change it to 'merge' status lest we forget to merge it one day.

comment:21 Changed 3 years ago by snoyberg

OK, I've put together a draft of a document explaining what we've discussed in this ticket:

https://www.fpcomplete.com/tutorial-preview/4431/z0KpB0ai2R

Comments (and especially corrections) highly welcome!

There's one thing in particular I wasn't sure of: why is lazy necessary in the definition of unsafeDupablePerformIO?

comment:22 Changed 3 years ago by simonpj

Some quick comments (I'm about to go on holiday):

  • You shouldn't have to quote the documentation for pseq to understand seq. If seq's documentation is inadequate, let's fix it.
  • You say "seq ensures that both one and one + two will be evaluated before the seq expression is evaluated". I would rather say "...before the result of the seq expression is returned".
  • "We can get the same guaranteed ordering of evaluation by having a function which is only strict in one of its arguments". Definitely not! GHC will inline add and all will be lost. Only pseq guarantees this behaviour.
  • Passage starting "This looks like it should be straightforward". Alas you have found yet another dark corner. Here is a comment from the demand analyser:
            -- Note [IO hack in the demand analyser]
            --
    	-- There's a hack here for I/O operations.  Consider
    	-- 	case foo x s of { (# s, r #) -> y }
    	-- Is this strict in 'y'.  Normally yes, but what if 'foo' is an I/O
    	-- operation that simply terminates the program (not in an erroneous way)?
    	-- In that case we should not evaluate y before the call to 'foo'.
    	-- Hackish solution: spot the IO-like situation and add a virtual branch,
    	-- as if we had
    	-- 	case foo x s of 
    	--	   (# s, r #) -> y 
    	--	   other      -> return ()
    	-- So the 'y' isn't necessarily going to be evaluated
    	--
    	-- A more complete example (Trac #148, #1592) where this shows up is:
    	--	do { let len = <expensive> ;
    	--	   ; when (...) (exitWith ExitSuccess)
    	--	   ; print len }
    
    You can look at the tickets mentioned for some more background. I hate the hack, but the bottom line is that no, GHC does not (and I think should not) evaluate helper2 before running helper1.

That's as far as I got. Thanks for writing this -- it's a good forcing function.

comment:23 Changed 3 years ago by snoyberg

Thanks for the review, and enjoy the vacation. Some responses:

  • I've added a note to myself to send a pull request on the documentation of seq next week.
  • I've updated the text as you indicated.
  • Thank you for clarifying. I've left the example that I originally had, but instead of saying "this will work too," it says "you might think this will work, but due to inlining, it won't."
  • I'm confused about this last part, because the situation I'm describing seems to be *exactly* the bug that Carter reports in comment 2. Your explanation here implies to me that we never really fully explained why Carter's code is acting as it does, though I can guess that it is in fact another manifestation of this bug.

comment:24 Changed 3 years ago by carter

@snoyberg, does the fpco website run the inline code at O0? the inlinePerformIO example in the document gives the A B result rather than the shared result!

comment:25 Changed 3 years ago by snoyberg

@carter: It runs it with GHCi (or close enough), so none of the rearranging of code will occur. I added a comment towards the top of the article explaining.

comment:26 Changed 3 years ago by snoyberg

I've submitted a differential (is that the right term?) on Phabricator for improvements to the seq documentation: https://phabricator.haskell.org/D136

comment:27 Changed 3 years ago by Austin Seipp <austin@…>

In cbfa107604f4cbfaf02bd633c1faa6ecb90c6dd7/ghc:

Improve seq documentation; part of trac issue #9390

Summary: Signed-off-by: Michael Snoyman <michael@snoyman.com>

Test Plan: Review documentation change

Reviewers: simonpj, austin

Reviewed By: austin

Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter

Differential Revision: https://phabricator.haskell.org/D136

GHC Trac Issues: #9390

comment:28 Changed 3 years ago by snoyberg

Simon: Do you believe the document I wrote (https://www.fpcomplete.com/tutorial-preview/4431/z0KpB0ai2R) is in a good enough state to submit for wider review, perhaps on the GHC mailing list?

comment:29 Changed 3 years ago by simonpj

Basically yes. Some additional thoughts:

  • "...To quote the docs for pseq" This seems like a complete non-sequitur when talking about seq, and in any case only documents seq in a sideways kind of way. Better to quote the (brand new) documentation for seq. (Add a note to say it's new if you like.)
  • "...If we want to be certain of the ordering, use pseq". That's the place to quote pseq's documentation.
  • "You might think (as I did) that we can get the same guaranteed ordering of evaluation by having a function which is only strict in one of its arguments:" This is a bit confusing, because we don't know if (+) here is overloaded. I assume you mean (+) at Int? If so, it's strict in both arguments, so adding the strictness annotation doesn't change anything anyway. A ! would only have a chance of changing evaluation order if the function was lazy. And if it was lazy then the ! would make it strict, and that would be preserved by inlining. So maybe my earlier comment about inlining was a red herring.
  • I think it might be helpful to articulate the baseline story for unsafePerformIO, namely: use it only when you don't mind which order the effects are performed in, relative to both the main I/O monad and other calls to unsafePerformIO. You are on thin ice if you go beyond that; and the thin ice is what this tutorial discusses.
  • "to understand why unsafeInterleaveIO is semantically different from return . unsafePerformIO, we need to drop down a layer of abstraction". Not really. To understand how, you need to drop down. But the guarantees are perfectly well defined. Given do { ...before...; x <- unsafeInterleaveIO (...side...); ...after...} then:
    • Effects in ...side... will happen after effects in ...before....
    • But effects in ...side... maybe occur arbitrarily interleaved with effects in ...after....
  • I am not sure what the lazy in unsafeDupablePeformIO is either! Maybe Simon Marlow does?

I think it's good enough to launch on the masses though.

Thanks for doing this.

Simon

comment:30 Changed 3 years ago by snoyberg

Thanks for the further review. I've made some updates based on these comments, and have published the document. I'll email Simon Marlow about lazy in unsafeDupablePerformIO, and send an email to the GHC mailing list about the document in general.

comment:31 Changed 3 years ago by simonmar

The lazy on unsafeDupablePerformIO comes from this commit: e23efcffb3ebca88826044e3a8b818924c42e7ae. Looks like it ought to be documented, if the reason is still valid.

comment:33 Changed 3 years ago by simonpj

Oh yes thanks!

First point. I really don't like the reasoning though. How would you explain to a client of unsafePerformIO what the specification is? Perhaps "the effects of the unsafePerformIO are all performed before the result is evaluated". But that is not true if the effects somehow depend strictly on the value concerned. If it's hard to specify, we should be cautious about relying on it.

After all, the example

      unsafeDupablePerformIO (\s -> let r = f x in
                             case writeIORef v r s of (# s1, _ #) ->
                             (# s1, r #) )

really is strict in r. If you evaluate it, since writeIORef returns r, it's clear that r will be evaluated. Saying "I want to write the reference before evaluating r is very delicate!". Moreover you can achieve the same effect, where it matters, with less magic. Here is the current code from libraries/base/tests/Memo2.lhs:

memo' f ref weak_ref = \k -> unsafePerformIO $
   do { ...blah...
       ; case lkp of
            Just res -> do { putMVar ref (size,table); return res }
            Nothing  -> do { let res = f k
                           ; ...blah...
                           ; return res }

Now if we make the argument function given to unsafePerformIO return a 1-tuple thus, we are good:

memo' f ref weak_ref = \k -> case do_effects k of {# result #) -> result
  where
    do_effects k = unsafePerformIO $
       do { ...blah...
          ; case lkp of
               Just res -> do { putMVar ref (size,table); return (# res  #) }
               Nothing  -> do { let res = f k
                              ; ...blah...
                              ; return {# res #) }

Mind you (thinking aloud here), I suppose that this does rely on the strictness analyser not being super-clever. If unsafePerformIO's signature was clever enough to say "in demand d, I call my function argument and evaluate the second component the result with demand d" then we'd be back in the same boat as before.

Another, perhaps more robust alternative, would be to say:

memo' f ref weak_ref = \k -> unsafePerformIO $
   do { ...blah...
       ; case lkp of
            Just res -> do { putMVar ref (size,table); return res }
            Nothing  -> do { let res = f k
                           ; ...blah...
                           ; return (lazy res) }

Here the lazy means that the res binding is not strict.

Anyway the point it that this subtle stuff should be visible in the caller, for the rare moments when it is needed, rather than hidden in unsafePerformIO.

All that said, there is is, and I suppose it may break things in rather subtle ways if we remove it. So perhaps we should leave specifically leave it un-documented!

Second point. I think it would be clearer as

unsafeDupablePerformIO (IO m) = case m realWorld# of (# _, r #) -> lazy r

That is, make the laziness wrap the 'r' part only, which is the important bit here. I tried this:

{-# NOINLINE u1 #-}
u1  :: IO a -> a
u1 (IO m) = lazy (case m realWorld# of (# _, r #) -> r)

{-# NOINLINE u2 #-}
u2  :: IO a -> a
u2 (IO m) = case m realWorld# of (# _, r #) -> lazy r

{-# NOINLINE u3 #-}
u3  :: IO a -> a
u3 (IO m) = case m realWorld# of (# _, r #) -> r

and got this in the interface file:

  u1 :: GHC.Types.IO a -> a
    {- Arity: 1, HasNoCafRefs, Strictness: <L,1*C1(U(A,1*U))>,
       Inline: NOINLINE -}
c9e24d21e66094c44ef0629d326e9a33
  u2 :: GHC.Types.IO a -> a
    {- Arity: 1, HasNoCafRefs, Strictness: <C(S),1*C1(U(A,U))>,
       Inline: NOINLINE -}
83ded03238c0647421262c926910e195
  u3 :: GHC.Types.IO a -> a
    {- Arity: 1, HasNoCafRefs, Strictness: <C(S(LS)),1*C1(U(A,1*U))>,
       Inline: NOINLINE -}

The strictness signatures say this:

  • u1 (the status quo) is lazy in its first argument (a function)
  • u2 (which I advocate) calls the function and evaluates the result to HNF
  • u3 (the buggy version with no lazy) calls the function and evaluates the second component of the pair it returns.

So u2 looks fine. I might try making that change and seeing if there are any regressions. Unless you can think of any reason not to.

Simon

comment:34 Changed 3 years ago by simonmar

I agree that the memo table implementation is relying in a very delicate way on a particular order of evaluation, so I'd be fine with using an explicit lazy in the implementation of memo and removing it from unsafeDupablePerformIO. If you want to do this and all the tests pass, it's fine by me.

comment:35 Changed 3 years ago by snoyberg

Note that, after some feedback, I've moved the document to the GHC Wiki:

http://www.haskell.org/haskellwiki/Evaluation_order_and_state_tokens

comment:36 Changed 3 years ago by Herbert Valerio Riedel <hvr@…>

In abff2ffd2d6e30b93daa0def282b9fc0795ad10d/ghc:

Move docstring of `seq` to primops.txt.pp

The documentation for `seq` was recently augmented via #9390 &
cbfa107604f4cbfaf02bd633c1faa6ecb90c6dd7. However, it doesn't show
up in the Haddock generated docs because `#ifdef __HADDOCK__` doesn't
work as expected.  Also, it's easier to just fix the problem at the
origin (which in this is case is the primops.txt.pp file).

The benefit/downside of this is that now the extended documentation
shows up everywhere `seq` is re-exported directly.

comment:37 Changed 3 years ago by thoughtpolice

Milestone: 7.8.47.10.1

Moving (in bulk) to 7.10.4

comment:38 Changed 3 years ago by thoughtpolice

Milestone: 7.10.17.8.4
Resolution: fixed
Status: mergeclosed

Merged to 7.8.4 since this is actually a good bug to fix.

Note: See TracTickets for help on using tickets.