Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#10176 closed bug (fixed)

Invalid core generated with GHC 7.10 RC3

Reported by: NeilMitchell Owned by: nomeata
Priority: high Milestone: 7.10.1
Component: Compiler Version: 7.10.1-rc3
Keywords: Cc: ndmitchell@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Using the Shake repo (https://github.com/ndmitchell/shake.git) at commit d7fa04 and GHC 7.10 RC3 or GHC HEAD, if I {{cabal test}} I get the error:

## BUILD oracle --quiet *str-int
TESTS FAILED
shake-test: Expected an exception but succeeded

Discussion about this issue can be found on the mailing list: http://osdir.com/ml/general/2015-03/msg25847.html

GHC generates the Core:

case (\_ -> error "here") of {}

This is invalid GHC Core. I intend to track down a minimal example shortly.

Attachments (4)

Buggy.hs (762 bytes) - added by NeilMitchell 3 years ago.
Errors.hs (120 bytes) - added by NeilMitchell 3 years ago.
test.bat (184 bytes) - added by NeilMitchell 3 years ago.
log.txt (29.0 KB) - added by NeilMitchell 3 years ago.

Download all attachments as: .zip

Change History (29)

comment:1 Changed 3 years ago by hvr

Fyi, I scripted up a test and git-bisected between RC2 and RC3, and the 6f46fe15af397d448438c6b93babcdd68dd78df8 commit is the one where shake-test oracle test starts failing

comment:2 Changed 3 years ago by hvr

Milestone: 7.10.1
Version: 7.8.47.10.1-rc3

Changed 3 years ago by NeilMitchell

Attachment: Buggy.hs added

Changed 3 years ago by NeilMitchell

Attachment: Errors.hs added

Changed 3 years ago by NeilMitchell

Attachment: test.bat added

Changed 3 years ago by NeilMitchell

Attachment: log.txt added

comment:3 Changed 3 years ago by NeilMitchell

Running ghc -outputdir obj -O -ddump-simpl Buggy.hs produces the output:

case (\ _ [Occ=Dead] _ [Occ=Dead, OS=OneShot] ->
        case lvl_r1lh unit_XyJ of wild2_00 { })
     `cast` ((<()>_R -> Sym (GHC.Types.NTCo:IO[0] <GHC.Prim.Any>_R))
             ; Sym (Buggy.NTCo:ReaderT[0] <()>_R <GHC.Prim.Any>_R)
             :: (()
                 -> GHC.Prim.State# GHC.Prim.RealWorld
                 -> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Prim.Any #))
                ~R# ReaderT () GHC.Prim.Any)
of _ [Occ=Dead] {
}

Ignoring the cast (which I don't think is relevant) that is:

case (\_ _ -> ...) of {}

I've reduced the bug so the code only depends on Prelude, requiring no external libraries. I have not been able to remove the requirement for two modules. The partial implementation of ReaderT is taken from the transformers library then specialised to IO.

comment:4 Changed 3 years ago by hvr

Fwiw, I see GHC 7.8.4 producing similiar Core (with -dsuppress-all -dsuppress-uniques):

a =
  \ fun unit1 bool eta ->
    case bool of _ {
      False ->
        case fun unit1 of _ {
          False -> (# eta, () #);
          True -> case m of wild2 { }
        };
      True ->
        case hPutStr2 stdout shows3 True eta of _ { (# ipv, ipv1 #) ->
        case fun unit1 of _ {
          False -> (# ipv, () #);
          True -> case m of wild2 { }
        }
        }
    }

comment:5 Changed 3 years ago by NeilMitchell

That's quite different since (after inlining a few bits):

$wlvl
$wlvl = \ _ -> error "here"

m = $wlvl void#

So m evaluates to _|_ in GHC 7.8 which means a case with no alternatives is correct. However, in 7.10 the scrutinee evaluates to a lambda, not _|_, so no alternatives is wrong.

comment:6 Changed 3 years ago by nomeata

After adding a lint check as suggested by SPJ, I simplified your example to

module Buggy(buggy) where

{-# NOINLINE error1Arg #-}
error1Arg :: () -> a
error1Arg _ = undefined

{-# NOINLINE buggy #-}
buggy :: a
buggy = error1Arg () ()

but I’m not sure if the lint check is actually correct here, as this is the core it complains about:

*** Core Lint errors : in result of Simplifier ***
Buggy.hs:9:10: Warning:
    [in body of lambda with binder a_an8 :: *]
    No alternatives for HNF scrutinee error1Arg @ (() -> a_an8) ()
*** Offending Program ***
error1Arg [InlPrag=NOINLINE] :: forall a_an0. () -> a_an0
[LclId, Arity=2, CallArity=2, Str=DmdType <B,A>b]
error1Arg =
  \ (@ a_anf) _ [Occ=Dead, Dmd=<B,A>] -> undefined @ a_anf

buggy [InlPrag=NOINLINE] :: forall a_amZ. a_amZ
[LclIdX, Str=DmdType b]
buggy =
  \ (@ a_an8) -> case error1Arg @ (() -> a_an8) () of wild_00 { }

*** End of Offense ***

comment:7 Changed 3 years ago by nomeata

Eek. This seems to be my fault: Note the CallArity=2. And indeed, with -fno-call-arity this passes; same for your test case. :-(

comment:8 Changed 3 years ago by NeilMitchell

I was expecting a simple pattern match on (Case Lam{} _ _ []), what does your diff look like?

comment:9 Changed 3 years ago by nomeata

Ok, I think here is what happens, at least in my smaller test case:

  1. Call Arity determines that error1Arg is always called with two arguments. Hence CallArity=2. This is correct.
  2. The simplifier tries to eta-expand error1Arg to take two arguments. This fails in mkEtaWW in CoreArity, where we have this comment:
        -- This *can* legitmately happen:
        -- e.g.  coerce Int (\x. x) Essentially the programmer is
        -- playing fast and loose with types (Happy does this a lot).
        -- So we simply decline to eta-expand.  Otherwise we'd end up
        -- with an explicit lambda having a non-function type
    
  3. But still, the Arity field is updated.
  4. Since it has Arity=2, exprIsHNF (error1Arg @ _ ()) is true, and the simplifier does strange things.

So maybe the solution is to not update Arity with the result from Call Arity, but instead let eta-expansion happen and update Arity with the manifest arity after eta expansion. I’ll try that.

comment:10 Changed 3 years ago by nomeata

Ok, I might have found a suitable fix. You can have a look at Phab:D747, although it’s still pending validation (and the new lint check might yet uncover other problems).

comment:11 Changed 3 years ago by nomeata

Or not quite. I think I did find a bug, but not the only one: With the change in Phab:D747, my small example and your example, when merged into one module, no longer fail. But your original test case still fails with Call Arity, and compiles fine without.

There still seems to be some fishiness with Call Arity and undefined. /me continues brooding over the Core.

comment:12 Changed 3 years ago by hvr

not sure if this is useful information, but I compared the output of

ghc -fforce-recomp -outputdir obj -O -ddump-simpl -dsuppress-uniques Buggy.hs

for acbfc19a6d27b51aaec5177e4b64ea9b45f74c84 (recent ghc-7.10 branch snapshot) vs. 029a296a770addbd096bbfd6de0936327ee620d4 (one commit before the big typeable-change that caused shake-test oracle test to start failing)

And well, the Core output looks just the same, so I'm not sure if the Buggy.hs/Errors.hs test-case highlights the same issue causing shake-test to fail.

comment:13 Changed 3 years ago by nomeata

I simplified your code slightly, removing everything about type classes:

module Buggy(buggy) where

import Errors

newtype ReaderT r a = ReaderT { runReaderT :: r -> IO a }

p = liftReaderT (return ())

m >>> k = ReaderT $ \r -> do runReaderT m r; runReaderT k r

liftReaderT :: IO a -> ReaderT r a
liftReaderT m = ReaderT (const m)

{-# NOINLINE buggy #-}
buggy :: (() -> Bool) -> () -> Bool -> IO ()
buggy fun unit bool =
    runReaderT (
        (if bool then liftReaderT $ print () else p)
        >>>
        (if fun unit then error2Args unit unit >>> p else p)) ()

still exhibits the problem. Removing the newtype, or turning it into a data makes the problem go away.

comment:14 Changed 3 years ago by nomeata

Also, changing the code to

newtype Fun a b = Fun { runFun :: a -> b }
type ReaderT r x = Fun r (IO x)

makes the problem go away. Must be some weird interaction with coercions...

comment:15 Changed 3 years ago by nomeata

I think I found it. The problem is a bad interaction between Call Arity and simplification based on the strictness result.

Consider this code:

e x = error "foo" x

... case e () () of {} ...

The strictness signature for e will be <L,U>b. Note the b: This means: „If you give me one argument, then I’m going to be ⊥, which is true.

The Call Arity for e is 2, which is true: e is always called with two arguments.

So the simplifier phase following Call Arity will replace with

e x y = error "foo" x y

At the same time, the simplifier will replace e () () with e (). After all, both are known to be bottom. The result (after inlining) is the bogus

... case (\y -> error "foo" x y) of {} ...

that we see.

The simplifier looks at the b flag of the strictness analyzer in mkArgInfo, replacing isBotRes result_info with False makes the problem go away, but that is probably not the solution.

Zapping strictness analysis when Call Arity finds something is probably also a bad idea.

Maybe the best solution is to limit the number found by call arity to the number of arguments in the strictness signature, if the strictness signature ends in a b.

comment:16 Changed 3 years ago by nomeata

Status: newpatch

Ok, validates. Does someone want to make sure I’m not talking complete nonsense before I merge this?

comment:17 Changed 3 years ago by NeilMitchell

To assess what does happen in this error case, I augmented the code to be:

buggy fun unit bool =
    runReaderT (do
        if bool then liftReaderT $ print () else pure ()
        if fun unit then do error2Args unit unit; liftReaderT $ print "here2" else pure ()
        ) () :: IO ()

Running do print "here1"; buggy (const True) () True; print "here3" gives here1; here3. GHC has used the presence of error to remove the here2, but since the code just falls out of the end of the function it returns back and still prints here3. That's a nasty result, and pretty much describes what happens in the full test case (I'm in the middle of compiling, and then suddenly I stop and do the thing that comes after compiling).

comment:18 Changed 3 years ago by simonpj

I've checked Phab:D747 which looks great to me; i've made some suggestions.

Maybe we can turn comment:17 into a test that not only should compile (as D747 has) but should run also. thanks!

comment:19 Changed 3 years ago by hvr

I can confirm that 1cc46b1fd5ce794d3a1519c65dcf4aded317598b + phab:D747 allows shake-test oracle test to succeed!

comment:20 Changed 3 years ago by hvr

Owner: set to nomeata

comment:21 Changed 3 years ago by Joachim Breitner <mail@…>

In 5119e097b5cc08d1e6e94529d8c6d7c654a28829/ghc:

Test case for #10176

originally provided by Neil Mitchell. Despite what he observed, I can
observe the bug even with all in one module.

comment:22 Changed 3 years ago by Joachim Breitner <mail@…>

In b4efac59ef5aac74d382d1fd57652982edddbe75/ghc:

Trim Call Arity

to not accidentially invalidate a strictness signature with a Diverges
result info. This seems to fix #10176.

Differential Revision: https://phabricator.haskell.org/D747

comment:23 Changed 3 years ago by nomeata

Status: patchmerge

Pushed. hvr, you said you wanted this to be in 7.10, so marking it as merge.

comment:24 Changed 3 years ago by hvr

Resolution: fixed
Status: mergeclosed

I've merged (& validated) the fix & test (but not the linter-change) to ghc-7.10 via 011f691333aff2833acc900ee3911885e488cf1b & 7e1758a9cf86c28440834d3e3d44737186e5ca5f respectively.

comment:25 Changed 3 years ago by nomeata

Thanks. Note that the test case is quite useless without the lint (but doesn’t hurt either so just leave it like it is).

Note: See TracTickets for help on using tickets.