Opened 2 years ago

Closed 23 months ago

Last modified 22 months ago

#6041 closed bug (invalid)

Program hangs when run under Ubuntu Precise

Reported by: dsf
Owned by:
Priority: high
Milestone: 7.4.2
Component: Compiler
Version: 7.4.1
Keywords:
Cc: JeremyShaw, clifford.beshers@…, ross
Operating System: Linux
Architecture: Unknown/Multiple
Type of failure: Runtime crash
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

This code hangs when running under Ubuntu Precise. However, it works in a Precise changeroot on an Ubuntu Lucid machine -- indicating it could be kernel specific. My guess would be that it would succeed on a Precise machine running a Lucid kernel, but we have not tried that.

Any of the following changes make the code work:

  1. replacing 'readTVar u' with 'return ()'
  2. removing the 'Wrapper' monad and just using 'StateT'
  3. deriving the 'MonadState' instance instead of writing it by hand
  4. copying the definition of 'modify' into the local module and using that instead of the imported version

The final mystery: If the binary is built in a precise changeroot (where it works) and then copied to a precise machine ... it still works. And if it is built on a real precise machine, where it fails, it still fails when copied to a precise changeroot on a lucid machine.

So, there's that.

Attachments (5)

Main.hs (1.3 KB) - added by dsf 2 years ago.
log-debug-succeeding (977 bytes) - added by dsf 2 years ago.
log-debug-failing (1.6 KB) - added by dsf 2 years ago.
log-strace-succeeding (10.9 KB) - added by dsf 2 years ago.
log-strace-failing (9.2 KB) - added by dsf 2 years ago.


Change History (22)

Changed 2 years ago by dsf

comment:1 Changed 2 years ago by JeremyShaw

  • Cc JeremyShaw added

The tool chain for the real precise machine and the precise chroot were both installed by apt-get installing the same build of GHC. The environments should be nearly identical aside from things like the kernel, which are not affected by chroot.

The fact that the code does not work in the real precise environment is mysterious on its own, especially given the types of changes that make it work. However, we have seen that type of bug before. What makes this bug even more mysterious is that some subtle aspect of the environment the compiler is running in is also significant. It does seem like STM is needed to trigger the bug, though, which could indicate that the RTS is involved? Perhaps some aspect that is affected by the environment the code is built in, rather than the environment it is run in?

We could possibly provide ssh access to a machine that exhibits this bug if needed. We are currently only running precise on a laptop, so its availability is variable.

comment:2 Changed 2 years ago by simonmar

  • Difficulty set to Unknown
  • Milestone set to 7.4.2
  • Priority changed from normal to high
  • Status changed from new to infoneeded

Could you collect some more information for me:

  • Compile with -debug, run with +RTS -Ds (both a working and a failing run)
  • run under strace (both a working and a failing run)

You didn't mention whether this was with -threaded or not. Does that make a difference?

Changed 2 years ago by dsf

Changed 2 years ago by dsf

Changed 2 years ago by dsf

Changed 2 years ago by dsf

comment:3 Changed 2 years ago by dsf

Up until now we have been using the -threaded option, but I removed it for these tests and the behavior is the same.

comment:4 Changed 2 years ago by simonmar

  • Status changed from infoneeded to new

comment:5 Changed 2 years ago by simonmar

Fascinating - the failing case enters a black hole and then exits with <<loop>>. You said originally it was hanging, but the trace seems to show that it exited, is that right?

Unfortunately I can't tell what black hole it has entered without debugging. Could you set me up an SSH login?

comment:6 Changed 2 years ago by dsf

I believe it hangs when run from ghci and prints <<loop>> and exits when compiled. I have set up an account simonmar/simonmar for you, ssh to foxthompson.dynalias.org.

comment:7 Changed 2 years ago by dsf

(There's nothing interesting on that machine, in case you're all wondering!)

comment:8 Changed 2 years ago by cliffordbeshers

  • Cc clifford.beshers@… added

comment:9 Changed 2 years ago by simonmar

  • Status changed from new to infoneeded

I can't seem to reproduce the bug. Here's my session with the things I've tried:

simonmar@x220:~$ ghc Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
simonmar@x220:~$ ./Main
hello
simonmar@x220:~$ uname -a
Linux x220 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
simonmar@x220:~$ ghci Main.hs
GHCi, version 7.4.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Ok, modules loaded: Main.
Prelude Main> main
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package mtl-2.1.1 ... linking ... done.
Loading package array-0.4.0.0 ... linking ... done.
Loading package stm-2.3 ... linking ... done.
hello
Prelude Main> 
Leaving GHCi.
simonmar@x220:~$ ghc -O Main.hs
simonmar@x220:~$ ghc -O Main.hs -fforce-recomp
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
simonmar@x220:~$ ./Main
hello
simonmar@x220:~$ ghc -O2 Main.hs -fforce-recomp
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
simonmar@x220:~$ ./Main
hello
simonmar@x220:~$ ghc -O2 -threaded Main.hs -fforce-recomp
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
simonmar@x220:~$ ./Main
hello

Could you try with the simonmar account on that machine and see if you can reproduce it?

comment:10 Changed 2 years ago by dsf

Now I can't reproduce it anywhere. I do have one binary that still exhibits the behavior, I've placed it in your home directory.

I dist-upgraded the machine yesterday before I created your account. I figured if this bug was caused by something in libc6 we would want to know that. So that might have made it go away. On the other hand, that should make it go away in the binary that is still exhibiting the bug. Bad decision on my part? Maybe. I have been routinely dist-upgrading the precise install as we approached the final release.

I could try reinstalling the beta iso if we want to pursue it, but if it's caused by nastiness in libc I'm not sure it's worth it. Let me know if you want me to try this. Otherwise I may not get to it for a few days.

comment:11 Changed 2 years ago by simonmar

  • Resolution set to worksforme
  • Status changed from infoneeded to closed

I looked into it a bit, and the binary seems to contain a top-level CAF of the form "x = x", which obviously causes a loop when evaluated. It's not clear where this came from, and without being able to reproduce it I can't make any further progress. Thanks for the report - if it happens again, please reopen the ticket.

comment:12 Changed 23 months ago by guest

  • Resolution worksforme deleted
  • Status changed from closed to new

Hello

I was very intrigued by this report, and hunted the bug for a long time, but it's very simple. It's not GHC.

You did not define state for Wrapper. That's not bad, since the documentation in mtl-2.1 states that the minimal definition is get/put. The default definition for state is:

    state f = do
      s <- get
      let ~(a, s) = f s
      put s
      return a

but in transformers-0.3 you can find:

instance (Monad m) => Monad (StateT s m) where
    return a = state $ \s -> (a, s)
    [...]

so state and return call each other, and state in the second snippet should be StateT. Bug in transformers. I reopened the ticket as I am unsure where to report this...
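
A side observation on the default definition of state quoted above, offered as a sketch rather than a confirmed diagnosis: Haskell's let is recursive, so in 'let ~(a, s) = f s' the 's' passed to 'f' is the one bound by the pattern itself, not the one obtained from get. The standalone functions below contrast that shape with a version that uses a fresh name. stepShadowed and stepFresh are hypothetical names made up for this illustration; they are not part of mtl or transformers.

```haskell
module Main (main) where

-- Hypothetical helper mirroring the shape of the quoted default: the
-- pattern binds a new 's' that shadows the argument, and because 'let'
-- is recursive, 'f' is applied to that new 's', not to the argument.
stepShadowed :: (s -> (a, s)) -> s -> (a, s)
stepShadowed f s =
  let ~(a, s) = f s
  in (a, s)

-- Same shape with a fresh name for the new state, so 'f' really is
-- applied to the incoming state.
stepFresh :: (s -> (a, s)) -> s -> (a, s)
stepFresh f s =
  let ~(a, s') = f s
  in (a, s')

main :: IO ()
main = do
  -- The fresh-name version threads the state as expected.
  print (stepFresh (\n -> ((), n + 1)) (0 :: Int))
  -- The result component of the shadowed version is still usable,
  -- because nothing forces the self-referential state here.
  putStrLn (fst (stepShadowed (\n -> ("result is fine", n + 1)) (0 :: Int)))
  -- Forcing 'snd' of the shadowed version would evaluate a value
  -- defined in terms of itself, so this sketch deliberately avoids it.
```

With a function such as '\s -> ((), s)', which is what 'modify id' passes to state, the shadowed form defines the new state as itself. That would be consistent with the top-level "x = x" CAF reported in comment:11, though the ticket does not confirm the connection.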

comment:13 Changed 23 months ago by guest

On second thought, I retract my diagnosis: that state in transformers resolves to the StateT instance, not Wrapper. In any case here is a simpler program giving <<loop>> on my machine:

{-# LANGUAGE FlexibleContexts, FlexibleInstances, GeneralizedNewtypeDeriving, MultiParamTypeClasses #-}
module Main (main) where

import Control.Monad.State    (MonadState, StateT, evalStateT, get, put, state)

modify :: MonadState s m => (s -> s) -> m ()
modify f = state (\s -> ((), f s))

newtype Wrapper a = Wrapper { unWrapper :: StateT () IO a }
    deriving (Functor, Monad)

instance MonadState () Wrapper where 
  get   = Wrapper get
  put s = Wrapper (put s)
--  state f = Wrapper (state f)  -- uncomment and it works

setUnique :: Wrapper ()
setUnique =
    do u <- get
       seq u $ return ()

main :: IO ()
main =
      do putStrLn "hello"
         evalStateT (unWrapper (modify id >> setUnique)) ()

comment:14 Changed 23 months ago by simonmar

  • Cc ross added

Thanks for the diagnosis.

Ross - please see the bug report against transformers above. I'm closing this ticket here.

comment:15 Changed 23 months ago by simonmar

  • Resolution set to invalid
  • Status changed from new to closed

comment:16 Changed 22 months ago by cliffordbeshers

The modified program provided by guest does not loop for me, using SeeReason's Ubuntu package environment where the bug was originally witnessed (albeit with the packages rebuilt many times since), with ghc 7.4.1 and transformers 0.3.

Ross, if you make a new ticket, please drop a forwarding link here.

comment:17 Changed 22 months ago by guest

@cliffordbeshers: Does the original Main.hs given by dsf give <<loop>> on your machine while my reduced version does not? That would be extremely strange and if so I would reopen the ticket. I can reproduce looping on both programs, with Ubuntu Precise, 3.2.0-25-generic-pae, GHC 7.4.1.

I am pretty sure of my overall diagnosis. Please note that all fixes (1) - (4) given in the initial bug report are connected to the state monad. The program stores an infinite loop as the state with 'modify id', and then attempts to read it with 'readTVar u'. For example, if you change '_ <- liftIO $ atomically $ readTVar u' to 'print $ seq u ()', the behaviour is the same, indicating that TVars are not the core of the issue and can be replaced with ().
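
The point that the failure only appears once the stored state is forced can be sketched without mtl or STM. This is an illustration under stated assumptions, not the code from Main.hs: St, putSt, getSt, andThen, ignoreState, forceState and describe are hypothetical names, the state monad is hand-rolled as plain functions, and a call to error stands in for the looping state so that the outcome is deterministic and catchable.

```haskell
module Main (main) where

import Control.Exception (ErrorCall, evaluate, try)

-- A state action as a plain function from old state to (result, new state).
type St s a = s -> (a, s)

putSt :: s -> St s ()
putSt new _ = ((), new)

getSt :: St s s
getSt s = (s, s)

-- Sequencing with a lazy pattern, so the state is only evaluated on demand.
andThen :: St s a -> (a -> St s b) -> St s b
andThen m k s = let (a, s') = m s in k a s'

-- Reads the state but never looks at it (analogous to fix 1, 'return ()').
ignoreState :: St Int ()
ignoreState = getSt `andThen` \_ -> \s -> ((), s)

-- Reads the state and forces it (analogous to 'seq u' or 'readTVar u').
forceState :: St Int ()
forceState = getSt `andThen` \u -> \s -> (u `seq` (), s)

describe :: Either ErrorCall () -> String
describe (Left _)  = "diverged when forced"
describe (Right _) = "completed"

main :: IO ()
main = do
  -- ASSUMPTION: 'error' replaces the real infinite loop for this sketch.
  let poisoned = putSt (error "stand-in for the looping state") :: St Int ()
  r1 <- try (evaluate (fst ((poisoned `andThen` \_ -> ignoreState) 0)))
  r2 <- try (evaluate (fst ((poisoned `andThen` \_ -> forceState) 0)))
  putStrLn (describe r1)  -- prints "completed"
  putStrLn (describe r2)  -- prints "diverged when forced"
```

Storing the bad value succeeds in both runs; only the run that forces the state fails, which matches the observation that replacing 'readTVar u' with 'return ()' makes the original program work.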
