On my machine, it detects a few value mismatches before crashing with SIGSEGV.
$ time ./.stack-work/install/x86_64-linux-nopie/nightly-2017-10-10/8.2.1/bin/bug
value mismatch
value mismatch
value mismatch
value mismatch
zsh: segmentation fault (core dumped)  ./.stack-work/install/x86_64-linux-nopie/nightly-2017-10-10/8.2.1/bin/bug
./.stack-work/install/x86_64-linux-nopie/nightly-2017-10-10/8.2.1/bin/bug  2.11s user 0.25s system 66% cpu 3.543 total
I believe this is what is causing crashes in xmobar. See discussion: https://github.com/jaor/xmobar/issues/310. Note that the crash in xmobar still happens without the -threaded option, while this example only breaks when compiled with -threaded.
Trac metadata:
Version: 8.2.1
Type: Bug
TypeOfFailure: OtherFailure
Priority: highest
Resolution: Unresolved
Component: Runtime System
Test case:
Differential revisions:
BlockedBy:
Related:
Blocking:
CC:
Operating system:
Architecture:
For whatever reason, I'm not able to reproduce this on my Ubuntu 14.04 or 17.04 machine (both with GHC 8.2.1). I'm doing this:
$ ghc -fforce-recomp -threaded bug.c Bug.hs
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
$ ./Bug
It then proceeds to run forever (AFAICT) without hitting any value mismatches or segfaults.
Some questions:
What operating system are you using?
How can I reproduce this issue with just GHC? Please, no instructions involving fancy build tools like stack: if this really is a GHC bug, one should be able to trigger it with just GHC.
Thanks for your report, andrewchan; unfortunately, like RyanGlScott, I am unable to reproduce this with +RTS -N4, +RTS -N1, or under any of GHC's optimization levels on Debian 9 running on amd64. A standalone testcase, free of build tools, would be quite helpful.
ghc Main.hs test.c -threaded -O1 -fforce-recomp
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
For me the program fails within seconds:
$ time ./Main
value mismatch
value mismatch
value mismatch
value mismatch
zsh: segmentation fault (core dumped)  ./Main
./Main  2.19s user 0.20s system 67% cpu 3.553 total
I'm also able to reproduce the issue in a Fedora virtual machine on the same physical machine, using GHC 8.2.1 binaries downloaded from haskell.org.
I can reproduce the same issue on my machine. I am using:
NixOS x86_64 Unstable Branch (as of October 12th 2017)
GHC 8.2.1
Binutils 2.28.1
GCC 6.4.0
I noticed that the bug does not occur, and the program runs indefinitely, if I simply compile with
'ghc Main.hs test.c -threaded -o Bug'; however, if optimization level 1 or 2 is enabled, the bug appears very quickly after starting the binary.
Well, ticket:14346#comment:143804 certainly explains why -g avoids the crash: in 8.2, source note ticks essentially prevented GHC from marking anything as a join point.
I could have sworn I left a comment last night, but it seems I was mistaken. Here is what I have discovered so far while looking into this:
The test is indeed rather environment-sensitive. Moreover, as it doesn't occur under rr, I strongly suspect it's a race of some sort. When compiled with -debug, the eventual segmentation fault always seems to occur in stg_putMVarzh. Specifically here,
Putting a watchpoint on the memory address and reverse-continuing leads to this:
Old value = 1
New value = -559038737
0x0000000000470b42 in base_GHCziEventziPoll_new5_info ()
=> 0x0000000000470b42 <base_GHCziEventziPoll_new5_info+1218>:  49 89 04 24  mov    QWORD PTR [r12],rax
(rr) p/x $r12
$27 = 0x42000b7540
Not sure what's going on there, but I hope this is of some help.
So it appears that the crazy TSO is loaded in stg_putMVar# on line 1737:
...
    // There are readMVar/takeMVar(s) waiting: wake up the first one
    tso = StgMVarTSOQueue_tso(q);   // <--- here
    StgMVar_head(mvar) = StgMVarTSOQueue_link(q);
    if (StgMVar_head(mvar) == stg_END_TSO_QUEUE_closure) {
        StgMVar_tail(mvar) = stg_END_TSO_QUEUE_closure;
    }
...
Here q is 0x42000b7530, which is a fairly reasonable-looking MVAR_TSO_QUEUE, except with a completely wild tso field,
Indeed, the last guy to write to StgMVarTSOQueue_tso(q) is the FFI target, test,
Dump of assembler code for function test:
=> 0x00000000004044f0 <+0>:  movl   $0xdeadbeef,(%rdi)
   0x00000000004044f6 <+6>:  retq
where %rdi == 0x00000042000b7540.
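As a sanity check on the addresses above, the following C sketch redoes the arithmetic. It assumes x86_64 and a simplified, non-profiling layout for the MVAR_TSO_QUEUE closure (a one-word header, then link, then tso); the MVarTSOQueue struct below is a hypothetical mirror, not the RTS's own definition. It also assumes the FFI target in test.c is equivalent to the one-line function shown, which is what the disassembly suggests. Under those assumptions tso sits 16 bytes into the closure, so q + 16 = 0x42000b7540 is exactly the %rdi handed to test, and 0xdeadbeef read back as a signed 32-bit value is the -559038737 that shows up in the watchpoint output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified mirror of the RTS's MVAR_TSO_QUEUE closure in a
   non-profiling build (assumption): one-word header holding the info
   pointer, then the link to the next entry, then the blocked thread. */
typedef struct MVarTSOQueue {
    const void *info;          /* header: info table pointer */
    struct MVarTSOQueue *link; /* next queue entry */
    void *tso;                 /* thread blocked on the MVar */
} MVarTSOQueue;

/* Assumed C equivalent of the FFI target disassembled above:
   movl $0xdeadbeef,(%rdi); retq */
void test(uint32_t *p) { *p = 0xdeadbeef; }

/* Address of the tso field of a queue entry that lives at address q. */
uintptr_t tso_field_addr(uintptr_t q) {
    return q + offsetof(MVarTSOQueue, tso);
}

/* What a signed 32-bit load sees after test() has written its marker. */
int32_t marker_as_int32(void) {
    uint32_t word = 0;
    int32_t as_signed;
    test(&word);
    memcpy(&as_signed, &word, sizeof as_signed); /* reinterpret the bits */
    return as_signed;
}
```

On x86_64, tso_field_addr(0x42000b7530) gives 0x42000b7540 and marker_as_int32() gives -559038737, consistent with the rr session.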
Let's look at the calling sequence produced by GHC,
_c4Rp:
        movq $block_c4Ru_info,-8(%rbp)   # I64[Sp - 8] = c4Ru;
        movq %rax,(%rbp)                 # I64[Sp] = _s4Ok::I64;
        addq $-8,%rbp                    # Sp = Sp - 8;
        movq 872(%r13),%rbx              # _u4RJ::P64 = CurrentTSO;
        movq 24(%rbx),%rcx               # I64[I64[_u4RJ::P64 + 24] + 16] = Sp;
        movq %rbp,16(%rcx)
        movq 888(%r13),%rcx              # _u4RK::I64 = CurrentNursery;
        leaq 8(%r12),%rdx                # P64[_u4RK::I64 + 8] = Hp + 8;
                                         # I64[_u4RJ::P64 + 104] = I64[_u4RJ::P64 + 104]
                                         #   - ((Hp + 8) - I64[_u4RK::I64]);
        movq %rdx,8(%rcx)
        leaq 8(%r12),%rdx
        subq (%rcx),%rdx
        movq 104(%rbx),%rcx
        subq %rdx,%rcx
        movq %rcx,104(%rbx)
                                         # (_u4RH::I64) = call "ccall" arg hints: [PtrHint,]
                                         #   result hints: [PtrHint]
                                         #   suspendThread(BaseReg, 0);
        subq $8,%rsp                     # native-call stack adjustment
        movq %r13,%rdi                   # setup argument 1 (BaseReg)
        xorl %esi,%esi                   # setup argument 2 (0)
        movq %rax,%rbx                   # Save $rax in callee-saved register
        xorl %eax,%eax                   # No floating point arguments for this call
        call suspendThread
        addq $8,%rsp                     # undo stack adjustment
        subq $8,%rsp                     # redo stack adjustment; silly GHC
        movq %rbx,%rdi                   # ??? <--- This is where the bad argument comes from
        movq %rax,%rbx                   # Spill again? But we never actually unspilled it!
                                         # I think this is where we go wrong
        xorl %eax,%eax                   # No floating point arguments for this call
        call test                        # Native call
        addq $8,%rsp                     # undo stack adjustment
        subq $8,%rsp                     # you are such a joker, GHC
        movq %rbx,%rdi
        xorl %eax,%eax
        call resumeThread
        ...
It looks to me like what happens here is that we spill $rax (which contains a pointer to the current MVar closure) to $rbx twice, losing knowledge of the first spill. Consequently we end up passing the MVar as the argument to test. Hilarity ensues.
On looking at this with fresh eyes, it seems that, unfortunately, my analysis from ticket:14346#comment:143885 is flawed; the movq %rbx,%rdi is completely correct. We spill to the callee-saved %rbx register before suspendThread and then move the value from %rbx to %rdi, which is where we expect the first argument to reside. The second spill is simply preserving _u4RH, which is still alive after the call to test.
Back to the drawing board. I think now I'll focus on catching the issue earlier in execution; namely, when we first get the value mismatch message.
Interestingly, replacing forever with replicateM_ 1000000000 no longer triggers the bug.
A bit of speculation: the compiler sees that the touch# at the end of allocaBytes is unreachable because of forever, so it drops it and thereby allows the allocated area to be GC'ed.
Very good insights, alexbiehl and andrewchen. Indeed, it looks like the GC is (correctly, given the code) concluding that the array is unreachable. Looking at the -dverbose-core2core output, one sees that the touch# call is dropped during one of the simplifier passes (SimplMode {Phase = 0 [post-call-arity], inline, rules, eta-expand, case-of-case}). That is certainly the cause of the crash.
To answer a few of your questions:
Is it OK to store an address which clearly points into heap-allocated memory but doesn't point to an info table?
In the above case the answer is probably yes. This pointer is saved as a field of a stack frame (namely a return frame for block_c4Dx_info). The info table for this frame likely declares this field as a non-pointer. Consequently it won't be traced by the GC. Of course, for this to be safe we do need to keep some reference to the ByteArray# itself. That is where we go wrong.
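The distinction drawn above, between a raw address stored in a non-pointer frame field and an actual reference to the ByteArray#, can be illustrated with a toy tracer in C. This is a deliberately simplified model, not how GHC's collector is implemented: ToyObj, ToyFrame and toy_mark are hypothetical names, and the bitmap merely stands in for the frame's layout information. An object reachable only through the raw-address field is not marked; a genuine pointer field (the reference that the continuation containing touch# would otherwise have kept in its frame) is what keeps it alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy heap object standing in for the ByteArray#; the raw Ptr handed to
   the action points into payload, not at the object itself. */
typedef struct {
    bool marked;
    unsigned char payload[16];
} ToyObj;

/* Toy stack frame with two word-sized fields. Bit i of ptr_bitmap set
   means field i is a pointer the collector traces; clear means a
   non-pointer (such as a raw address), which is never followed. */
typedef struct {
    uintptr_t field[2];
    unsigned ptr_bitmap;
} ToyFrame;

/* Mark every heap object that a traced (pointer) field of f refers to.
   Only exact object addresses count; interior raw addresses do not. */
void toy_mark(const ToyFrame *f, ToyObj *heap, size_t n) {
    for (size_t i = 0; i < 2; i++) {
        if (!(f->ptr_bitmap & (1u << i)))
            continue; /* non-pointer field: invisible to the GC */
        for (size_t j = 0; j < n; j++)
            if (f->field[i] == (uintptr_t)&heap[j])
                heap[j].marked = true;
    }
}
```

With a frame holding only (uintptr_t)heap[0].payload and a zero bitmap, heap[0] stays unmarked; adding (uintptr_t)&heap[0] as a traced second field marks it.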
The rtsSupportsBoundThreads is a ccall; don't we have to save R1 over these calls?
This will happen when we lower the call in nativeGen. Cmm is platform/calling-convention independent and so we can't yet determine which registers need to be saved and which do not for a particular call.
It looks like, to avoid this, we will need either to teach the simplifier not to throw away otherwise-dead continuations which contain some "important" primops (e.g. touch#), or to mark allocaBytes as NOINLINE so the simplifier can't see the bad simplification (which seems to be how this is typically dealt with; e.g. see GHC.Compact.Serialized).
However, in general I wonder whether touch# is more unsafe than strictly necessary. It seems to me that for a tad of stack allocation you can get a much safer way to keep values alive. The trick is to introduce a primop,
with# :: a -> r -> r
When with# a cont is entered, the entry code will,
1. Push an StgWithFrame (a new sort of return frame which carries a reference to a) onto the stack
2. Enter cont
When cont returns, it will enter the entry code for StgWithFrame, which will simply pop itself and return. I believe this should be more robust against the simplifier; in particular, the present bug couldn't occur under this scheme.
So, for instance, alloca would be changed from being (paraphrasing to avoid obfuscating the point with state passing),
It seems to me that this does a much better job of capturing the real idea allocaBytes seeks.
The only potential issue with this proposal is float-out; I suspect we might need to say that we can't float out of a with# application. I'll need to sleep on that.
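The frame-based with# scheme described above can be modelled in a few lines of C. This is a toy, not RTS code: with_frames, with_, toy_is_root and check_buffer_alive are hypothetical names, and the array of frames stands in for the Haskell stack that the GC scans for roots. What it shows is that the reference is pushed before the continuation is entered, so keeping the object alive no longer depends on anything running after the continuation returns, which is precisely the code the simplifier discarded in this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAMES 64

/* Toy stand-in for the stack: each entry is a "with frame" holding a
   reference that the (toy) collector treats as a root. */
static const void *with_frames[MAX_FRAMES];
static size_t with_sp = 0;

typedef void (*Cont)(void *env);

/* Is obj kept alive by some with-frame currently on the stack? */
bool toy_is_root(const void *obj) {
    for (size_t i = 0; i < with_sp; i++)
        if (with_frames[i] == obj)
            return true;
    return false;
}

/* with_(a, cont, env): push a frame referencing a, enter cont, and pop
   the frame when cont returns. If cont never returns, the frame (and so
   the reference to a) simply stays on the stack. */
void with_(const void *a, Cont cont, void *env) {
    assert(with_sp < MAX_FRAMES);
    with_frames[with_sp++] = a;
    cont(env);
    with_sp--;
}

/* Example continuation: records whether env is a root while it runs. */
static bool seen_root = false;
void check_buffer_alive(void *env) { seen_root = toy_is_root(env); }
```

For example, with_(buf, check_buffer_alive, buf) leaves seen_root set to true, and once with_ returns the frame is gone again.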
It looks like to avoid this we will either need to teach the simplifier not to throw away otherwise dead continuations which contain some "important" primops
Can you give an example to show what it is throwing away, and why that's bad? I don't get it yet. I have even forgotten why touch# exists.
Interesting, can someone boil down the transformation that dropped the touch#?
Simon: touch# is keeping the ByteArray# alive until after the action, in allocaBytes (see ticket:14346#comment:144131). The action itself doesn't keep the array alive, because it is working with the raw pointer, not the ByteArray#. This is how we allocate temporary memory for marshalling data between Haskell and C, because it's a lot faster to allocate memory on the Haskell heap than to use malloc() and free().
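For comparison, here is roughly the pattern allocaBytes replaces, written as a C sketch with malloc() and free(); with_temp_bytes and write_marker are hypothetical names. In C the buffer's lifetime is explicit: it stays valid until free() runs after the action. allocaBytes frees nothing explicitly; the touch# after the action is the only thing telling the GC that the ByteArray# is needed up to that point. Note the asymmetry when the action never returns: here the buffer merely leaks, whereas dropping the touch# lets the GC reclaim memory that is still in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Callback standing in for the Haskell action: it receives a raw pointer
   to the temporary buffer and its size, and returns a result. */
typedef int (*Action)(void *buf, size_t size);

/* malloc()/free() analogue of allocaBytes: the buffer must stay valid for
   the whole duration of action and is released only after it returns. */
int with_temp_bytes(size_t size, Action action) {
    void *buf = malloc(size);
    if (buf == NULL)
        abort();
    int r = action(buf, size);
    free(buf); /* lifetime ends here, explicitly */
    return r;
}

/* Example action: writes a marker through the raw pointer, like the FFI
   target in this ticket, and returns its top nibble (0xd == 13). */
int write_marker(void *buf, size_t size) {
    assert(size >= sizeof(uint32_t));
    uint32_t *p = buf;
    *p = 0xdeadbeef;
    return (int)(*p >> 28);
}
```

Calling with_temp_bytes(16, write_marker) returns 13 and frees the buffer afterwards.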
I imagine the simplifier has proven that action never returns and then dropped the case with the continuation containing the touch#. That seems like a reasonable thing to do.
I like @bgamari's alternative suggestion of with#, although we probably want it to be
with# :: a -> (State# s -> (# State# s, b #)) -> State# s -> (# State# s, b #)
otherwise the second argument must be a thunk (yuck).
I imagine the simplifier has proven that action never returns and then dropped the case with the continuation containing the touch#. That seems like a reasonable thing to do.
Correct. I believe it is Simplify.rebuildCall that is responsible for this. I also agree: this seems like a perfectly reasonable thing to do and it's not clear how exactly we would prevent this behavior in the case of touch# (since the touch# may be buried deep in the continuation).
Alright, I have marked allocaBytes and allocaBytesAligned as NOINLINE for 8.2.2. A more principled solution, in the form of #14375 (closed), is coming in 8.4.1.
Sadly, 404bf05e never made it into HEAD or the GHC 8.4 branch, and #14375 (closed) wasn't completed before the GHC 8.4 release. Hence HEAD and 8.4.* suffer from this bug (see #15260 (closed)).
I've made a new diff reintroducing the workaround and adding a regression test: D5020.
This issue seems quite serious to me and it's actually encountered in the wild (see the related ticket). bgamari, can we milestone this for the next release?
ticket:14346#comment:145503 says "correctness issue is resolved for now" but I don't see how. As far as I can see no commits were done for this ticket.
This issue seems quite serious to me and it's actually encountered in the wild (see the related ticket). bgamari, can we milestone this for the next release?
Could we also have an 8.4.4 release?
ticket:14346#comment:145503 says "correctness issue is resolved for now" but I don't see how. As far as I can see no commits were done for this ticket.