segmentation fault in compiled program, involves gtk, selinux

changed weight to 5

Attached file Try1.hs ($822).

Program that crashes

Attached file build (download).

build script for Try1.hs

Trac metadata

Trac field	Value
CC	→ garrett.mitchener@gmail.com

By the way: I compiled Try1 (attached) on my Fedora 17 machine with ghc 7.0.4, and copied the executable over to my Fedora 18 machine. That executable works fine, no seg fault.

And while I was at it: I copied the executable compiled with ghc 7.4.1 from my Fedora 18 machine to my Fedora 17 machine, and it segfaults on Fedora 17 as well as 18.

I reproduced this on Fedora 18 i686 (doesn't happen on x86_64).

Trac metadata

Trac field	Value
CC	garrett.mitchener@gmail.com → garrett.mitchener@gmail.com, juhp@community.haskell.org

Attached file Try2.hs ($839).

Smaller testcase using only glib: "ghc-7.4.2 --make Try2.hs" && ./Try2 => segfaults on i686

I would be curious if this still happens with ghc-7.6.

changed milestone to %7.6.2

changed weight to 7

assigned to @simonmar

Could someone compile the program with -debug, run it under gdb, and grab a backtrace with bt please?

Trac metadata

Trac field	Value
Priority	normal → high

Attached file Try2-bt.txt (download).

gdb backtrace of Try2 from fedora 19/rawhide

I did what I could about getting a backtrace, see new attachment, but it's not much info. I compiled it with

ghc --make Try2 -debug

with ghc-7.4.2 on a virtual machine running fedora 19/rawhide.

Is your GHC using the libffi that comes with Fedora, or the one bundled with GHC?

The Try2 binaries that fail were compiled with ghc exactly as it is packaged on Fedora. According to ldd:

(on fedora 19, ghc 7.4.2) ldd Try2 yields

linux-gate.so.1 => (0xb772f000)
libgobject-2.0.so.0 => /lib/libgobject-2.0.so.0 (0x43995000)
libglib-2.0.so.0 => /lib/libglib-2.0.so.0 (0x43829000)
libgmp.so.10 => /lib/sse2/libgmp.so.10 (0x4eb12000)
libffi.so.6 => /lib/libffi.so.6 (0x439e8000)
libm.so.6 => /lib/libm.so.6 (0x4373a000)
librt.so.1 => /lib/librt.so.1 (0x4372f000)
libdl.so.2 => /lib/libdl.so.2 (0x43728000)
libc.so.6 => /lib/libc.so.6 (0x4354e000)
libpthread.so.0 => /lib/libpthread.so.0 (0x4370c000)
/lib/ld-linux.so.2 (0x4352b000)

so I think all of these are the fedora packaged libraries.

libffi on fedora 17 is libffi.so.5, but on fedora 18 and 19/rawhide it's libffi.so.6.

Is there some sort of version clash going on here, where ghc doesn't work with libffi.so.6? Or does ghc require specific patches to libffi?

(I have to copy libffi.so.6 over to f17 machines to run test cases where I compile something on rawhide and run it on f17.)

I just tried the generic linux build of ghc-7.4.2 (the tar.bz2 file on haskell.org), but it won't work on f19/rawhide: it requires libgmp.so.3, whereas f19/rawhide comes with libgmp.so.10. I'm trying to avoid rebuilding the entire haskell platform somewhere just to track down the source of this bug. Suggestions?

Investigating the possibility that libffi.so.6 is where the bug lives: After trying and failing to get ghc to use libffi.so.5 on fedora 19/rawhide (either in /lib or in the ghc installation tree), I tried making /lib/libffi.so.6 a link to libffi.so.5. Try2 still segfaults in the same place.

I decided to try with ghc 7.6.2. On fedora 19/rawhide, Try2 seg faults in the same place.

I also tried building ghc 7.4.2 on fedora 17, and installing gtk via cabal: Try2 segfaults in the same place. So whatever's going on, it isn't just which version of libffi, and it isn't just the fedora release.

A bit more information: I compiled Try2.hs with

ghc -debug --make Try2.hs

This is on Fedora 18 with ghc 7.4.1. (FYI: I'm also using the development version of gtk2hs; darcs gives latest patch date of Tue Feb 19 16:39:46 EST 2013). Now Try2 fails with a concrete error message:

Try2: internal error: ASSERTION FAILED: file rts/STM.c, line 1476
    (GHC version 7.4.1 for i386_unknown_linux)
    Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted

Could this have something to do with running one STM transaction inside another? The function where that assertion happens is stmWait.

mentioned in issue #7718 (closed)

I've been working with gtk 2.32.4, ghc 7.4.2, and the development tree from gtk2hs. I added a few print statements and tracked down this much of the problem:

makeCallback: funPtr = 0xb7e8900c
makeCallback: destroyFunPtr = 0x0821fcd6
g_timeout_add_full: function = 0xb7e8900c
g_timeout_add_full: data = 0xb7e8900c
g_timeout_add_full: notify = 0x821fcd6
g_main_dispatch: dispatch = 0xb7ed11c0
g_main_dispatch: source = 0x82c0500
g_main_dispatch: callback = 0xb7e8900c
g_main_dispatch: user_data = 0xb7e8900c
g_timeout_dispatch: source = 0x82c0500
g_timeout_dispatch: callback = 0xb7e8900c
g_timeout_dispatch: user_data = 0xb7e8900c

(gdb) disass /r 0xb7e8900c,+5
Dump of assembler code from 0xb7e8900c to 0xb7e89011:
   0xb7e8900c:	e8 c3 14 3b 50	call   0x823a4d4
End of assembler dump.

(gdb) disass /r 0x823a4d4,+20
Dump of assembler code from 0x823a4d4 to 0x823a4e8:
=> 0x0823a4d4:	00 00	add    %al,(%eax)
   0x0823a4d6:	00 00	add    %al,(%eax)
   0x0823a4d8:	20 00	and    %al,(%eax)
   0x0823a4da:	00 00	add    %al,(%eax)
   0x0823a4dc <stg_sel_ret_5_upd_info+0>:	89 f0	mov    %esi,%eax
   0x0823a4de <stg_sel_ret_5_upd_info+2>:	83 e0 fc	and    $0xfffffffc,%eax
   0x0823a4e1 <stg_sel_ret_5_upd_info+5>:	8b 70 18	mov    0x18(%eax),%esi
   0x0823a4e4 <stg_sel_ret_5_upd_info+8>:	83 c5 04	add    $0x4,%ebp
   0x0823a4e7 <stg_sel_ret_5_upd_info+11>:	f7 c6 03 00 00 00	test   $0x3,%esi
End of assembler dump.

In gtk2hs/Glib/System/Glib/MainLoop.chs, makeCallback function, the call to mkSourceFunc (which is a foreign import wrapper) seems to return a thunk stored at 0xb7e8900c, but the function call right at that address seems to be off by 8 bytes? Those first four instructions make no sense. The seg fault happens at that first add %al, (%eax) because %eax is a bad pointer.

I just added a minimal example that doesn't need GTK -- see attachment ghc-bug-002.zip.

It's a simple case of Haskell calling into C calling back into Haskell. I'm using Fedora 17. The program works fine when compiled under GHC 7.0.4:

Setting callback
set_callback: at top
set_callback: p_callback = (nil)
set_callback: callback_data = 0
set_callback: p_finalizer = (nil)
set_callback: new pointer values:
set_callback: p_callback = 0xb77ee02c
set_callback: callback_data = 10
set_callback: p_finalizer = 0xb77ee00c
set_callback: done
Invoking callback
invoke_callback: at top
invoke_callback: p_callback = 0xb77ee02c
invoke_callback: callback_data = 10
invoke_callback: p_finalizer = 0xb77ee00c
invoke_callback: calling callback
invoke_callback: return value is 11
invoke_callback: done
Clearing callback
clear_callback: at top
clear_callback: p_callback = 0xb77ee02c
clear_callback: callback_data = 10
clear_callback: p_finalizer = 0xb77ee00c
clear_callback: finalizing callback
clear_callback: p_callback = (nil)
clear_callback: callback_data = 0
clear_callback: p_finalizer = (nil)
clear_callback: done

But it seg faults under GHC 7.4.2.

Setting callback
set_callback: at top
set_callback: p_callback = (nil)
set_callback: callback_data = 0
set_callback: p_finalizer = (nil)
set_callback: new pointer values:
set_callback: p_callback = 0xb77d702c
set_callback: callback_data = 10
set_callback: p_finalizer = 0xb77d700c
set_callback: done
Invoking callback
invoke_callback: at top
invoke_callback: p_callback = 0xb77d702c
invoke_callback: callback_data = 10
invoke_callback: p_finalizer = 0xb77d700c
invoke_callback: calling callback
Segmentation fault

On the Ubuntu 12.10 live image, after installing GHC 7.4.2, it runs with no seg fault. However, Ubuntu doesn't use SELinux. Maybe the thunk that goes back into Haskell is jumping to the wrong address, a few bytes before the actual function, and the instructions there are basically harmless, but SELinux catches them?

On Fedora 17, with GHC 7.4.2, I tried running valgrind on the Main program from ghc-bug-002.zip with this result:

valgrind --leak-check=full ./Main
==30226== Memcheck, a memory error detector
==30226== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==30226== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==30226== Command: ./Main
==30226== 
Setting callback
set_callback: at top
set_callback: p_callback = (nil)
set_callback: callback_data = 0
set_callback: p_finalizer = (nil)
set_callback: new pointer values:
set_callback: p_callback = 0x401102c
set_callback: callback_data = 10
set_callback: p_finalizer = 0x401100c
set_callback: done
Invoking callback
invoke_callback: at top
invoke_callback: p_callback = 0x401102c
invoke_callback: callback_data = 10
invoke_callback: p_finalizer = 0x401100c
invoke_callback: calling callback
==30226== Invalid read of size 1
==30226==    at 0x822D5E8: freeSignalHandlers (Signals.c:90)
==30226==  Address 0xa is not stack'd, malloc'd or (recently) free'd
==30226== 
==30226== 
==30226== Process terminating with default action of signal 11 (SIGSEGV)
==30226==  Access not within mapped region at address 0xA
==30226==    at 0x822D5E8: freeSignalHandlers (Signals.c:90)
==30226==  If you believe this happened as a result of a stack
==30226==  overflow in your program's main thread (unlikely but
==30226==  possible), you can try to increase the size of the
==30226==  main thread stack using the --main-stacksize= flag.
==30226==  The main thread stack size used in this run was 8388608.
==30226== 
==30226== HEAP SUMMARY:
==30226==     in use at exit: 40,622 bytes in 32 blocks
==30226==   total heap usage: 52 allocs, 20 frees, 43,076 bytes allocated
==30226== 
==30226== LEAK SUMMARY:
==30226==    definitely lost: 0 bytes in 0 blocks
==30226==    indirectly lost: 0 bytes in 0 blocks
==30226==      possibly lost: 0 bytes in 0 blocks
==30226==    still reachable: 40,622 bytes in 32 blocks
==30226==         suppressed: 0 bytes in 0 blocks
==30226== Reachable blocks (those to which a pointer was found) are not shown.
==30226== To see them, rerun with: --leak-check=full --show-reachable=yes
==30226== 
==30226== For counts of detected and suppressed errors, rerun with: -v
==30226== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault

Attached file ghc-bug-002.zip (download).

Minimal example; works on ghc 7.0.4, seg faults on ghc 7.4.2; tried an extra command line option in ./build, put in separate small function for easier breakpoint in gdb

Doing some work with gdb with ghc-bug-003 (which I'll attach in a minute)

On Ubuntu 12.10 with GHC 7.4.2 (no SELinux, no seg fault), the call to set_callback comes out like this:

Breakpoint 1, set_callback (f=0xb7b3f02c, d=10, fin=0xb7b3f00c) at Callback.c:18
18	  p_callback = f;
(gdb) x/i f
   0xb7b3f02c:	call   0x80a1358 <adjustorCode>
(gdb) print adjustorCode
$1 = {<text variable, no debug info>} 0x80a1358 <adjustorCode>

On Fedora 17 with GHC 7.4.2 (with SELinux, seg faults), the call to set_callback comes out like this:

Breakpoint 1, set_callback (f=0xb7ffd02c, d=10, fin=0xb7ffd00c) at Callback.c:18
18	  p_callback = f;
(gdb) x/i f
   0xb7ffd02c:	call   0x82274b8
(gdb) print adjustorCode
$1 = {<text variable, no debug info>} 0x82264b8 <adjustorCode>

so is something going wrong with this adjustorCode function?

Attached file ghc-bug-003.zip (download).

Abbreviated test case, with strace and .s files. (.se means on Fedora which enables SELinux, .ns means on Ubuntu which does not; 7.x.x is the version of GHC that generated the file.)

Maybe it's not an 8 byte problem. If the callback is eventually supposed to call adjustorCode, then the error is even weirder:

On Fedora 17 (SELinux, GHC 7.4.2), in just_invoke_callback (ghc-bug-003), tracing through...

Inside createAdjustor in ghc-7.4.2/rts/Adjustor.c, the AdjustorStub code that is generated at line 386 :-o is

(gdb) disas /r adjustorStub,+5
Dump of assembler code from 0xb7ffc02c to 0xb7ffc031:
   0xb7ffc02c:	e8 87 a4 22 50	call   0x82264b8 <adjustorCode>
End of assembler dump.

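e8 is the opcode for an IP-relative call (call rel32): the target is the address of the next instruction plus the 32-bit displacement that follows the opcode.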

The same bytes during set_callback and just_invoke_callback are interpreted differently for some reason:

(gdb) print adjustorCode
$20 = {<text variable, no debug info>} 0x82264b8 <adjustorCode>

(gdb) disas /r *p_callback,+5
Dump of assembler code from 0xb7ffd02c to 0xb7ffd031:
   0xb7ffd02c:	e8 87 a4 22 50	call   0x82274b8   <- off by 0x1000 from adjustorCode
End of assembler dump.

which means something hideous has happened.

Got it:

ghc-7.4.2/rts/Adjustor.c:380

createAdjustor calls allocateExec (rts/sm/Storage.c) which calls ffi_closure_alloc. So in createAdjustor, line 381, we should have (if I'm reading the libffi documentation correctly)

adjustorStub is a pointer in data address space to the adjustor stub code is a pointer in code address space to the very same spot in memory

and sure enough they are off by 0x1000:

(gdb) print adjustorStub
$3 = (AdjustorStub *) 0xb7ffc00c
(gdb) print code
$4 = (void *) 0xb7ffd00c

which means the correct calculation of the relative call should be

*(long*)&adjustorStub->call[1] = ((char*)&adjustorCode) - ((char*)code + 5); // code instead of adjustorStub

Apparently code and data are mapped with different segment settings under SELinux. Chaos follows.

Going to rebuild GHC 7.4.2 with that change and see if this works...

Sorry, formatting of the last message went wrong: createAdjustor calls allocateExec (rts/sm/Storage.c) which calls ffi_closure_alloc. So in createAdjustor, line 381, we should have (if I'm reading the libffi documentation correctly):

adjustorStub is a pointer in data address space to the adjustor stub

code is a pointer in code address space to the very same spot in memory

and the relative call needs to be calculated in code address space

Okay, it works!

I've attached a patch, going to do a few more tests. Now what?

Attached file Fix-adjustor.patch (download).

Patch for rts/Adjustor.c

More tests: My gtk-based simulation program works with the above patch on Fedora 17 with GHC 7.4.2.

By the way, the same mistake in Adjustor.c seems to be present in all later versions of GHC as well.

Well done for tracking this down!

Your fix looks good to me. Could someone validate and push please?

Before I forget: The pointer that gets returned after all of that is the data-space address rather than the code-space address, and I suppose that must be right so that the memory block can be deallocated later. But it sort of worries me that the call instruction to that data-space address works. Does the CPU or kernel recognize that the same memory is also mapped to a code-space address and make some correction?

The two addresses contain the same memory (double-mapped), but one is writable while the other is executable. This is how libffi works around the SELinux restrictions. On non-SELinux systems the code and data addresses are probably the same.

This function, createAdjustor, returns the code address, not the data address.

Replying to [ticket:7629#comment:69958 simonmar]:

> Well done for tracking this down!
>
> Your fix looks good to me. Could someone validate and push please?

I'm new at this process: Is "validate and push" something I'm supposed to do or does someone on the inside of the GHC group do this?

mentioned in commit 27cf625a

closed

wgmitchener: It's something one of the GHC team does.

I've now validated and pushed; thanks for diagnosing it and sending the patch!

Trac metadata

Trac field	Value
Resolution	Unresolved → ResolvedFixed

I note for the record that this didn't make ghc-7.6.3 but will be in ghc-7.8.

I am finally backporting the patch to Fedora now.

wgmitchener: Thank you again for fixing this.

Trac metadata

Trac field	Value
CC	garrett.mitchener@gmail.com, juhp@community.haskell.org → garrett.mitchener@gmail.com, juhp@community.haskell.org, simonmar

added runtime crash label

added Phigh label

Trac field	Value
Version	7.4.2
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Runtime System
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
