#8817 closed bug (fixed)

segmentation fault in 7.8 RC1

Reported by: hamishmack
Owned by: simonmar
Priority: highest
Milestone: 7.8.1
Component: Runtime System
Version: 7.8.1-rc1
Keywords:
Cc: simonmar
Operating System: Unknown/Multiple
Architecture: x86_64 (amd64)
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Revisions:

Description (last modified by hvr)

I get a segmentation fault if I do the following on OS X...

cabal install cpphs --reinstall --force-reinstall --ghc-options=-rtsopts
cpphs +RTS -A16384 -RTS /usr/include/stdio.h

Running it under gdb, the stack shows we are in stg_ap_0_fast, and it looks like it is trying to dereference the pointer value 0x5000500050000 held in the rbx register.

Change History (11)

comment:1 Changed 17 months ago by luite

Same problem on linux, tested on Ubuntu 12.04 amd64.

Using -A4096 seems to make the program hang instead of producing a segmentation fault.

comment:2 Changed 17 months ago by hvr

  • Description modified
  • Milestone set to 7.8.1

comment:3 Changed 17 months ago by hvr

  • Cc simonmar added
  • Component changed from Compiler to Runtime System
  • Owner set to simonmar

I suspect this is actually GC/RTS related; when using -A4096, the output of +RTS -S looks as below (I had to CTRL-C as it won't stop otherwise):

    Alloc    Copied     Live    GC    GC     TOT     TOT  Page Flts
    bytes     bytes     bytes  user  elap    user    elap
     1256      1232      1224  0.00  0.00    0.00    0.00    0    0  (Gen:  0)
        0      1208      1224  0.00  0.00    0.00    0.00    0    0  (Gen:  0)
        0         0      1224  0.00  0.00    0.00    0.00    0    0  (Gen:  0)
        0         0      1224  0.00  0.00    0.00    0.00    0    0  (Gen:  0)
        0         0      1224  0.00  0.00    0.00    0.00    0    0  (Gen:  0)
        0         0      1224  0.00  0.00    0.00    0.00    0    0  (Gen:  0)
...
        0         0      1224  0.00  0.00    0.98    0.98    0    0  (Gen:  0)
        0         0      1224  0.00  0.00    0.98    0.98    0    0  (Gen:  0)
        0         0      1224  0.00  0.00    0.98    0.98    0    0  (Gen:  0)
       72        72      1296  0.00  0.00    0.98    0.98    0    0  (Gen:  0)
cpphs: interrupted
     1024                      0.00  0.00

           2,352 bytes allocated in the heap
           2,512 bytes copied during GC
           6,968 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     215256 colls,     0 par    0.29s    0.29s     0.0000s    0.0000s
  Gen  1         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.69s  (  0.70s elapsed)
  GC      time    0.29s  (  0.29s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.98s  (  0.98s elapsed)

  %GC     time      29.3%  (29.1% elapsed)

  Alloc rate    3,390 bytes per MUT second

  Productivity  70.7% of total user, 70.8% of total elapsed

comment:4 Changed 17 months ago by simonmar

Well -A4k/-A16k is a bit silly, but it should work. I'll take a look.

comment:5 Changed 17 months ago by luite

Hamishmack and I were debugging the random GHCJS segfaults we've been having for a while, trying to reproduce them with a smaller program, and then stumbled into this. So I hope the fix for this will also affect non-silly scenarios :)

comment:6 Changed 17 months ago by simonmar

  • Priority changed from high to highest

comment:7 Changed 17 months ago by Simon Marlow <marlowsd@…>

In b1ddec1e6d4695d71d38b59db26829d71ad784e1/ghc:

Fix a bug in codegen for non-updatable selector thunks (#8817)

To evaluate most non-updatable thunks, we can jump directly to the
entry code if we know what it is.  But not for a selector thunk: these
might be updated by the garbage collector, so we have to enter the
closure with an indirect jump through its info pointer.
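
To illustrate why the direct jump is unsafe here, the following is a toy C model, not the actual RTS or codegen. All names in it (Closure, EntryCode, selector_entry, indirection_entry, gc_shortcut_selector, enter_indirect, direct_jump_still_valid) are hypothetical, and the function pointer only stands in for the info pointer. It assumes the collector may evaluate the selection itself and rewrite the closure in place, after which the entry code known at compile time no longer matches what the closure is.

```c
#include <stddef.h>

/* Toy model only: hypothetical types, not the GHC RTS definitions. */
typedef struct Closure Closure;
typedef int (*EntryCode)(const Closure *);

struct Closure {
    EntryCode entry;    /* stands in for the info pointer */
    const int *fields;  /* selector thunk: the record to select from */
    int value;          /* indirection: the already-selected result */
};

/* Entry code for the "selector thunk": pick the second field. */
static int selector_entry(const Closure *c) {
    return c->fields[1];
}

/* Entry code for the closure after it has been updated. */
static int indirection_entry(const Closure *c) {
    return c->value;
}

/* Stand-in for the collector short-circuiting the selector: it does the
   selection itself and rewrites the closure in place. */
static void gc_shortcut_selector(Closure *c) {
    c->value = c->fields[1];
    c->fields = NULL;
    c->entry = indirection_entry;
}

/* Safe: always enters through the closure's current entry pointer. */
static int enter_indirect(const Closure *c) {
    return c->entry(c);
}

/* A direct jump to selector_entry is only correct while this holds.
   After the rewrite it would read fields, which is now NULL. */
static int direct_jump_still_valid(const Closure *c) {
    return c->entry == selector_entry;
}
```

In this model, enter_indirect gives the same answer before and after gc_shortcut_selector runs, while direct_jump_still_valid turns false once the closure has been rewritten. That is the situation the patch avoids by always entering through the info pointer.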

comment:8 Changed 17 months ago by simonmar

  • Status changed from new to merge

I fixed the problem with -A16k, which is actually a nasty codegen bug. I suspect the problem with -A4k is more benign and only occurs with very tiny -A values, and I think it's been around for a while.

So merge this patch, then leave the ticket open with a lower prio for the -A4k fix.

comment:9 Changed 17 months ago by luite

  • Priority changed from highest to normal

Thanks, this appears to have fixed our GHCJS segfaults!

Since the other problem is easily avoided, I set it to normal priority.

comment:10 Changed 17 months ago by luite

  • Priority changed from normal to highest

Oops, let's leave at highest until merged

comment:11 Changed 17 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from merge to closed

Merged in 7.8 for RC2.
