#13433 closed bug (fixed)

Segmentation faults in profiled way

Reported by: bgamari
Owned by: simonmar
Priority: highest
Milestone: 8.2.1
Component: Compiler
Version: 8.1
Keywords:
Cc: simonmar
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D3386
Wiki Page:

Description

It seems that profiling has regressed sometime between GHC 8.0 and 8.2. A few times since September I have noticed profiled programs (in particular, GHC itself built with profiling enabled) crashing with segmentation faults.

This most recent case was produced by building commit 6ebfbdfb64cb8bb5c2ddaf2ad3ad350755c5eb2b with the following in build.mk,

BuildFlavour = prof

define add_mods_flag =
  $(foreach mod,$(2),$(eval $(basename $(mod))_HC_OPTS += $(1)))
endef

$(call add_mods_flag,-fprof-auto,$(wildcard compiler/typecheck/*.hs))

STRIP_CMD = :

and using the resulting stage2 compiler to bootstrap the same commit. Eventually the build will fail with a segmentation fault. Unfortunately it seems the crash isn't entirely reproducible.

Change History (19)

comment:1 Changed 21 months ago by bgamari

Owner: set to bgamari

comment:2 Changed 21 months ago by bgamari

Looking through history since 8.0, #5654 seems relevant.

comment:3 Changed 21 months ago by bgamari

mpickering has reported this as #13387. The repro case on that ticket crashes reliably.

Last edited 21 months ago by bgamari

comment:4 Changed 21 months ago by bgamari

Here is some gdb output from a crash,

$ ~/ghc-utils/debug-ghc ~/ghc/roots/8.2-profiled/bin/ghc -v -O2 Main.hs -fforce-recomp +RTS -p 
gdb --args /home/ben/ghc/roots/8.2-profiled/lib/ghc-8.2.0.20170315/bin/ghc -B/home/ben/ghc/roots/8.2-profiled/lib/ghc-8.2.0.20170315 -v -O2 Main.hs -fforce-recomp +RTS -p

GNU gdb (Debian 7.12-4) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/ben/ghc/roots/8.2-profiled/lib/ghc-8.2.0.20170315/bin/ghc...run
done.
(gdb) run
Starting program: /mnt/work/ghc/roots/8.2-profiled/lib/ghc-8.2.0.20170315/bin/ghc -B/home/ben/ghc/roots/8.2-profiled/lib/ghc-8.2.0.20170315 -v -O2 Main.hs -fforce-recomp +RTS -p
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6853700 (LWP 19991)]
[New Thread 0x7ffff6052700 (LWP 19992)]
[New Thread 0x7ffff5851700 (LWP 19993)]
[New Thread 0x7ffff5050700 (LWP 19994)]
Glasgow Haskell Compiler, Version 8.2.0.20170315, stage 2 booted by GHC version 8.0.2
Using binary package database: /home/ben/ghc/roots/8.2-profiled/lib/ghc-8.2.0.20170315/package.conf.d/package.cache
There is no package.cache in /home/ben/.ghc/x86_64-linux-8.2.0.20170315/package.conf.d, checking if the database is empty
There are no .conf files in /home/ben/.ghc/x86_64-linux-8.2.0.20170315/package.conf.d, treating package database as empty
package flags []
loading package database /home/ben/ghc/roots/8.2-profiled/lib/ghc-8.2.0.20170315/package.conf.d
loading package database /home/ben/.ghc/x86_64-linux-8.2.0.20170315/package.conf.d
wired-in package ghc-prim mapped to ghc-prim-0.5.0.0
wired-in package integer-gmp mapped to integer-gmp-1.0.0.1
wired-in package base mapped to base-4.10.0.0
wired-in package rts mapped to rts
wired-in package template-haskell mapped to template-haskell-2.12.0.0
wired-in package ghc mapped to ghc-8.2.0.20170315
wired-in package dph-seq not found.
wired-in package dph-par not found.
package flags []
loading package database /home/ben/ghc/roots/8.2-profiled/lib/ghc-8.2.0.20170315/package.conf.d
loading package database /home/ben/.ghc/x86_64-linux-8.2.0.20170315/package.conf.d
wired-in package ghc-prim mapped to ghc-prim-0.5.0.0
wired-in package integer-gmp mapped to integer-gmp-1.0.0.1
wired-in package base mapped to base-4.10.0.0
wired-in package rts mapped to rts-1.0
wired-in package template-haskell mapped to template-haskell-2.12.0.0
wired-in package ghc mapped to ghc-8.2.0.20170315
wired-in package dph-seq not found.
wired-in package dph-par not found.
*** Chasing dependencies:
Chasing modules from: *Main.hs
!!! Chasing dependencies: finished in 0.94 milliseconds, allocated 0.503 megabytes
Stable obj: []
Stable BCO: []
Ready for upsweep
  [NONREC
      ModSummary {
         ms_hs_date = 2017-03-07 08:43:42 UTC
         ms_mod = Main,
         ms_textual_imps = [(Nothing, Prelude)]
         ms_srcimps = []
      }]
*** Deleting temp files:
Deleting: 
compile: input file Main.hs
*** Checking old interface for Main (use -ddump-hi-diffs for more details):
[1 of 1] Compiling Main             ( Main.hs, Main.o )
*** Parser [Main]:
!!! Parser [Main]: finished in 73.04 milliseconds, allocated 54.502 megabytes
*** Renamer/typechecker [Main]:
!!! Renamer/typechecker [Main]: finished in 572.66 milliseconds, allocated 398.957 megabytes
*** Desugar [Main]:
Result size of Desugar (after optimization)
  = {terms: 11,856, types: 8,892, coercions: 0, joins: 0/0}
!!! Desugar [Main]: finished in 178.87 milliseconds, allocated 132.900 megabytes
*** Simplifier [Main]:

Thread 1 "ghc" received signal SIGSEGV, Segmentation fault.
0x00000000000002e1 in ?? ()
(gdb) bt
#0  0x00000000000002e1 in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) info reg
rax            0x64a8ec0	105549504
rbx            0x64a8ec0	105549504
rcx            0x64a8ec0	105549504
rdx            0x420acd6000	283649073152
rsi            0x420acd6fff	283649077247
rdi            0x54c96c8	88905416
rbp            0x420b84fbc0	0x420b84fbc0
rsp            0x7fffffff9fc8	0x7fffffff9fc8
r8             0x1	1
r9             0x420b84fc40	283661106240
r10            0x8	8
r11            0x420b84ffd0	283661107152
r12            0x420acd5ff8	283649073144
r13            0x64b0718	105580312
r14            0x64ad160	105566560
r15            0x420b8480d0	283661074640
rip            0x2e1	0x2e1
eflags         0x10207	[ CF PF IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
(gdb) x/32a 0x420b84fbc0
0x420b84fbc0:	0x54cb638 <stg_sel_6_upd_info+184>	0x64a8ec0 <CCS_DONT_CARE>
0x420b84fbd0:	0x54c98d8 <stg_upd_frame_info>	0x42018bde00
0x420b84fbe0:	0x0	0x6245070
0x420b84fbf0:	0x54d2590 <stg_restore_cccs_eval_info>	0x42018bde00
0x420b84fc00:	0x54c99c0 <stg_marked_upd_frame_info>	0x42018bde00
0x420b84fc10:	0x0	0x42095d8000
0x420b84fc20:	0x54d2590 <stg_restore_cccs_eval_info>	0x42018bde00
0x420b84fc30:	0x54c99c0 <stg_marked_upd_frame_info>	0x42018bde00
0x420b84fc40:	0x0	0x42095d8020
0x420b84fc50:	0x54d2590 <stg_restore_cccs_eval_info>	0x42018bde00
0x420b84fc60:	0x54c99c0 <stg_marked_upd_frame_info>	0x42018bde00
0x420b84fc70:	0x0	0x42095d8040
0x420b84fc80:	0x54d2590 <stg_restore_cccs_eval_info>	0x42018bde00
0x420b84fc90:	0x54c99c0 <stg_marked_upd_frame_info>	0x42018bde00
0x420b84fca0:	0x0	0x42095d8060
0x420b84fcb0:	0x54d2590 <stg_restore_cccs_eval_info>	0x42018bde00
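
A session like the one above could also be captured non-interactively, which helps when the crash is intermittent and the run has to be repeated. The following is a minimal sketch, not the wrapper used above: the function name capture_crash and the DRY_RUN switch are hypothetical, and it assumes an x86_64 build, where the dump above examined the address held in rbp.

```shell
# Hypothetical helper: run a program under gdb in batch mode and, if it
# stops on a signal, print a backtrace, the registers, and 32 words at $rbp
# (the address inspected by hand in the session above).
# Usage: capture_crash <program> [args...]
# Set DRY_RUN=1 to print the gdb command line instead of running it.
capture_crash() {
  # Prepend the gdb invocation to the program and its arguments.
  set -- gdb -batch -ex run -ex bt -ex 'info registers' -ex 'x/32a $rbp' --args "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `capture_crash inplace/bin/ghc-stage2 -O2 -fforce-recomp Main.hs +RTS -p 2>&1 | tee crash.log` would keep a copy of the dump; the log file name is only an illustration.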

comment:5 Changed 21 months ago by bgamari

A bisection suggests that 2effe18ab51d66474724d38b20e49cc1b8738f60, the Early Inline patch, is the culprit here. Next I'm going to try two things,

comment:6 Changed 21 months ago by simonpj

That is really strange! Inlining doesn't affect semantics, and anything that passes Lint should not seg-fault. So it may have tickled the bug but it seems hard to believe that it's the cause.

comment:7 Changed 21 months ago by bgamari

Quick update: At this point I have determined that the issue is the fix to #5654. Ultimately it seems like we are ending up with a stg_sel_5_upd being invoked on a SimplEnv.FloatFlag, which is not a single-constructor record (it is an enumeration). Naturally, things go terribly awry. Still trying to work out exactly how we get into this situation.

Last edited 21 months ago by bgamari

comment:8 in reply to:  7 Changed 21 months ago by dfeuer

Replying to bgamari:

Quick update: At this point I have determined that the issue is the fix to #5654. Ultimately it seems like we are ending up with a stg_sel_5_upd being invoked on a SimplEnv.FloatFlag, which is not a single-constructor record (it is an enumeration). Naturally, things go terribly awry. Still trying to work out exactly how we get into this situation.

How can the fix to a closed ticket fix a new one? I'm missing something.

comment:9 Changed 21 months ago by bgamari

To put it another way, the fix to #5654 caused this regression. Reverting 3a18baff06abc193569b1b76358da26375b3c8d6, 2a02040b2e23daa4f791afc290c33c9bbe3c620c, and 394231b301efb6b56654b0a480ab794fe3b7e4db fixes the crash.
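
The revert experiment can be sketched as follows. This is only an illustration: the variable and function names are hypothetical, the commits are passed in the order they are listed above, and that order may need adjusting if the reverts conflict.

```shell
# The three commits named above as the fix to #5654.
FIX_5654_COMMITS="3a18baff06abc193569b1b76358da26375b3c8d6 2a02040b2e23daa4f791afc290c33c9bbe3c620c 394231b301efb6b56654b0a480ab794fe3b7e4db"

# Hypothetical helper: revert all three in the current GHC checkout,
# creating one revert commit per original commit.
revert_5654_fix() {
  # Word splitting of the list is intentional here.
  git revert --no-edit $FIX_5654_COMMITS
}
```

After running `revert_5654_fix` in the tree, rebuilding and re-running the failing compile would show whether the crash disappears.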

Last edited 21 months ago by bgamari

comment:10 Changed 21 months ago by RyanGlScott

Cc: simonmar added

comment:11 Changed 21 months ago by simonmar

Owner: changed from bgamari to simonmar

comment:12 Changed 21 months ago by simonmar

I still haven't been able to repro this. I used exactly the build.mk above, and I've built all of nofib with

make NoFibRuns=0 EXTRA_HC_OPTS="+RTS -p -RTS"

without a single segfault, and I have a pile of .prof files.

This is Linux/x86_64, my tree is master @ bf3952e. Is there anything that might be different about my environment compared to yours that might account for this?

comment:13 Changed 21 months ago by simonmar

I'll try building exactly from 6ebfbdfb64cb8bb5c2ddaf2ad3ad350755c5eb2b as in the description.

Presumably your build.mk also has this:

ifneq "$(BuildFlavour)" ""
include mk/flavours/$(BuildFlavour).mk
endif

otherwise BuildFlavour has no effect, right?

comment:14 Changed 21 months ago by bgamari

Yes, I should have been more specific: essentially I appended the cited snippet to build.mk.

Very odd that you have been unable to reproduce this. I'm looking into what might differ in our environments.

comment:15 Changed 21 months ago by bgamari

Alright, I have once again reproduced this. Unfortunately I realized that you actually need to cherry-pick a few patches on top of 6ebfbd as it doesn't build on its own. One of these patches fixes a silly typo. The other is my rather crude fix to #13233 (Phab:D3063) ensuring we don't attempt to tick string literals. I'm a bit suspicious of the latter, but the build doesn't produce any lint warnings so I've been operating under the assumption that it's safe.

Without further ado, here is a full repro,

#!/bin/bash -e

git clone git://git.haskell.org/ghc --recursive ghc-T13433
cd ghc-T13433
git checkout 6ebfbdfb64cb8bb5c2ddaf2ad3ad350755c5eb2b
git cherry-pick e4620dc7d2b54c4fd98139c25cff150b7e4b9640 2251905024f963d84d66559202d2377853fdff25
git submodule update

cat >mk/build.mk <<'EOF'
BuildFlavour = prof

ifneq "$(BuildFlavour)" ""
  include mk/flavours/$(BuildFlavour).mk
endif

GhcStage2HcOpts += -dcore-lint -dcmm-lint
define add_mods_flag =
  $(foreach mod,$(2),$(eval $(basename $(mod))_HC_OPTS += $(1)))
endef

$(call add_mods_flag,-fprof-auto,$(wildcard compiler/typecheck/*.hs))

STRIP_CMD = :
EOF
./boot
./configure
make -j8

wget https://ghc.haskell.org/trac/ghc/raw-attachment/ticket/13387/Main.hs
inplace/bin/ghc-stage2 -O2 -fforce-recomp Main.hs +RTS -p

Last edited 21 months ago by bgamari

comment:16 Changed 21 months ago by simonmar

Differential Rev(s): Phab:D3386

comment:17 Changed 21 months ago by bgamari

Status: changed from new to patch

Yay Simon!

comment:18 Changed 21 months ago by Simon Marlow <marlowsd@…>

In 074d13eb/ghc:

Fix #13433

Summary: See comments for details.

Test Plan: validate

Reviewers: mpickering, bgamari, austin, erikd

Subscribers: rwbarton, thomie

Differential Revision: https://phabricator.haskell.org/D3386

comment:19 Changed 21 months ago by bgamari

Resolution: set to fixed
Status: changed from patch to closed