Opened 18 months ago

Closed 17 months ago

Last modified 17 months ago

#8834 closed bug (fixed)

64-bit windows cabal.exe segfaults in GC

Reported by: awson Owned by:
Priority: highest Milestone: 7.8.1
Component: Compiler Version: 7.8.1-rc2
Keywords: Cc: simonmar, jstolarek
Operating System: Windows Architecture: x86_64 (amd64)
Type of failure: Runtime crash Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

cabal.exe built with 64-bit windows GHC segfaults doing cabal configure or cabal install or probably something else.

This occurs *both* for cabal-install 1.18.0.2 built against Cabal 1.18.1.2 and for cabal-install 1.18.0.3 built against Cabal 1.18.1.3.

This occurs only for 64-bit build, 32-bit build is fine.

64-bit GHC-7.6.3 build is also fine.

During debugging I see segfault occurs inside evacuate somewhere near get_itbl call I guess.

If I make cabal.exe to not trigger some presumably bad GC compiling it with -rtsopts and running it as (for example) cabal +RTS -H256m -m50 -RTS ... the problem disappears.

Attachments (11)

cmm_sink_revert.patch (5.6 KB) - added by awson 18 months ago.
correct.cmm (37.0 KB) - added by awson 18 months ago.
wrong.cmm (36.1 KB) - added by awson 18 months ago.
lines1_bad (84.5 KB) - added by awson 18 months ago.
lines1_good (86.6 KB) - added by awson 18 months ago.
unpatched_O1 (107.8 KB) - added by awson 18 months ago.
BugIso compiled by 7.8rc2 with -O
O1_no_spec_constr.cmm (104.4 KB) - added by jstolarek 18 months ago.
GHC HEAD, -O1 option enabled
O1_spec_constr.cmm (81.8 KB) - added by jstolarek 18 months ago.
GHC HEAD, -O1 and -fspec-constr options enabled
cmm_global__regs_not_trivial_patch.cmm (83.9 KB) - added by jstolarek 18 months ago.
isTrivial does not inline global registers, -O1 and -fspec-constr options enabled
foreignTargetRegs_are_active_regs.cmm (82.9 KB) - added by jstolarek 18 months ago.
all global registers conflict with unsafe foreigh calls, -O1 and -fspec-constr options enabled
0001-Use-the-correct-callClobberedRegs-on-Windows-x64.patch (886 bytes) - added by simonmar 18 months ago.

Download all attachments as: .zip

Change History (96)

comment:1 Changed 18 months ago by awson

Probably, it's worth to add that I observed this problem since I was able to build 64-bit windows GHC 7.7+ (a month or so).

comment:2 Changed 18 months ago by awson

  1. I've reduced this case to the following:
    import qualified Data.ByteString.Char8 as BSS
    
    main :: IO ()
    main = do
      cache <- BSS.readFile "00-index.cache"
      print (length $ BSS.lines cache)
    

00-index.cache is hackage packages cache file. Even truncated to 12000 lines it gives segfault.

  1. After bisecting I've found the problematic commit. I don't understand what is going here. I think I've made absolutely my best in this situation. Please, someone fix this! I feel myself completely alone fighting with numerous win64 bugs which nobody ever bother to look into.

Changed 18 months ago by awson

comment:3 Changed 18 months ago by awson

Since things were changed since that commit, I've revised it and made manual reversal. This commit was mainly a set of comments, relevant changes were more or less local and I've created this reversal patch mechanically. I still don't understand underlying machinery, but it looks this patch makes things work again on win64.

comment:4 Changed 18 months ago by awson

  • Status changed from new to patch

I've tested this reversal patch extensively and all works fine.

Moreover, it is not platform-specific, so it probably fix other weird segfaults on other platforms.

That problematic commit is particularly suspicious because if gets rid of callerSaves - the only explicitly platform-specific function used in CmmSink.hs. That looks very strange.

Last edited 18 months ago by awson (previous) (diff)

comment:5 Changed 18 months ago by awson

  • Component changed from None to Compiler
  • Type of failure changed from None/Unknown to Runtime crash

comment:6 Changed 18 months ago by refold

Would be nice if @jstolarek (the author of that commit) could comment.

comment:7 Changed 18 months ago by simonpj

  • Cc simonmar jstolarek added

comment:8 Changed 18 months ago by simonmar

@awson thanks for all the diagnosis, sorry you have to be the one to do all this. I don't think there are any other regular developers building and testing on 64-bit Windows. I hope that when we get nightly builds working again things will improve.

I think your analysis is right, that it is the change to okToInline that is at fault. Would you mind doing one more test for me? I'd like to know whether it still works if you don't revert the changes to isTrivial, which I believe should be fine.

comment:9 Changed 18 months ago by simonpj

Could this account for #8870?

comment:10 Changed 18 months ago by jstolarek

awson, great bisecting job here!

The intention behind the faulty commit is that both older and newer implementations are semantically identical (which turns out not to be the case) but the newer one avoids code duplication.

I think your analysis is right, that it is the change to okToInline that is at fault.

Two other possibilities include:

  1. conflict function
  2. instance definition for GlobalReg datatype in compiler/cmm/CmmNode.hs in the DefinerOfRegs. That's the place were we say what global registers are defined (and therefore clobbered) by each Cmm node.

awson, to pin this bug we have to know the difference between correct and incorrect Cmm. Could you compile your minimal example with GHC HEAD (which will give us the segfaulting Cmm) and with your patched version (which will give us working Cmm) and upload them here? Dump the Cmm with -ddump-cmm flag. I see that you're calling print in your example. My experience is that it adds a lot of extra Cmm to analyse. Could you see if splitting your code into two modules also causes the bug:

module T8834 where

import qualified Data.ByteString.Char8 as BSS

T8834 :: IO Int
T8834 = do
  cache <- BSS.readFile "00-index.cache"
  return (length $ BSS.lines cache)
module Main where

import T8834

main :: IO ()
main = T8834 >>= print

If this also causes the bug then most likely we only need Cmm dump for T8834 module.

Changed 18 months ago by awson

Changed 18 months ago by awson

comment:11 Changed 18 months ago by awson

Well, separating to modules also causes the bug. I put Cmm dumps of T8834 here. But is this enough? Could the bug manifest itself somewhere else (e. g. base or bytestring libraries)?

Last edited 18 months ago by awson (previous) (diff)

comment:12 follow-up: Changed 18 months ago by awson

@simonmar, unreverting isTrivial brokes things again.

comment:13 Changed 18 months ago by jstolarek

I see something that looks suspicious. In the correct version we have:

c1zB:
    _r1xg::P64 = R1;
    if ((Sp + -16) < SpLim) goto c1zC; else goto c1zD;
c1zC:
    R1 = _r1xg::P64;
    call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8;
c1zD:
    (_c1zx::I64) = call "ccall" arg hints:  [PtrHint,
                                             PtrHint]  result hints:  [PtrHint] newCAF(BaseReg, _r1xg::P64);
    if (_c1zx::I64 == 0) goto c1zz; else goto c1zy;
c1zz:
    call (I64[_r1xg::P64])() args: 8, res: 0, upd: 8;
c1zy:
    I64[Sp - 16] = stg_bh_upd_frame_info;
    I64[Sp - 8] = _c1zx::I64;
    R2 = c1zA_str;
    Sp = Sp - 16;
    call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24;

In the incorrect one we have:

c1zy:
    if ((Sp + -16) < SpLim) goto c1zz; else goto c1zA;
c1zz:
    R1 = R1;
    call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8;
c1zA:
    (_c1zu::I64) = call "ccall" arg hints:  [PtrHint,
                                             PtrHint]  result hints:  [PtrHint] newCAF(BaseReg, R1);
    if (_c1zu::I64 == 0) goto c1zw; else goto c1zv;
c1zw:
    call (I64[R1])() args: 8, res: 0, upd: 8;
c1zv:
    I64[Sp - 16] = stg_bh_upd_frame_info;
    I64[Sp - 8] = _c1zu::I64;
    R2 = c1zx_str;
    Sp = Sp - 16;
    call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24;

Notice how correct version saves R1 to local variable. I'm especially worried about this call:

call (I64[_r1xg::P64])() args: 8, res: 0, upd: 8; CORRECT.CMM
call (I64[R1])() args: 8, res: 0, upd: 8;         WRONG.CMM

If R1 gets clobbered be earlier ccall to newCAF(BaseReg, R1) then this is probably the reason why things go wrong. In that case the right solution would be to tell GHC that the call to newCAF defines register R1 (see DefinerOfRegs instance declaration for GlobalReg). Then this case should be caught by first guard in conflicts. Also, Note [Sinking and calls] seems very relevant here.

As a side note: it is really annoying to see these R1 = R1 assignments. I recall they are eliminated by the code generator but it is frustrating to see them at the Cmm level. I believe the sinking pass should eliminate these assignments but I didn't have enough time to investigate into this further during my internship.

comment:14 in reply to: ↑ 12 ; follow-up: Changed 18 months ago by jstolarek

Replying to awson:

@simonmar, unreverting isTrivial brokes things again.

So this works:

isTrivial :: CmmExpr -> Bool 
isTrivial (CmmReg _) = True
isTrivial _          = False

but this doesn't:

isTrivial :: CmmExpr -> Bool 
isTrivial (CmmReg (CmmLocal _)) = True
isTrivial _                     = False

? I know this is slightly different from your patch (it ignores literals which I believe are irrelevant here). Again, would be nice to see differences in Cmm between the two.

simonmar: could you tell us whether the call to newCAF clobbers R1 ?

comment:15 in reply to: ↑ 14 Changed 18 months ago by awson

Replying to jstolarek:

So this works:

isTrivial :: CmmExpr -> Bool 
isTrivial (CmmReg _) = True
isTrivial _          = False

but this doesn't:

isTrivial :: CmmExpr -> Bool 
isTrivial (CmmReg (CmmLocal _)) = True
isTrivial _                     = False

Conversely. The first does not work, the second works.

Sadly, I've already removed this wrong build. I don't think I want to make it today again, sorry.

Last edited 18 months ago by awson (previous) (diff)

comment:16 Changed 18 months ago by jstolarek

Conversely. The first does not work, the second works.

Yes, you're right. It should be other way around.

comment:17 follow-up: Changed 18 months ago by simonmar

I think the bad code is much more likely to be in a library somewhere, not in the code that calls readFile/lines, so we're not going to see anything bad in that file.

@jstolarek - R1 is not caller-saves (on either Linux or Win64), so it is safe to leave it live across a foreign call. Also, that code fragment is the standard newCAF call which appears everywhere, if this was broken then everything would be broken.

I'm surprised that the change to isTrivial all by itself causes this failure. That must mean that there was something wrong with the code before the "bad patch", because the change to isTrivial should be safe.

comment:18 Changed 18 months ago by simonmar

It's looking more likely that there is a bug related to caller-saves registers somewhere else, and the change to isTrivial tickles it by making CmmSink more eager to inline GlobalRegs. This is affecting Win64 because that platform has a different C calling convention, with different caller-saves registers.

So either we have a bad optimisation (probably in CmmSink), or the specification for what is a caller-saves reg on Win64 is wrong (includes/stg/MachRegs.h).

comment:19 in reply to: ↑ 17 Changed 18 months ago by awson

Replying to simonmar:

I'm surprised that the change to isTrivial all by itself causes this failure. That must mean that there was something wrong with the code before the "bad patch", because the change to isTrivial should be safe.

It probably would worth to add that reverting isTrivial only does not help either. I. e. isTrivial all by itself broke things as they were before that bad commit, but changing *only* isTrivial to it's pre-commit state does not help either.

comment:20 follow-up: Changed 18 months ago by simonpj

Could this have anything to do with #8870, a much simpler case than Cabal?

(As #8870 says, I can't even build a working stage-2 GHC on Windows now.)

Both this and #8870 look like release blockers to me.

Simon

comment:21 in reply to: ↑ 20 Changed 18 months ago by awson

Replying to simonpj:

Could this have anything to do with #8870, a much simpler case than Cabal?

(As #8870 says, I can't even build a working stage-2 GHC on Windows now.)

Both this and #8870 look like release blockers to me.

Simon

I can't reproduce #8870 using 32-bit GHC on 64-bin Windows. Are you using 32-bit Windows? If no, then your problem can be different from #8870 (but still related to #8834).

comment:22 Changed 18 months ago by jstolarek

Bug report says:

During debugging I see segfault occurs inside evacuate somewhere near get_itbl call I guess.

Can someone tell me what is evacuate and get_itbl? Are these functions in the RTS?

comment:23 Changed 18 months ago by awson

Yes, they are in RTS. More precise location is rts/sm/Evac.c line 390.

Also I think this bug is triggered near foreign memchr call in function elemIndex from Data.ByteString module in bytestring package, which is inlined down to lines function from Data.ByteString.Char8.

Also I checked this bug is not triggered if relevant modules from bytestring package are compiled with -O flag instead of -O2, hence this bug could be more ubiquitous if more applications code was compiled with -O2, perhaps.

comment:24 follow-up: Changed 18 months ago by jstolarek

Also I checked this bug is not triggered if relevant modules from bytestring package are compiled with -O flag instead of -O2

That's interesting. Looking at DynFlags.lhs I see two optimisations that are enabled only with -O2: liberate case and SpecConstr. I admit have no idea what they do. Suggestions please?

Also I think this bug is triggered near foreign memchr call in function elemIndex from Data.ByteString module in bytestring package, which is inlined down to lines function from Data.ByteString.Char8.

It would be great if we had a test case that does not depend on any library code. This way we could eyeball the problem by looking at Cmm. Do you think you would be able to create such a test case.

I got my hands on 64-bit Windows, I'm building GHC at the moment so I'll try to look into this one.

comment:25 Changed 18 months ago by thoughtpolice

It's not true that those are the *only* things enabled by -O2 - you must also search for optLevel, which client code can depend on for specific instances if they wish (for example, maybe it's *not* an entire Core->Core pass, but an otherwise small micro-optimization).

Actually, now that I'm searching and thinking about it - the only other case where we do this is when we short-cut PAPs - see 4d1ea482885481073d2fee0ea0355848b9d853a1 and Note [avoid intermediate PAPs] in StgCmmLayout. Simon committed this a while ago.

I also have a Win32 build going, so I'll test this.

comment:26 Changed 18 months ago by thoughtpolice

(To be clear: by 'test this' I mean reverting that).

comment:27 in reply to: ↑ 24 Changed 18 months ago by awson

Replying to jstolarek:

It would be great if we had a test case that does not depend on any library code. This way we could eyeball the problem by looking at Cmm. Do you think you would be able to create such a test case.

I think, this code should be enough:

module BugIso (lines1) where

import Prelude hiding (null, take, drop)
import Data.ByteString hiding (elemIndex)
import Data.ByteString.Internal
import Data.Word

import Foreign

elemIndex1 :: Word8 -> ByteString -> Maybe Int
elemIndex1 c (PS x s l) = inlinePerformIO $ withForeignPtr x $ \p -> do
    let p' = p `plusPtr` s
    q <- memchr p' c (fromIntegral l)
    return $! if q == nullPtr then Nothing else Just $! q `minusPtr` p'
{-# INLINE elemIndex1 #-}

elemIndex :: Char -> ByteString -> Maybe Int
elemIndex = elemIndex1 . c2w
{-# INLINE elemIndex #-}

lines1 :: ByteString -> [ByteString]
lines1 ps
    | null ps = []
    | otherwise = case search ps of
             Nothing -> [ps]
             Just n  -> take n ps : lines1 (drop (n+1) ps)
    where search = elemIndex '\n'

comment:28 Changed 18 months ago by jstolarek

It's not true that those are the *only* things enabled by -O2

Just to be clear, I only said they are the only -O2 things enabled in DynFlags.

I think, this code should be enough:

Yes, I can confirm that it segfaults.

comment:29 Changed 18 months ago by thoughtpolice

I can confirm Kyril's example above works properly and produces the segfault as well. Here's my updated version:

module Main (main) where

import Prelude hiding (null, take, drop)
import Data.ByteString hiding (elemIndex)
import qualified Data.ByteString as B 
import Data.ByteString.Internal
import Data.Word

import Foreign

elemIndex1 :: Word8 -> ByteString -> Maybe Int
elemIndex1 c (PS x s l) = inlinePerformIO $ withForeignPtr x $ \p -> do
    let p' = p `plusPtr` s
    q <- memchr p' c (fromIntegral l)
    return $! if q == nullPtr then Nothing else Just $! q `minusPtr` p'
{-# INLINE elemIndex1 #-}

elemIndex :: Char -> ByteString -> Maybe Int
elemIndex = elemIndex1 . c2w
{-# INLINE elemIndex #-}

lines1 :: ByteString -> [ByteString]
lines1 ps
    | null ps = []
    | otherwise = case search ps of
             Nothing -> [ps]
             Just n  -> take n ps : lines1 (drop (n+1) ps)
    where search = elemIndex '\n'


main = do
  f <- B.readFile "00-index.cache"
  print (Prelude.length $ lines1 f)

You can grab the 00-index.cache file from /c/Users/$USER/AppData/Roaming/cabal/packages/hackage.haskell.org/00-index.cache

I'm investigating reverting the PAP patch (still building).

comment:30 follow-up: Changed 18 months ago by jstolarek

I applied awson's patch that reverts some of my changes in CmmSink and surprisingly I'm still getting a segfault with code posted in comment #27.

comment:31 in reply to: ↑ 30 Changed 18 months ago by awson

Replying to jstolarek:

I applied awson's patch that reverts some of my changes in CmmSink and surprisingly I'm still getting a segfault with code posted in comment #27.

Very strange. I have no segfault here with patched HEAD.

comment:32 Changed 18 months ago by simonmar

Could someone post Cmm for the working and non-working versions please?

Changed 18 months ago by awson

Changed 18 months ago by awson

comment:33 Changed 18 months ago by awson

These are BugIso module versions, compiled with -O2 -fmax-simplifier-iterations=10 -fdicts-cheap -fspec-constr-count=6 (bytestring package flags) by 7.8rc2 (bad) and patched HEAD (good).

Last edited 18 months ago by awson (previous) (diff)

comment:34 Changed 18 months ago by jstolarek

Could you also post cmm dumps for -O1 and unpatched GHC?

Changed 18 months ago by awson

BugIso compiled by 7.8rc2 with -O

comment:35 Changed 18 months ago by simonmar

Here is the broken bit of code, from lines1_bad:

  c2gC:
      _s2cV::I64 = R5;
      _s2cY::I64 = R2 + R4;
      _c2f5::I64 = R5;
      (_s2d3::I64) = call "ccall" arg hints:  [PtrHint,
                                               `signed',]  result hints:  [PtrHint] memchr(_s2cY::I64, 10, _c2f5::I64);
      if (_s2d3::I64 == 0) goto c2gK; else goto c2gL;
  c2gK:
      call MO_Touch(R3);
      I64[Hp - 128] = Data.ByteString.Internal.PS_con_info;
      P64[Hp - 120] = R3;
      I64[Hp - 112] = R2;
      I64[Hp - 104] = R4;
      I64[Hp - 96] = _s2cV::I64;
      I64[Hp - 88] = :_con_info;
      P64[Hp - 80] = Hp - 127;
      P64[Hp - 72] = GHC.Types.[]_closure+1;
      _c2gw::P64 = Hp - 86;
      Hp = Hp - 72;
      R1 = _c2gw::P64;
      call (P64[Sp])(R1) args: 8, res: 0, upd: 8;

Note how R2, R3 and R4 are live across the C call. This is wrong, because on Win64, R3 and R4 are caller-saves and therefore clobbered by the C call.

comment:36 follow-up: Changed 18 months ago by jstolarek

Note how R2, R3 and R4 are live across the C call. This is wrong, because on Win64, R3 and R4 are caller-saves and therefore clobbered by the C call.

Just to clarify: floating of R2 is correct? MachRegs.h defines R3 and R4 correctly as caller saves:

#if !defined(mingw32_HOST_OS)
#define CALLER_SAVES_R3
#define CALLER_SAVES_R4
#endif

comment:37 in reply to: ↑ 36 Changed 18 months ago by awson

All looks quite the contrary:

...
#define REG_R2    r14
#define REG_R3    rsi
#define REG_R4    rdi
...

The registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15 are considered nonvolatile...

And we have *not defined* here:

#if !defined(mingw32_HOST_OS)
#define CALLER_SAVES_R3
#define CALLER_SAVES_R4

Thus, the problem is somewhere else.

Last edited 18 months ago by awson (previous) (diff)

comment:38 Changed 18 months ago by simonmar

Ah, you're right, my apologies, I misread the conditional in MachRegs.h.

So the problem is elsewhere... time for gdb?

comment:39 follow-up: Changed 18 months ago by simonpj

So where are we on this?

  • Simon M says "here is the broken bit of code" (comment 35), and then says "I misread" (comment 38). Does that mean that the broken bit of code isn't broken?
  • If we use Karel's reversion patch (between comments 2 and 3 above) does that cure all known crashes? I confirm that it does fix my own "ghc-stage2 segfaults" problem, reported in #8870. But what about the hello-world problem in #8834, and Ganesh's new report in #8890?
  • If reverting the CmmSink change does in fact solve the problem, we should probably go ahead and revert it, and un-block the GHC 7.8 release.
  • But even if we do that we are still stuck with not knowing WHY it solves the problem. Perhaps the patch was fine, but it exposes a problem somewhere else? And we really want the CmmSink improvements. We really need someone to dig into it with GDB. Austin, can you get a Windows box on the network that exhibits the bug, so that Simon M can crank up gdb?

Simon

comment:40 in reply to: ↑ 39 Changed 18 months ago by awson

Replying to simonpj:

  • If we use Karel's reversion patch

I'm Kyrill, not Karel. Extremely sorry for OT.

comment:41 Changed 18 months ago by awson

I've inspected "bad" Cmm a bit and found this:

           I64[Sp - 32] = R2;
           P64[Sp - 24] = R3;
           I64[Sp - 16] = R4;
           I64[Sp - 8] = R5;
           Sp = Sp - 32;
           call (stg_gc_fun)(R1) args: 40, res: 0, upd: 8;
       c2gF:
           if (%MO_S_Le_W64(R5, 0)) goto c2gB; else goto c2gC;

R5 is r8 and is volatile on Win64. I don't see it restored before %MO_S_Le_W64(R5, 0) (honestly, I don't know what %MO_S_Le_W64(R5, 0) is).

Could it be that another patch interacts badly with stg_gc_fun?

Sorry if this is an utter nonsense - I know very little about GHC internals.

Last edited 18 months ago by awson (previous) (diff)

comment:42 Changed 18 months ago by jstolarek

I don't know what %MO_S_Le_W64(R5, 0) is

MO = Machine Operation S = Signed Le = less-equal W64 = 64-bit word

It's a PrimOp (ie. a built-in machine operation) that checks whether R5 is less or equal to 0.

Could it be that ​another patch interacts badly with stg_gc_fun?

I believe this patch should be irrelevant.

comment:43 Changed 18 months ago by awson

But, AFAIUI, R5 shall be restored anyway after calling stg_gc_fun.

comment:44 Changed 18 months ago by simonmar

Simon M says "here is the broken bit of code" (comment 35), and then says "I misread" (comment 38). Does that mean that the broken bit of code isn't broken?

Correct, we still don't know what's broken.

If we use Karel's reversion patch (between comments 2 and 3 above) does that cure all known crashes? I confirm that it does fix my own "ghc-stage2 segfaults" problem, reported in #8870. But what about the hello-world problem in #8834, and Ganesh's new report in #8890?

If reverting the CmmSink change does in fact solve the problem, we should probably go ahead and revert it, and un-block the GHC 7.8 release.

But even if we do that we are still stuck with not knowing WHY it solves the problem. Perhaps the patch was fine, but it exposes a problem somewhere else? And we really want the CmmSink improvements. We really need someone to dig into it with GDB. Austin, can you get a Windows box on the network that exhibits the bug, so that Simon M can crank up gdb?

I doubt that the patch actually introduced the bug, since @awson said that just the single change to isTrivial is enough to trigger the crash. Still, this patch isn't really essential - it made it possible to run CmmSink before stack layout, but measurements showed that it didn't buy anything to do that, so we're not currently using that functionality. The other thing in the patch is the isTrivial change that makes it more keen to inline GlobalRegs and literals; this is a very small win (<1%, IIRC).

comment:45 Changed 18 months ago by awson

Corresponding code from "good" cmm is:

           I64[Sp - 32] = _s2d4::I64;
           P64[Sp - 24] = _s2d5::P64;
           I64[Sp - 16] = _s2d6::I64;
           I64[Sp - 8] = _s2d7::I64;
           Sp = Sp - 32;
           call (stg_gc_fun)(R1) args: 40, res: 0, upd: 8;
       c2gR:
           if (%MO_S_Le_W64(_s2d7::I64, 0)) goto c2gN; else goto c2gO;

Not that it restores R5 - it use no R5 at all.

comment:46 Changed 18 months ago by simonmar

@awson - the call to stg_gc_fun doesn't return (there's no "returns to" annotation on it).

comment:47 Changed 18 months ago by awson

Ah, thanks, assembler intuition was wrong for Cmm :)

comment:48 Changed 18 months ago by jstolarek

  • Owner set to jstolarek

Looking into this at the moment. I have a few interesting findings, will post later.

comment:49 Changed 18 months ago by Simon Peyton Jones <simonpj@…>

In a79613a75c7da0d3d225850382f0f578a07113b5/ghc:

Revert ad15c2, which causes Windows seg-faults (Trac #8834)

We don't yet understand WHY commit ad15c2, which is to do with
CmmSink, causes seg-faults on Windows, but it certainly seems to.  So
reverting it is a stop-gap, but we need to un-block the 7.8 release.

Many thanks to awson for identifying the offending commit.

comment:50 follow-up: Changed 18 months ago by jstolarek

Simon, could we get a day or two more for this one? I'm at the very moment writing a summary of what I've learned so far and I believe it will get us very close to fixing this bug properly.

comment:51 Changed 18 months ago by simonpj

I don't think Austin is planning to make the final release in the next day or two, so yes, please go ahead. My reversion was just to un-glue the Windows builds which otherwise are totally stuck.

Simon

comment:52 in reply to: ↑ 50 ; follow-up: Changed 18 months ago by jstolarek

Replying to jstolarek:

Simon, could we get a day or two more for this one? I'm at the very moment writing a summary of what I've learned so far and I believe it will get us very close to fixing this bug properly.

According to my experiments you can fix Windows build easily by changing definition of isTrivial in CmmSink to:

isTrivial :: CmmExpr -> Bool
isTrivial (CmmReg (CmmLocal _)) = True
isTrivial (CmmLit _) = True
isTrivial _          = False

comment:53 Changed 18 months ago by simonpj

Fine. But reversion works fine too, right? I'm not suggesting abandoning the CmmSink improvements, once you figure out what is going on.

If meanwhile you want to re-apply, and make the smaller change you propose, that's fine with me

Simon

comment:54 in reply to: ↑ 52 ; follow-up: Changed 18 months ago by awson

Replying to jstolarek:

According to my experiments you can fix Windows build easily by changing definition of isTrivial in CmmSink to:

isTrivial :: CmmExpr -> Bool
isTrivial (CmmReg (CmmLocal _)) = True
isTrivial (CmmLit _) = True
isTrivial _          = False

Hmm, on my experiments changing isTrivial to

isTrivial :: CmmExpr -> Bool
isTrivial (CmmReg (CmmLocal _)) = True
-- isTrivial (CmmLit _) = True
isTrivial _ = False

alone was not enough to fix things - see my comment 19. So doing CmmLit branch True makes this work?

comment:55 in reply to: ↑ 54 Changed 18 months ago by jstolarek

Here's a quick summary of what I've learned:

  • As we noticed earlier the bug does not happen when only -O1 is used. Enabling -O1 -fspec-constr triggers the bug, but with -O1 -fspec-constr -fno-cmm-sink everything works perfectly fine.
  • I believe that Simon Marlow correctly identified offending piece of Cmm code, although my knowledge of calling conventions doesn't allow me to say why that piece of code is incorrect. Here's how that piece of code looks only with -O1:
c2h9:
    _s2cz::I64 = _s2ct::I64 + _s2cv::I64;
    (_s2cE::I64) = call "ccall" arg hints:  [PtrHint,
                                             `signed',]  result hints:  [PtrHint] memchr(_s2cz::I64, 10, _s2cw::I64);
    if (_s2cE::I64 == 0) goto c2h4; else goto c2h5;
c2h4:
    Hp = Hp - 32;
    _s2cH::P64 = Data.Maybe.Nothing_closure+1;
    goto s2cF;
c2h5:
    I64[Hp - 24] = GHC.Types.I#_con_info;
    I64[Hp - 16] = _s2cE::I64 - _s2cz::I64;
    I64[Hp - 8] = Data.Maybe.Just_con_info;
    P64[Hp] = Hp - 23;
    _s2cH::P64 = Hp - 6;
    goto s2cF;
s2cF:
    call MO_Touch(_s2cu::P64);
    I64[Sp - 40] = c2f7;
    R1 = _s2cH::P64;
    I64[Sp - 32] = _s2ct::I64;
    P64[Sp - 24] = _s2cu::P64;
    I64[Sp - 16] = _s2cv::I64;
    I64[Sp - 8] = _s2cw::I64;
    Sp = Sp - 40;
    if (R1 & 7 != 0) goto c2f7; else goto c2f8;

It looks that sinking is not performed here for some reason that I was unable to figure out. Enabling -O1 -fspec-constr changes the above code to:

c2hi:
    _s2dr::I64 = R5;
    _s2du::I64 = R2 + R4;
    _c2fB::I64 = R5;
    (_s2dz::I64) = call "ccall" arg hints:  [PtrHint,
                                             `signed',]  result hints:  [PtrHint] memchr(_s2du::I64, 10, _c2fB::I64);
    if (_s2dz::I64 == 0) goto c2hd; else goto c2he;
c2hd:
    call MO_Touch(R3);
    I64[Hp - 128] = Data.ByteString.Internal.PS_con_info;
    P64[Hp - 120] = R3;
    I64[Hp - 112] = R2;
    I64[Hp - 104] = R4;
    I64[Hp - 96] = _s2dr::I64;
    I64[Hp - 88] = :_con_info;
    P64[Hp - 80] = Hp - 127;
    P64[Hp - 72] = GHC.Types.[]_closure+1;
    _c2h8::P64 = Hp - 86;
    Hp = Hp - 72;
    R1 = _c2h8::P64;
    call (P64[Sp])(R1) args: 8, res: 0, upd: 8;

Enabling -fspec-constr sinks R2, R3 and R4 past the call to memchr (which BTW. is consistent with awson's earlier observation that segfault is happening somewhere around the call to memchr).

  • Changing definition of isTrivial from:
isTrivial :: CmmExpr -> Bool
isTrivial (CmmReg _) = True
isTrivial (CmmLit _) = True
isTrivial _          = False

to

isTrivial :: CmmExpr -> Bool
isTrivial (CmmReg (CmmLocal _)) = True
isTrivial (CmmLit _) = True
isTrivial _          = False

makes the bug disappear. I believe this is not a proper fix, because it only hides the bug - global registers should be safe to inline! The above Cmm code changes to (-O1 -fspec-constr enabled):

2hi:
   _s2du::I64 = _s2do::I64 + _s2dq::I64;
   (_s2dz::I64) = call "ccall" arg hints:  [PtrHint,
                                            `signed',]  result hints:  [PtrHint] memchr(_s2du::I64, 10, _s2dr::I64);
   if (_s2dz::I64 == 0) goto c2hd; else goto c2he;
2hd:
   call MO_Touch(_s2dp::P64);
   I64[Hp - 128] = Data.ByteString.Internal.PS_con_info;
   P64[Hp - 120] = _s2dp::P64;
   I64[Hp - 112] = _s2do::I64;
   I64[Hp - 104] = _s2dq::I64;
   I64[Hp - 96] = _s2dr::I64;
   I64[Hp - 88] = :_con_info;
   P64[Hp - 80] = Hp - 127;
   P64[Hp - 72] = GHC.Types.[]_closure+1;
   _c2h8::P64 = Hp - 86;
   Hp = Hp - 72;
   R1 = _c2h8::P64;
   call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
  • Since it looks like the problem is caused by sinking registers past the unsafe foreign call to memchr I changed the definition of foreignTargetRegs in the instance definition of DefinerOfRegs in CmmNode (lines 345-346) from:
foreignTargetRegs (ForeignTarget _ (ForeignConvention _ _ _ CmmNeverReturns)) = []
foreignTargetRegs _ = activeCallerSavesRegs

to:

foreignTargetRegs (ForeignTarget _ (ForeignConvention _ _ _ CmmNeverReturns)) = []
foreignTargetRegs _ = activeRegs

This says that any global register can be clobbered by unsafe foreign call, in practice preventing any global register sinking past such calls. After this change the Cmm dump looks like this (again, -O1 -fspec-constr enabled):

c2hi:
    _s2dr::I64 = R5;
    _s2dq::I64 = R4;
    _s2dp::P64 = R3;
    _s2do::I64 = R2;
    _s2du::I64 = R2 + R4;
    _c2fB::I64 = R5;
    (_s2dz::I64) = call "ccall" arg hints:  [PtrHint,
                                             `signed',]  result hints:  [PtrHint] memchr(_s2du::I64, 10, _c2fB::I64);
    if (_s2dz::I64 == 0) goto c2hd; else goto c2he;
c2hd:
    call MO_Touch(_s2dp::P64);
    I64[Hp - 128] = Data.ByteString.Internal.PS_con_info;
    P64[Hp - 120] = _s2dp::P64;
    I64[Hp - 112] = _s2do::I64;
    I64[Hp - 104] = _s2dq::I64;
    I64[Hp - 96] = _s2dr::I64;
    I64[Hp - 88] = :_con_info;
    P64[Hp - 80] = Hp - 127;
    P64[Hp - 72] = GHC.Types.[]_closure+1;
    _c2h8::P64 = Hp - 86;
    Hp = Hp - 72;
    R1 = _c2h8::P64;
    call (P64[Sp])(R1) args: 8, res: 0, upd: 8;

Based on above facts I believe that the problem lies in incorrect definition of caller saves registers. On the other hand I believe we are defining R2, R3 and R4 as callee-saves according to the specification. So there is still possibility that the bug lies somewhere else. At this point I am stuck.

I will attach Cmm dumps in a moment so others can verify my findings. All dumps are for the BugIso module - they don't include boilerplate from Main.

Replying to awson:

Hmm, on my experiments changing isTrivial to

isTrivial :: CmmExpr -> Bool
isTrivial (CmmReg (CmmLocal _)) = True
-- isTrivial (CmmLit _) = True
isTrivial _ = False

alone was not enough to fix things - see my comment 19. So doing CmmLit branch True makes this work?

On my experiments the change of isTrivial that you just posted is enough to prevent the segmentation fault.

Changed 18 months ago by jstolarek

GHC HEAD, -O1 option enabled

Changed 18 months ago by jstolarek

GHC HEAD, -O1 and -fspec-constr options enabled

Changed 18 months ago by jstolarek

isTrivial does not inline global registers, -O1 and -fspec-constr options enabled

Changed 18 months ago by jstolarek

all global registers conflict with unsafe foreigh calls, -O1 and -fspec-constr options enabled

comment:56 follow-up: Changed 18 months ago by simonpj

Interesting; good progress.

  • Can you (by experiment) find which of registers is causing the problem? You fixed it by making them all caller saves, but that might be more than what's needed.
  • You say "It looks that sinking is not performed here for some reason that I was unable to figure out". I suspect it'd really be worth figuring this out. Maybe it's important!
  • You hypothesise that the C procedure memchr is destroying a callee-saves register. Might it be possible to test this hypothesis directly? For example, make the Haskell code call my_memchr instead, and handwrite my_memchr in assembly code, so that it saves all the registers (in a static location), calls the real memchr, and then checks whether the registers have changed. That would absolutely confirm that a claimed callee-saves register is being destroyed.
Last edited 18 months ago by simonpj (previous) (diff)

comment:57 in reply to: ↑ 56 Changed 18 months ago by jstolarek

Replying to simonpj:

  • Can you (by experiment) find which of registers is causing the problem? You fixed it by making them all caller saves, but that might be more than what's needed.
  • You hypothesise that the C procedure memchr is destroying a callee-saves register. Might it be possible to test this hypothesis directly?

I looked at the implementation of memchr in glibc and it looks that on 64 bits it is touching %rsi and %rdi registers (our R3 and R4, respectively). I changed this definition in includes/stg/MachRegs.h:

#if !defined(mingw32_HOST_OS)
#define CALLER_SAVES_R3
#define CALLER_SAVES_R4
#endif

to:

#define CALLER_SAVES_R3
#define CALLER_SAVES_R4

(ie. I removed the conditional) and the segfault has disappeared. Looking at the Cmm confirms that R3 are R4 are now not sunk past the call to memchr:

c2hy:
    _s2dH::I64 = R5;
    _s2dG::I64 = R4;
    _s2dF::P64 = R3;
    _s2dK::I64 = R2 + R4;
    _c2fR::I64 = R5;
    (_s2dP::I64) = call "ccall" arg hints:  [PtrHint,
                                             `signed',]  result hints:  [PtrHint] memchr(_s2dK::I64, 10, _c2fR::I64);
    if (_s2dP::I64 == 0) goto c2ht; else goto c2hu;
c2ht:
    call MO_Touch(_s2dF::P64);
    I64[Hp - 128] = Data.ByteString.Internal.PS_con_info;
    P64[Hp - 120] = _s2dF::P64;
    I64[Hp - 112] = R2;
    I64[Hp - 104] = _s2dG::I64;
    I64[Hp - 96] = _s2dH::I64;
    I64[Hp - 88] = :_con_info;
    P64[Hp - 80] = Hp - 127;
    P64[Hp - 72] = GHC.Types.[]_closure+1;
    _c2ho::P64 = Hp - 86;
    Hp = Hp - 72;
    R1 = _c2ho::P64;

But one thing is not right here:

#if !defined(mingw32_HOST_OS)
#define CALLER_SAVES_R3
#define CALLER_SAVES_R4
#endif

Under 64 bit Windows mingw32_HOST_OS should not be declared and therefore R3 and R4 should be defined as caller saves! As a quick sanity check I wrote this small C program and compiled it with the in-tree mingw gcc:

#include <stdio.h>
#if !defined(mingw32_HOST_OS)
int x = 1;
#else
int x = 2;
#endif

int main() {
printf("%d",x);
}

This program prints 1 as expected. Does anyone have any idea what might be going on here?

comment:58 follow-up: Changed 18 months ago by simonmar

I believe mingw32_HOST_OS is defined on 64-bit Windows. The full platform name is x86_64-unknown-mingw32, i.e. mingw32 is the OS.

You can test this with ghc --info on your 64-bit Windows GHC.

If memchr is clobbering allegedly callee-saves registers, then either memchr is wrong, or our idea of what is callee-saves is wrong. Which is it?

comment:59 Changed 18 months ago by awson

glibc is completely irrelevant here. memchr comes from system dll msvcrt.dll and I doubt it is wrong. So, perhaps, our idea of what is CALLER_SAVES_XX is wrong.

Last edited 18 months ago by awson (previous) (diff)

comment:60 in reply to: ↑ 58 ; follow-up: Changed 18 months ago by jstolarek

Replying to simonmar:

I believe mingw32_HOST_OS is defined on 64-bit Windows.

Doesn't the output from my small C program suggest otherwise? Am I missing something here?

You can test this with ghc --info on your 64-bit Windows GHC.

(...)
 ,("target os","OSMinGW32")
 ,("target arch","ArchX86_64")
(...)
 ,("Build platform","x86_64-unknown-mingw32")
 ,("Host platform","x86_64-unknown-mingw32")
 ,("Target platform","x86_64-unknown-mingw32")
(...)

glibc is completely irrelevant here. memchr comes from system dll msvcrt.dll and I doubt it is wrong.

Hmm... I doubt we can take a look at source of that dll ;-) awson, could you verify that change in MachRegs.h fixes the bug? You might need to recompile GHC from scratch, at least I had to do so because the changes weren't picked up after partial recompilation. Note: don't test that with latest HEAD as it contains Simon's temporary fix based on your earlier patch.

comment:61 in reply to: ↑ 60 ; follow-up: Changed 18 months ago by refold

Replying to jstolarek:

glibc is completely irrelevant here. memchr comes from system dll msvcrt.dll and I doubt it is wrong.

Hmm... I doubt we can take a look at source of that dll ;-)

I believe that Microsoft ships the CRT source with Visual Studio.

comment:62 Changed 18 months ago by simonpj

Austin will try:

  • Re-applying the CmmSink patch
  • Adding R3 and R4 to the caller-saves register for 64-bit

That would be a better fix, because if they really are caller-saves then reverting CmmSink won't necessarily cure everything.

Still needs more investigation for the ultimate cause.

Simon

comment:63 in reply to: ↑ 61 Changed 18 months ago by jstolarek

Replying to refold:

Replying to jstolarek:

glibc is completely irrelevant here. memchr comes from system dll msvcrt.dll and I doubt it is wrong.

Hmm... I doubt we can take a look at source of that dll ;-)

I believe that Microsoft ships the CRT source with Visual Studio.

Great. Does anyone have access to Visual Studio and could check the implementation of memchr?

comment:64 Changed 18 months ago by awson

memchr is implemented in C and I think MS' own C compiler is right. At least memchr extracted from static counterpart of msvcrt is good:

test   %r8,%r8
je     11 <memchr+0x11>
cmp    %dl,(%rcx)
je     11 <memchr+0x11>
inc    %rcx
dec    %r8
jne    5 <memchr+0x5>
neg    %r8
sbb    %rax,%rax
and    %rcx,%rax
retq   

And it touches R5 (r8).

Then this whole R3 R4 story looks very strange for me. Or I misunderstand CALLER_SAVES_XX macros semantics.

Last edited 18 months ago by awson (previous) (diff)

comment:65 Changed 18 months ago by igloo

http://msdn.microsoft.com/en-us/library/6t169e9c.aspx says

The registers RAX, RCX, RDX, R8, R9, R10, R11 are considered volatile and must be considered destroyed on function calls (unless otherwise safety-provable by analysis such as whole program optimization).

The registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15 are considered nonvolatile and must be saved and restored by a function that uses them.

so the saving info looks right to me.

It wouldn't be too surprising if saving and restoring registers more worked around a register corruption bug.

comment:66 Changed 18 months ago by thoughtpolice

  • Owner jstolarek deleted
  • Status changed from patch to new

comment:67 Changed 18 months ago by thoughtpolice

  • Owner set to jstolarek

comment:68 Changed 18 months ago by thoughtpolice

  • Milestone changed from 7.8.1 to 7.8.2
  • Priority changed from highest to high

This is now fixed in the 7.8 branch, so I'm moving this to 7.8.2 tentatively.

comment:69 Changed 18 months ago by simonmar

Hold the release! I've found the bug. It's in the native codegen. Patch coming soon.

comment:70 Changed 18 months ago by simonmar

  • Milestone changed from 7.8.2 to 7.8.1
  • Priority changed from high to highest

Patch attached. I'll validate it, but I only have one Windows laptop and I'm using it for work, so this might take a while. If someone else can validate more quickly, please go ahead.

comment:71 Changed 18 months ago by simonmar

Oh, and you also have to revert a79613a75c7da0d3d225850382f0f578a07113b5

comment:72 Changed 18 months ago by awson

On the first glance patch looks slightly insufficient because it does not touch 32-bit case. Or did I miss something here? AFAIR, that also was the case of 32-bit GHC segfaults.

Last edited 18 months ago by awson (previous) (diff)

comment:73 Changed 18 months ago by simonmar

Yeah, this is definitely the cause of the crash on 64-bit Windows, so if 32-bit is also crashing then there must be another bug somewhere. @thoughtpolice is going to test again and see if it's still crashing.

comment:74 Changed 18 months ago by jstolarek

  • Owner jstolarek deleted

comment:75 Changed 18 months ago by Austin Seipp <austin@…>

In c4eeacdfdf4578eb6e75bbf2e067bfe70ec94ab0/ghc:

Use the correct callClobberedRegs on Windows/x64 (#8834)

Signed-off-by: Austin Seipp <[email protected]>

comment:76 Changed 18 months ago by thoughtpolice

Simon's patch works on x86_64 windows, with or without SPJ's temporary workaround committed. Cabal works properly. The testsuite looks quite clean modulo some perf numbers. Looks good!

So I think technically this bug is fixed. But there's still an alleged 32bit error somewhere that Ganesh had, and Simon also had. I've had reports of Windows 7 being problematic in particular, so I'm looking to reproduce it right now with my build.

comment:77 Changed 17 months ago by thoughtpolice

I've definitely managed to reproduce this finally after a bunch of hunting - it doesn't really seem related to OS version, JUST related to MSYS2. #8870 is the same thing, I'm quite certain (failure at CPSZ output during segfault if you check -v3). It just doesn't make sense to me why the testsuite runs clean on my machine where I built the bindist, and everything works and compiles, but it fails for users and here.

I'm running the testsuite right now and I can see some isolated failures that I'm quite sure are illegitimate (several segfaults), i'll report the results here, ./validate is about half done.

comment:78 Changed 17 months ago by thoughtpolice

I see no actual suspicious testsuite failures (even in codeGen) that indicate anything is wrong - the only failures I get are due to the compiler itself segfaulting on a test.

More soon.

comment:79 Changed 17 months ago by thoughtpolice

Okay, I spent some time boiling some things down, and I've at least determined the approximate location of the segfault in the code during compilation, which is stmtToInstrs in compiler/nativeGen/X86/CodeGen.hs. Here's just a quick dump (to not loose findings) and I'll keep looking around.

The fault is when compiling System.Time in profiling. Run under gdb:

$ gdb --args "inplace/bin/ghc-stage2.exe" -v3 -hisuf p_hi -osuf  p_o -hcsuf p_hc -static -prof  -H32m -O    -package-name old-time-1.i -ilibraries/old-time/. -ilibraries/old-time/dist-install/build -ilibraries/old-time/dist-install/build/autogen -Ilibraries/old-timearies/old-time/dist-install/build/autogen -Ilibraries/old-time/include    -optP-include -optPlibraries/old-time/dist-install/build/auage base-4.7.0.0 -package old-locale-1.0.0.6 -Wall -XHaskell2010 -O2  -no-user-package-db -rtsopts      -odir libraries/old-time/distaries/old-time/dist-install/build -stubdir libraries/old-time/dist-install/build   -c libraries/old-time/dist-install/build/System/Tie/dist-install/build/System/Time.p_o +RTS -DS
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-msys".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Traceback (most recent call last):
  File "<string>", line 3, in <module>
ImportError: No module named libstdcxx.v6.printers
/etc/gdbinit:6: Error in sourced command file:
Error while executing Python code.
Reading symbols from /home/Administrator/ghc/inplace/bin/ghc-stage2.exe...done.
warning: File "/home/Administrator/ghc/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /home/Administrator/ghc/.gdbinit
line to your configuration file "/home/Administrator/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/Administrator/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb) load .gdbinit
You can't do that when your target is `exec'
(gdb) source .gdbinit
(gdb) r
Starting program: /home/Administrator/ghc/inplace/bin/ghc-stage2.exe -v3 -hisuf p_hi -osuf p_o -hcsuf p_hc -static -prof -H32m -O -package-name old-time-1.1.0.2 -hide-all-packages -i -ilibraries/old-time/. -ilibraries/old-time/dist-install/build -ilibraries/old-time/dist-install/build/autogen -Ilibraries/old-time/dist-install/build -Ilibraries/old-time/dist-install/build/autogen -Ilibraries/old-time/include -optP-include -optPlibraries/old-time/dist-install/build/autogen/cabal_macros.h -package base-4.7.0.0 -package old-locale-1.0.0.6 -Wall -XHaskell2010 -O2 -no-user-package-db -rtsopts -odir libraries/old-time/dist-install/build -hidir libraries/old-time/dist-install/build -stubdir libraries/old-time/dist-install/build -c libraries/old-time/dist-install/build/System/Time.hs -o libraries/old-time/dist-install/build/System/Time.p_o +RTS -DS
[New Thread 1136.0xcc8]
         cc8: cap 0: initialised
[New Thread 1136.0x15e8]
[New Thread 1136.0x1658]
[New Thread 1136.0x11b8]
[New Thread 1136.0x11e8]
[New Thread 1136.0x1718]
Glasgow Haskell Compiler, Version 7.9.20140329, stage 2 booted by GHC version 7.6.3
Using binary package database: C:\Users\Administrator\Desktop\msys32\home\Administrator\ghc\inplace\lib\package.conf.d\package.cache
wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-inplace
wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-inplace
wired-in package base mapped to base-4.7.0.0-inplace
wired-in package rts mapped to builtin_rts
wired-in package template-haskell mapped to template-haskell-2.10.0.0-inplace
wired-in package dph-seq not found.
wired-in package dph-par not found.
Hsc static flags:
*** Checking old interface for old-time-1.1.0.2:System.Time:
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 5,701, types: 3,843, coercions: 29}
...
*** Tidy Core:
Result size of Tidy Core
  = {terms: 15,413, types: 10,079, coercions: 582}
Created temporary directory: C:\Users\Administrator\Desktop\msys32\tmp\ghc1136_0
*** CorePrep:
Result size of CorePrep
  = {terms: 18,936, types: 12,028, coercions: 582}
*** Stg2Stg:
*** CodeOutput:
*** New CodeGen:
*** CPSZ:
*** CPSZ:
*** CPSZ:
*** CPSZ:
*** CPSZ:

Program received signal SIGSEGV, Segmentation fault.
0x02137032 in c1hhA_info ()
(gdb) bt
#0  0x02137032 in c1hhA_info ()
Cannot access memory at address 0x28a874
(gdb) disassemble
Dump of assembler code for function c1hhA_info:
   0x02137024 <+0>:     sub    $0x3510,%esp
   0x0213702a <+6>:     mov    0x8(%ebp),%eax
   0x0213702d <+9>:     mov    0x4(%ebp),%ecx
   0x02137030 <+12>:    mov    %esi,%edx
=> 0x02137032 <+14>:    mov    %eax,0x184(%esp)
   0x02137039 <+21>:    mov    -0x1(%edx),%eax
   0x0213703c <+24>:    movzwl -0x2(%eax),%eax
   0x02137040 <+28>:    cmp    $0x1e,%eax
   0x02137043 <+31>:    ja     0x214916f <c1hrv_info+1119>
   0x02137049 <+37>:    mov    %eax,0x190(%esp)
   0x02137050 <+44>:    mov    0x1c(%ebp),%eax
   0x02137053 <+47>:    mov    %eax,0xa0(%esp)
   0x0213705a <+54>:    mov    0x190(%esp),%eax
   0x02137061 <+61>:    jmp    *0x2b86708(,%eax,4)
   0x02137068 <+68>:    inc    %edx
   0x02137069 <+69>:    add    %al,(%eax)
   0x0213706b <+71>:    add    %ah,(%eax)
   0x0213706d <+73>:    add    %al,(%eax)
   0x0213706f <+75>:    add    %al,0x3510ec(%ecx)
End of assembler dump.
(gdb)
(gdb) info registers
eax            0x6b3b05d        112439389
ecx            0x6b49524        112497956
edx            0x67a40a9        108675241
ebx            0x2bc3470        45888624
esp            0x28a874 0x28a874
ebp            0x4697cc8        0x4697cc8
esi            0x67a40a9        108675241
edi            0x6b4ac20        112503840
eip            0x2137032        0x2137032 <c1hhA_info+14>
eflags         0x10202  [ IF RF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x53     83
gs             0x2b     43
(gdb) p8 $ebp
0x4697ce4:      0x6b488d9
0x4697ce0:      0x6b4ac18
0x4697cdc:      0x6300ef1e
0x4697cd8:      0x6b4954d
0x4697cd4:      0x67a40a9
0x4697cd0:      0x6b3b05d
0x4697ccc:      0x6b49524
0x4697cc8:      0x2137024 <c1hhA_info>
(gdb) pinfo &c1hhA_info
$1 = {layout = {payload = {ptrs = 903, nptrs = 0}, bitmap = 903, large_bitmap_offset = 903, selector_offset = 903}, type = 32,
  srt_bitmap = 7, code = 0x2137024 <c1hhA_info> "\201\354\020\065"}
(gdb) prinfo &c1hhA_info
$2 = {srt_offset = 8706840, i = {layout = {payload = {ptrs = 903, nptrs = 0}, bitmap = 903, large_bitmap_offset = 903,
      selector_offset = 903}, type = 32, srt_bitmap = 7, code = 0x2137024 <c1hhA_info> "\201\354\020\065"}}

The fault is in c1hhA_info. Finding that symbol:

$ find compiler/stage2 -type f | xargs grep c1hhA_info
Binary file compiler/stage2/build/libHSghc-7.9.20140329.a matches
Binary file compiler/stage2/build/X86/CodeGen.o matches

It's in ./nativeGen/X86/CodeGen.hs - check the -ddump-opt-cmm out to get corresponding optimized code, finding the closure for the info table:

==================== Optimised Cmm ====================
a292_rFAj_entry() //  []
        { [(c18OH,
            block_c18OH_info:
                const u1hJf_srtd-block_c18OH_info;
                const 1;
                const 4294901792;),
           (c18OQ,
            block_c18OQ_info:
                const u1hJg_srtd-block_c18OQ_info;
                const 3;
                const 4294901792;),
...
...
...
           (c1hhA,
            block_c1hhA_info:
                const SUys_srt-block_c1hhA_info+2332;
                const 903;
                const 458784;),
...
...
...
      c1hyQ:
          I32[Hp - 8] = $w$j_sSgL_info;
          I32[Hp - 4] = I32[Sp + 24];
          I32[Hp] = I32[Sp + 20];
          I32[Sp] = block_c1hhA_info;
          R1 = P32[Sp + 12];
          P32[Sp + 24] = Hp - 8;
          if (R1 & 3 != 0) goto c1hhA; else goto c1hhB;
      c1hhB:
          call (I32[R1])(R1) returns to c1hhA, args: 4, res: 4, upd: 4;
      c1hhA:
          _sSgE::P32 = P32[Sp + 8];
          _sSgI::P32 = P32[Sp + 4];
          _sSmu::P32 = R1;
          _c1hub::I32 = %MO_UU_Conv_W16_W32(I16[I32[_sSmu::P32 - 1] - 2]);
          if (_c1hub::I32 > 30) goto c1hug; else goto c1hue;
...
...

We can see this matches pretty closely with the assembly generated around the offending code (-ddump-asm):

_c1hyQ:
        movl $$w$j_sSgL_info,-8(%edi)
        movl 24(%ebp),%eax
        movl %eax,-4(%edi)
        movl 20(%ebp),%eax
        movl %eax,(%edi)
        movl $block_c1hhA_info,(%ebp)
        movl 12(%ebp),%esi
        leal -8(%edi),%eax
        movl %eax,24(%ebp)
        testl $3,%esi
        jne _n1kED
...
.text
        .align 4,0x90
        .long   SUys_srt-(block_c1hhA_info)+2332
        .long   903
        .long   458784
block_c1hhA_info:
_c1hhA:
        subl $13584,%esp
_n1kED:
        movl 8(%ebp),%eax
        movl 4(%ebp),%ecx
        movl %esi,%edx
        movl %eax,388(%esp)
        movl -1(%edx),%eax
        movzwl -2(%eax),%eax
        cmpl $30,%eax
        ja _c1hug

Next, looking at the STG output, a292_rFAj looks like:

a292_rFAj
  :: forall e_a94q x_a94r.
     CmmNode.CmmNode e_a94q x_a94r
     -> NCGMonad.NatM_State
     -> (X86.CodeGen.InstrBlock, NCGMonad.NatM_State)
...

If you search for calls to this, there is one call to it from a290_rFAh, which looks like:

a290_rFAh
  :: CmmMachOp.CallishMachOp
     -> Data.Maybe.Maybe CmmNode.CmmFormal
     -> [CmmNode.CmmActual]
     -> NCGMonad.NatM_State
     -> (X86.CodeGen.InstrBlock, NCGMonad.NatM_State)
...
                let {
                  sat_sMWp [Occ=Once]
                    :: CmmNode.CmmNode Compiler.Hoopl.Block.O Compiler.Hoopl.Block.O
                  [LclId, Str=DmdType] =
                      NO_CCS CmmNode.CmmUnsafeForeignCall! [GHC.Prim.coercionToken#
                                                            GHC.Prim.coercionToken#
                                                            sat_sMWb
                                                            sat_sMWd
                                                            sat_sMWo];
                } in  a292_rFAj sat_sMWp st'_sMWa;

The only thing that has a type of CallishMachOp -> ... is outOfLineCmmOp, which does indeed call stmtToInstrs (CmmUnsafeForeignCall ...) like in the above snippet. The type of stmtToInstrs also matches the (desugared) type of a292_rFAj. So this is certainly where the fault is occurring.

Unfortunately stmtToInstrs becomes incredibly large it seems, so pinning it down further is proving challening.

comment:80 Changed 17 months ago by thoughtpolice

I'll also note that it seems turning off the sinking pass seems to make no difference (that is, -fno-cmm-sink in compiler/nativeGen/X86/CodeGen.hs), although I haven't verified it faults in exactly the same spot.

Last edited 17 months ago by thoughtpolice (previous) (diff)

comment:81 Changed 17 months ago by thoughtpolice

The only way I can get this to *not* segfault is if I completely disable optimization with {-# OPTIONS_GHC -O0 #-} in CodeGen.hs. I'm going to see if the build completes and run the testsuite again to see what it says...

(BTW, this is a typical performance build, so everything will be compiled with -O, at least).

comment:82 Changed 17 months ago by simonmar

   0x02137024 <+0>:     sub    $0x3510,%esp
   0x0213702a <+6>:     mov    0x8(%ebp),%eax
   0x0213702d <+9>:     mov    0x4(%ebp),%ecx
   0x02137030 <+12>:    mov    %esi,%edx
=> 0x02137032 <+14>:    mov    %eax,0x184(%esp)

Oh wow, this function needs a *lot* of spill space on the C stack. I bet the problem is that we're bumping %esp by more than one page, and Windows doesn't like that, it expect the stack to grow by one page at a time. So the fix would be to write to the intervening pages one at a time. This is another bug in the NCG.

I'm also interested in why this function needs quite so much extra stack.

(also, shouldn't we be discussing this on #8870? The bug in this ticket is fixed, I think).

comment:83 Changed 17 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from new to closed

Whoops, yes, this is fixed.

comment:84 Changed 17 months ago by awson

Now, when #8870 looks understood can we revert a79613a75c7da0d3d225850382f0f578a07113b5?

comment:85 Changed 17 months ago by Austin Seipp <austin@…>

In c6c86789c95462216a3167d7b98b202a5bf4c0b2/ghc:

Revert "Revert ad15c2, which causes Windows seg-faults (Trac #8834)"

This reverts commit a79613a75c7da0d3d225850382f0f578a07113b5.
Note: See TracTickets for help on using tickets.