Opened 3 years ago

Closed 6 months ago

#8974 closed bug (fixed)

64 bit windows executable built with ghc-7.9.20140405+LLVM segfaults

Reported by: awson
Owned by:
Priority: high
Milestone: 8.2.1
Component: Compiler (LLVM)
Version: 7.9
Keywords:
Cc: simonmar, Phyx-
Operating System: Windows
Architecture: x86_64 (amd64)
Type of failure: Runtime crash
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s): Phab:D2749
Wiki Page:

Description

-- test.hs
import System.Mem (performMajorGC)

main = performMajorGC >> putStrLn "Done"

When built with ghc -pgmlo opt -pgmlc llc -fllvm --make test.hs, the resulting executable segfaults with both LLVM 3.4 and 3.5svn (taken from http://sourceforge.net/projects/msys2/files/REPOS/MINGW/x86_64).

32-bit ghc-7.9.20140404+llvm produces a working executable.

Adding an ArchX86_64 OSMinGW32 case to moduleLayout in compiler/llvmGen/LlvmCodeGen/Ppr.hs improves things slightly (some code that segfaults without it starts to work) but still does not cure the code above.

Also I've found the Cmm produced for LLVM CG differs from that produced for NCG.

Attachments (4)

T8947.ll (19.2 KB) - added by awson 3 years ago.
T8947_LLVMCG_cmm (32.9 KB) - added by awson 3 years ago.
T8947_NCG_cmm (26.6 KB) - added by awson 3 years ago.
ghc-w64-llvm34_v2.patch (4.7 KB) - added by awson 3 years ago.

Change History (61)

comment:1 Changed 3 years ago by carter

CMM is generated *before* the NCG and LLVM backends... so is there some code path before the code gen that depends on which code gen is selected?

comment:2 Changed 3 years ago by ezyang

It would be helpful if you could post the C-- produced.

comment:3 Changed 3 years ago by ezyang

Cc: dterei added

Changed 3 years ago by awson

Attachment: T8947.ll added

Changed 3 years ago by awson

Attachment: T8947_LLVMCG_cmm added

Changed 3 years ago by awson

Attachment: T8947_NCG_cmm added

comment:4 Changed 3 years ago by awson

To make Cmm shorter I've separated segfaulting code from main (it still segfaults when called from main) thus:

-- T8947.hs
module T8947 where

import System.Mem (performMajorGC)

t8947 :: IO ()
t8947 = performMajorGC >> putStrLn "Done"

T8947_LLVMCG_cmm and T8947.ll are produced by ghc -O2 -pgmlo opt -pgmlc llc -fllvm -keep-llvm-files -ddump-cmm -c T8947.hs > T8947_LLVMCG_cmm. T8947_NCG_cmm is produced by ghc -O2 -ddump-cmm -c T8947.hs > T8947_NCG_cmm.

comment:5 Changed 3 years ago by awson

Perhaps this is of interest:

performMajorGC alone and putStrLn "Done" alone both work.

putStrLn "Done" >> performMajorGC works.

And

foreign import ccall unsafe puts :: Ptr a -> IO ()

performMajorGC >> puts (Ptr "Done"#)

works too.

comment:6 in reply to:  1 Changed 3 years ago by jstolarek

Replying to carter:

CMM is generated *before* the NCG and LLVM backends... so is there some code path before the code gen that depends on which code gen is selected?

Yes, there is.

comment:8 Changed 3 years ago by awson

I've tried making the LLVM codegen not trash anything (getTrashRegs = return []), but the problem is still there. Hence my analysis is either wrong or incomplete.

comment:9 Changed 3 years ago by thoughtpolice

Milestone: 7.8.2 → 7.8.3

comment:10 Changed 3 years ago by awson

Well, I've found the source of this bug. It turns out that Windows does not like 64-bit offsets; perhaps this is a painful legacy of PE32+.

Here is the difference between segfaulting and working (manually created) code:

--- T8947.s	2014-04-21 14:02:47.240488500 +0400
+++ T8947m.s	2014-04-21 15:22:41.951320900 +0400
@@ -85,7 +85,8 @@
 	.globl	T8947_t1_info_itable    # @T8947_t1_info_itable
 	.align	8
 T8947_t1_info_itable:
-	.quad	S1i6_srt-T8947_t1_info
+	.long	S1i6_srt-T8947_t1_info
+	.long	0
 	.quad	4294967299              # 0x100000003
 	.quad	0                       # 0x0
 	.quad	64424509455             # 0xf0000000f
@@ -145,7 +146,8 @@
 	.text
 	.align	8                       # @c1hV_info_itable
 c1hV_info_itable:
-	.quad	S1i6_srt-c1hV_info
+	.long	S1i6_srt-c1hV_info
+	.long	0
 	.quad	0                       # 0x0
 	.quad	47244640288             # 0xb00000020
 
@@ -167,7 +169,8 @@
 	.globl	T8947_t8947_info_itable # @T8947_t8947_info_itable
 	.align	8
 T8947_t8947_info_itable:
-	.quad	(S1i6_srt-T8947_t8947_info)+16
+	.long	(S1i6_srt-T8947_t8947_info)+16
+	.long	0
 	.quad	4294967299              # 0x100000003
 	.quad	0                       # 0x0
 	.quad	4294967311              # 0x10000000f

The bad data is generated by the following LLVM code:

...
@T8947_t1_info_itable = constant %T8947_t1_entry_struct<{i64 add (i64 sub (i64 ptrtoint (i8* @S1i6_srt$alias to i64),i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @T8947_t1_info to i64)),i64 0), i64 4294967299, i64 0, i64 64424509455}>, section "X98A__STRIP,__me3", align 8
...
@c1hV_info_itable = internal constant %c1hV_entry_struct<{i64 add (i64 sub (i64 ptrtoint (i8* @S1i6_srt$alias to i64),i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @c1hV_info to i64)),i64 0), i64 0, i64 47244640288}>, section "X98A__STRIP,__me5", align 8
...
@T8947_t8947_info_itable = constant %T8947_t8947_entry_struct<{i64 add (i64 sub (i64 ptrtoint (i8* @S1i6_srt$alias to i64),i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @T8947_t8947_info to i64)),i64 16), i64 4294967299, i64 0, i64 4294967311}>, section "X98A__STRIP,__me7", align 8
...

But I don't quite understand where precisely in the GHC code I should intervene to fix it.
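As an illustration of the manual edit in the diff above (a sketch only, not GHC or LLVM code): on a little-endian target, a .quad holding a small non-negative value has the same bytes as a .long with that value followed by .long 0. The helper names leBytes, quad, long and sameLayout are hypothetical, and the sample offset is arbitrary. The equivalence assumes the offset is non-negative and below 2^32; a negative offset would need an all-ones upper half instead of zero.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8, Word32, Word64)

-- Little-endian encoding of the low n bytes of a value (hypothetical helper).
leBytes :: Int -> Word64 -> [Word8]
leBytes n w = [fromIntegral ((w `shiftR` (8 * i)) .&. 0xff) | i <- [0 .. n - 1]]

-- Bytes an assembler would emit for `.quad v` on a little-endian target.
quad :: Word64 -> [Word8]
quad = leBytes 8

-- Bytes an assembler would emit for `.long v`.
long :: Word32 -> [Word8]
long = leBytes 4 . fromIntegral

-- True when `.quad v` and `.long v; .long 0` produce identical bytes.
sameLayout :: Word32 -> Bool
sameLayout v = quad (fromIntegral v) == long v ++ long 0

main :: IO ()
main = print (sameLayout 0x2EBA40)  -- sample offset, chosen arbitrarily
```

So the in-memory info table layout is unchanged by the edit; only the relocation the assembler has to emit for the label difference shrinks from 64 to 32 bits.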

comment:11 Changed 3 years ago by awson

I think that things are pretty much explained in native codegen code.

But AFAIUI, when the relevant LLVM code was written, non-Windows binutils were already improved and that was not taken into account (Windows binutils are not fixable anyway in general).

So it looks like the pprInfoTable code is the point at which we could try to rewrite things.

Unfortunately, at this point we are forced to "reverse engineer" what was done before. It is tempting to intervene here, but it seems we can't intervene at this early stage because doing so can break things in contexts other than pprInfoTable's.

comment:12 Changed 3 years ago by awson

I've decided that intervening at `genStaticLit (CmmLabelDiffOff l1 l2 off)` is safe and have rewritten the code to generate 32-bit arithmetic and pointer conversion, but it turned out that LLVM generates unsuitable code for ptrtoint ... to i32 applied to a 64-bit pointer.

For example, if

@T8947_t1_info_itable = constant %T8947_t1_entry_struct<{i64 add (i64 sub (i64 ptrtoint (i8* @S1fL_srt$alias to i64),i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @T8947_t1_info to i64)),i64 0), i64 4294967299, i64 0, i64 64424509455}>, section "X98A__STRIP,__me3", align 8

gets rewritten to

@T8947_t1_info_itable = constant %T8947_t1_entry_struct<{i32 add (i32 sub (i32 ptrtoint (i8* @S1i6_srt$alias to i32),i32 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @T8947_t1_info to i32)),i32 0), i32 0, i64 4294967299, i64 0, i64 64424509455}>, section "X98A__STRIP,__me3", align 8

LLVM instead of

T8947_t1_info_itable:
	.quad	S1i6_srt-T8947_t1_info

generates (assembler spits Error: invalid operands (.rdata and *ABS* sections) for `&')

T8947_t1_info_itable:
	.long	(S1i6_srt&-1)-(T8947_t1_info&-1)
	.long	0                       # 0x0

while we want it to be

T8947_t1_info_itable:
	.long	S1i6_srt-T8947_t1_info
	.long	0                       # 0x0

I'm in no way an LLVM expert and know very little about it. Is there a way to make LLVM generate the code we want, or should we use the mangler here? Any thoughts?

comment:13 Changed 3 years ago by awson

OK. I've implemented the mangler-based approach.

As far as I've tested it, it works.

It's a bit ugly (UUID magic) and fragile (the mangler searches for and replaces a pattern that depends on CRLF line endings) because I did not bother to elaborate the trivial and boring details.

The patch below consists of 2 orthogonal parts:

  • the first introduces the target datalayout and target triple for 64-bit mingw32 LLVM. It is compatible with LLVM 3.4 and incompatible with current LLVM 3.5svn (mingw32 was changed to windows-gnu in the target triple).
  • the second essentially solves the problem described in this ticket.

I've implemented all platform-specific code so that it is selected at run time (I believe LLVM can choose a target dynamically; am I wrong?). And I've tested everything on 64-bit GHC 7.9+ and MSYS2-built LLVM 3.4 *only*.
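For readers without the attachment at hand, here is a minimal sketch of what a mangler pass of this kind could look like. It is not the code in the attached patch (which relies on a UUID marker): it simply matches any .quad whose operand looks like a label difference, and the names isLabelDiff, trimEnd, rewriteLine and mangle are hypothetical. It also assumes the offset is non-negative, since the upper half is filled with zero.

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (stripPrefix)

-- Heuristic: the operand looks like `label - label`, not a plain number.
isLabelDiff :: String -> Bool
isLabelDiff s = case break (== '-') s of
  (c : _, '-' : _ : _) -> not (isDigit c)
  _                    -> False

-- Drop trailing whitespace (including a stray '\r' from CRLF input).
trimEnd :: String -> String
trimEnd = reverse . dropWhile isSpace . reverse

-- Rewrite one assembly line: a `.quad` of a label difference becomes a
-- 32-bit `.long` followed by a zero upper half; other lines pass through.
rewriteLine :: String -> [String]
rewriteLine line =
  case stripPrefix ".quad" (dropWhile isSpace line) of
    Just rest
      | isLabelDiff operand -> ["\t.long\t" ++ operand, "\t.long\t0"]
      where
        operand = trimEnd (takeWhile (/= '#') (dropWhile isSpace rest))
    _ -> [line]

-- Apply the rewrite to a whole assembly file.
mangle :: String -> String
mangle = unlines . concatMap rewriteLine . lines

main :: IO ()
main = putStr (mangle "\t.quad\tS1i6_srt-T8947_t1_info\n\t.quad\t4294967299\n")
```

Matching on the operand shape rather than on line endings avoids the CRLF dependence mentioned above, at the cost of possibly rewriting label differences outside info tables; a real pass would need to restrict itself to info table sections.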

comment:14 Changed 3 years ago by awson

comment:15 Changed 3 years ago by awson

Status: new → patch

Changed 3 years ago by awson

Attachment: ghc-w64-llvm34_v2.patch added

comment:16 Changed 3 years ago by simonmar

Cc: simonmar added

comment:17 in reply to:  12 Changed 3 years ago by bgamari

Replying to awson:

I'm in no way an LLVM expert and know very little about it. Is there a way to make LLVM generate the code we want, or should we use the mangler here? Any thoughts?

I'm not sure I understand why LLVM produces the assembler it does in this case. Have you tried bringing this up with the LLVM folks? It may be that the fix belongs in LLVM.

comment:18 Changed 3 years ago by altaic

LLVM has to be able to do a proper pointer cast. I'd definitely take Ben's advice and talk to the LLVM folks about this.

comment:19 Changed 3 years ago by thoughtpolice

Status: patch → infoneeded

comment:20 Changed 3 years ago by thoughtpolice

Milestone: 7.8.3 → 7.8.4

Moving to 7.8.4.

comment:21 in reply to:  12 ; Changed 3 years ago by Fanael

Replying to awson:

For example, if

@T8947_t1_info_itable = constant %T8947_t1_entry_struct<{i64 add (i64 sub (i64 ptrtoint (i8* @S1fL_srt$alias to i64),i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @T8947_t1_info to i64)),i64 0), i64 4294967299, i64 0, i64 64424509455}>, section "X98A__STRIP,__me3", align 8

gets rewritten to

@T8947_t1_info_itable = constant %T8947_t1_entry_struct<{i32 add (i32 sub (i32 ptrtoint (i8* @S1i6_srt$alias to i32),i32 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @T8947_t1_info to i32)),i32 0), i32 0, i64 4294967299, i64 0, i64 64424509455}>, section "X98A__STRIP,__me3", align 8

LLVM instead of

T8947_t1_info_itable:
	.quad	S1i6_srt-T8947_t1_info

generates (assembler spits Error: invalid operands (.rdata and *ABS* sections) for `&')

T8947_t1_info_itable:
	.long	(S1i6_srt&-1)-(T8947_t1_info&-1)
	.long	0                       # 0x0

while we want it to be

T8947_t1_info_itable:
	.long	S1i6_srt-T8947_t1_info
	.long	0                       # 0x0

I'm in no way an LLVM expert and know very little about it. Is there a way to make LLVM generate the code we want, or should we use the mangler here? Any thoughts?

Yes, there is. Use trunc, for example:

%foo = type <{i32, i32}>
@aaa = global i32 5
@bbb = global i32 5
@foo = constant %foo<{i32 trunc(i64 sub(i64 ptrtoint (i32* @aaa to i64), i64 ptrtoint (i32* @bbb to i64)) to i32), i32 0}>

LLVM will generate

foo:
	.long	aaa-bbb
	.long	0                       # 0x0

comment:22 in reply to:  21 ; Changed 3 years ago by awson

Replying to Fanael:

Yes, there is. Use trunc, for example:

%foo = type <{i32, i32}>
@aaa = global i32 5
@bbb = global i32 5
@foo = constant %foo<{i32 trunc(i64 sub(i64 ptrtoint (i32* @aaa to i64), i64 ptrtoint (i32* @bbb to i64)) to i32), i32 0}>

LLVM will generate

foo:
	.long	aaa-bbb
	.long	0                       # 0x0

AFAIR, the trunc alone is not sufficient. What you propose in fact is to declare a pair of 32-bit int variables instead of one 64-bit pointer global variable, right?

comment:23 in reply to:  22 Changed 3 years ago by Fanael

Replying to awson:

AFAIR, the trunc alone is not sufficient. What you propose in fact is to declare a pair of 32-bit int variables instead of one 64-bit pointer global variable, right?

As shown in the example, yes. The ideal solution would be to convert that truncated value back to i64, but even though the LLVM docs say that sext and zext are valid constant expressions, they don't work and yield an "Unsupported expression in static initializer" error.
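To make the shape of that constant concrete, here is a small Haskell sketch that renders the trunc-based initializer as text. It is only an illustration of the expression form from the example above, not GHC's LLVM pretty-printer; Sym, ptrToInt, truncDiff and paddedFields are hypothetical names.

```haskell
-- A global symbol together with its LLVM pointer type, e.g. ("i32*", "aaa").
type Sym = (String, String)

-- Render a pointer-to-i64 conversion of a global.
ptrToInt :: Sym -> String
ptrToInt (ty, name) = "i64 ptrtoint (" ++ ty ++ " @" ++ name ++ " to i64)"

-- Render the 64-bit label difference truncated to i32.
truncDiff :: Sym -> Sym -> String
truncDiff a b =
  "i32 trunc (i64 sub (" ++ ptrToInt a ++ ", " ++ ptrToInt b ++ ") to i32)"

-- The two i32 fields that stand in for the original single i64 slot.
paddedFields :: Sym -> Sym -> String
paddedFields a b = truncDiff a b ++ ", i32 0"

main :: IO ()
main = putStrLn (paddedFields ("i32*", "aaa") ("i32*", "bbb"))
```

Note that the struct type has to change along with the initializer (one i64 field becomes two i32 fields), which is the point raised in the question above.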

comment:24 Changed 3 years ago by thoughtpolice

Milestone: 7.8.4 → 7.10.1

Moving (in bulk) to 7.10.1

comment:25 Changed 2 years ago by thoughtpolice

Milestone: 7.10.1 → 7.12.1

Moving to 7.12.1

comment:26 Changed 2 years ago by bgamari

Do we know exactly why Windows doesn't like 64-bit offsets? Presumably it has some support for this, no?

comment:27 Changed 2 years ago by Fanael

Replying to bgamari:

Do we know exactly why Windows doesn't like 64-bit offsets?

It's not Windows, it's binutils being broken (nothing new). I'm using the Git version, so it's not just a problem with 2.22 or whatever ancient version of binutils GHC is bundled with.

For example, LLVM emits this:

	.quad	S1kt_srt$def-Main_main1_info$def # @"Main_main1_info$def"
	.quad	4294967299              # 0x100000003
	.quad	0                       # 0x0
	.quad	64424509455             # 0xf0000000f
Main_main1_info$def:

With GDB, we can learn that this value should equal

(gdb) p ((char*)&S1kt_srt$def - (char*)&Main_main1_info$def)
$1 = 3062336

But what actually lands in the executable is

(gdb) x/d Main_main1_info$def - 32
0x4015b0 <Main_main2_info$def+96>:      3062340

Makes me wonder if binutils devs are aware of the problem.

comment:28 Changed 2 years ago by Fanael

comment:29 Changed 22 months ago by bgamari

Fanael, could you test this with ld.gold? We already advise users to use gold on ARM; perhaps we should just do the same here.

comment:30 Changed 22 months ago by Fanael

No, because Windows does not use ELF, so gold, being ELF only, is completely useless there.

comment:31 Changed 21 months ago by thoughtpolice

Milestone: 7.12.1 → 8.0.1

Milestone renamed

comment:32 Changed 18 months ago by bgamari

Milestone: 8.0.1 → 8.2.1
Status: infoneeded → upstream

Looks like we can't do much other than wait for the binutils people.

comment:33 Changed 18 months ago by awson

In fact, GHC HEAD with LLVM 3.7 and current released binutils doesn't have this bug and works without any modifications.

OTOH, 7.10.x with LLVM 3.5 still doesn't work and requires something like the patch I've put here.

Perhaps, we can close this ticket as fixed for GHC 8.

comment:34 in reply to:  33 Changed 17 months ago by thomie

Milestone: 8.2.1 → 8.0.1
Resolution: → worksforme
Status: upstream → closed

Replying to awson:

GHC HEAD with LLVM 3.7 and current released binutils doesn't have this bug and works without any modifications.

Ok, let's close this.

There won't be another 7.10 release afaik.

comment:35 Changed 12 months ago by GordonBGood

Replying to awson and thomie:

I don't think this should have been closed. Using 64-bit GHC 8.0.1, the same binutils as always (version 2.25.1, which is the one that comes with both GHC 7.10.3 and the new release), and LLVM 3.7, I still get segmentation faults from the compiled executable on 64-bit Windows 7.

I think that just because the simple little test program runs doesn't mean the problem isn't still there, as Fanael showed that the problem occurs when a page barrier is crossed. The compilation for GHC has changed significantly, and it may well be that a small program no longer crosses a page barrier. My much larger application perhaps does, and so triggers the same problem.

comment:36 Changed 12 months ago by Fanael

Resolution: worksforme (removed)
Status: closed → new

Replying to awson:

GHC HEAD with LLVM 3.7 and current released binutils doesn't have this bug and works without any modifications.

Precisely *nothing* changed in the code generated by GHC and the binutils bug is still open, so the idea that GHC HEAD doesn't have this bug is [REDACTED].

comment:37 in reply to:  36 Changed 12 months ago by GordonBGood

Replying to Fanael:

Precisely *nothing* changed in the code generated by GHC and the binutils bug is still open, so the idea that GHC HEAD doesn't have this bug is [REDACTED].

Thanks, Fanael. I added binutils version 2.26.2 to the mix of GHC 8.0.1 and LLVM 3.7, and there is still no resolution.

comment:38 Changed 12 months ago by YellowOnion

I've been experiencing segfaults with -Odph and -fllvm on GHC 7.10.3 and LLVM 3.5 in my app. Is this bug related, or should I file another one?

comment:39 in reply to:  38 Changed 12 months ago by GordonBGood

Replying to YellowOnion:

I've been experiencing segfaults with -Odph and -fllvm on GHC 7.10.3 and LLVM 3.5 in my app. Is this bug related, or should I file another one?

If you are on Windows 64 with a 64 bit version of GHC, then it most certainly is.

comment:40 Changed 8 months ago by awson

Since I now have quite a bit of spare time, I've decided to look into this.

Indeed, binutils ld is wrong here. I've created a patch for binutils that fixes things for me.

OTOH, it should be mentioned that the R_X86_64_PC64 reloc is not supported by MS in PE-COFF, and neither MS link nor LLVM lld can handle it (the former simply ignores it and the latter complains about an unsupported relocation type). Thus the proper way to fix this is to use a workaround similar to the one used in NCG (see my comment above) to make LLVM generate R_X86_64_PC32 relocs instead.
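As a rough sanity check on the 32-bit approach (a sketch with a hypothetical helper name, fitsPC32): a 32-bit PC-relative relocation can only hold a displacement that fits in a signed 32-bit integer, so the workaround is valid only while the distance between the two labels stays in that range. The first sample value is the offset reported in the GDB session earlier in this ticket.

```haskell
import Data.Int (Int32, Int64)

-- True when a PC-relative displacement fits in a signed 32-bit field.
fitsPC32 :: Int64 -> Bool
fitsPC32 d = d >= lo && d <= hi
  where
    lo = fromIntegral (minBound :: Int32)
    hi = fromIntegral (maxBound :: Int32)

main :: IO ()
main = mapM_ (print . fitsPC32) [3062336, 2 ^ 31 - 1, 2 ^ 31]
```

Offsets of a few megabytes, like the one seen here, are far inside the range; only an image with labels more than 2 GiB apart would be a problem.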

comment:41 Changed 8 months ago by awson

Btw, the patch was accepted into mainline binutils; I hope the next stable version will contain it.

Anybody interested in getting it into MSYS2 builds ASAP should ask the MSYS2 maintainers, who usually cherry-pick important patches from mainline into their stable builds.

comment:42 Changed 8 months ago by bgamari

Cc: Phyx added

Thanks for looking into this, awson!

Phyx, do you know anyone in the msys2 project? This sounds important.

comment:43 Changed 8 months ago by Phyx-

Cc: Phyx- added; Phyx removed

No sorry, don't know anyone in mingw-w64 yet.

However, @Elieux who is usually in #GHC might be able to help.

In the meantime I have opened an issue on their tracker, https://github.com/Alexpux/MINGW-packages/issues/1765, and asked for the patch to be applied.

Since we host the binaries ourselves anyway, we can also just choose to apply the patch ourselves if they don't want to do it.

We have to update binutils anyway for -ffunction-sections, no, @awson?

comment:44 Changed 8 months ago by dterei

Cc: dterei removed

comment:45 Changed 8 months ago by Phyx-

The pull request has been accepted and merged (https://github.com/Alexpux/MINGW-packages/pull/1767); now we just have to wait for a build to be released.

comment:46 Changed 8 months ago by Phyx-

Binutils 2.27-2 has been released on msys, this contains Awson's patch.

comment:47 in reply to:  46 Changed 8 months ago by GordonBGood

Replying to Phyx-:

Binutils 2.27-2 has been released on msys, this contains Awson's patch.

Awson's patch doesn't seem to be enough. I tried MSYS2 mingw64/mingw-w64-x86_64-binutils 2.27-2 with LLVM 3.7 and 64-bit GHC 8.0.1 on Windows with a simple Sieve of Eratosthenes program. It compiles with the -fllvm switch but still segfaults on execution, whereas it does not segfault when built without -fllvm (which defaults to NCG).

However, Awson's patch does help: other versions of a similar program can now be compiled on the same platform with the same -fllvm switch and run successfully (and 25% faster than with NCG), which they never did before.

I need to boil this down to a version that consistently fails with -fllvm and not without, and will submit it here...

comment:48 in reply to:  46 Changed 8 months ago by GordonBGood

Replying to Phyx-:

Binutils 2.27-2 has been released on msys, this contains Awson's patch.

I need to boil down a version that consistently fails using -fllvm (with the patch on 64-bit Windows) and not without and will submit it here...

Failure code as follows:

{-# LANGUAGE FlexibleContexts #-}
{-# OPTIONS_GHC -O3 -rtsopts #-} -- or O2

import Control.Monad.ST (ST) -- for the ST type used in the annotation below
import Data.Array.ST (runSTUArray)
import Data.Array.Base

numLOOPS = 48838 :: Integer

-- Uses a very simple Sieve of Eratosthenes to 2 ^ 18 (so one L1 cache size).
-- removed the actual composite number culling code to show the problem in the loop...
test :: () -> [Int]
test() = 2 : [fromIntegral i * 2 + 3 | (i, False) <- assocs bufb] where
 bufb = runSTUArray $ do
  let bfLmt = (256 * 1024) `div` 2 - 1 -- to 2^18 + 2 is 128 KBits - 1 = 16 KBytes
  cmpstsb <- newArray (0, bfLmt) False :: ST s (STUArray s Int Bool)
  let loop n = -- cull a number of times to test timing
        if n <= 0 then return cmpstsb else loop (n - 1)
  loop numLOOPS

main = print $ length $ test()

The above code consistently segfaults on Windows with 64-bit GHC 8.0.1, LLVM 3.7, and the latest MSYS2_64 (including the patch) when compiled with the -fllvm flag. It does not segfault on 64-bit Linux (Fedora 24) under the same conditions, or on Windows without the -fllvm flag (defaulting to NCG).

It does not segfault if 'numLOOPS' is made only 48837 or if the type of 'numLOOPS' is changed from multi-precision 'Integer' to base 'Int' (64-bit integer value for 64-bit systems).

comment:49 Changed 8 months ago by awson

I can't reproduce this with either ghc-8.0.1.20160826+llvm-3.7 or ghc-8.1.20160921+llvm-4.0 (HEAD). I get no segfaults in either case (I tried increasing numLOOPS to 100000 and 200000; no segfaults either).

Perhaps that was a bug in GHC that has been fixed since the 8.0.1 release?

comment:50 in reply to:  49 Changed 7 months ago by GordonBGood

Replying to awson:

I can't reproduce this with either ghc-8.0.1.20160826+llvm-3.7 or ghc-8.1.20160921+llvm-4.0 (HEAD). I get no segfaults in either case (I tried increasing numLOOPS to 100000 and 200000; no segfaults either).

Perhaps that was a bug in GHC that has been fixed since the 8.0.1 release?

@awson, Perhaps it has been fixed which would be good - I'm using 64-bit Haskell Platform with stock/stable 8.0.1.

I'm also having segfaults with -fllvm and not without even though I don't believe it's using 'Integer' with the following paged Sieve of Eratosthenes code:

{-# LANGUAGE FlexibleContexts #-}
{-# OPTIONS_GHC -O3 -rtsopts #-} -- or O2

import Data.Bits
import Data.Array.Base
import Data.Array.ST (runSTUArray, STUArray(..))
 
type PrimeType = Int
range = 1000000 :: PrimeType
szPGBTS = (2^14) * 8 :: PrimeType -- CPU L1 cache in bits
szBPBTS = (2^7) * 8 :: PrimeType -- base primes pages can be much smaller
 
primesPages :: PrimeType -> [UArray PrimeType Bool]
primesPages szpgbts = pagesFrom 0 szPGBTS bppgs where
  makePg lowi szbts bps = runSTUArray $ do
    let limi = lowi + szbts - 1
    let nxt = 3 + limi + limi -- last candidate in range
    cmpsts <- newArray (lowi, limi) False
    let pbts = fromIntegral szbts    
    let cull (p:ps) =
          let sqr = p * p in
          if sqr > nxt then return cmpsts
          else let pi = fromIntegral p in
               let cullp c = if c > pbts then return ()
                             else do
                               unsafeWrite cmpsts c True
                               cullp (c + pi) in
               let a = (sqr - 3) `shiftR` 1 in
               let s = if a >= lowi then fromIntegral (a - lowi)
                       else let r = fromIntegral ((lowi - a) `rem` p) in
                            if r == 0 then 0 else pi - r in
               do { cullp s; cull ps }
    if bps == [] then do
      pg0 <- unsafeFreezeSTUArray cmpsts
      cull $ listPagePrms [pg0]
    else cull bps
  pagesFrom lowi bts bps =
    let cf lwi = case makePg lwi bts bps of
          pg -> pg `seq` pg : cf (lwi + bts) in cf lowi
  bppgs =  -- secondary stream of primes
    listPagePrms (makePg 0 szBPBTS [] : (pagesFrom szBPBTS szBPBTS bppgs))

listPagePrms :: [UArray PrimeType Bool] -> [PrimeType]
listPagePrms (hdpg @ (UArray lowi _ rng _) : tlpgs) =
  let loop i = if i >= rng then listPagePrms tlpgs
               else if unsafeAt hdpg i then loop (i + 1)
                    else let ii = lowi + fromIntegral i in
                         case 3 + ii + ii of
                           p -> p `seq` p : loop (i + 1) in loop 0

primesPaged :: () -> [PrimeType]
primesPaged() = 2 : (listPagePrms $ primesPages szPGBTS)

main = print $ length $ takeWhile ((>=) range) $ primesPaged()

The above segfaults with "range" set to a million, but not for some smaller values (e.g. a hundred thousand), when compiled with '-fllvm' in the same environment as before.

comment:51 Changed 7 months ago by awson

Weird. No problems with this one either. I tried increasing range to 10000000 (10 million); still no problems.

And yes, I mean 64-bit GHCs only.

comment:52 in reply to:  51 Changed 7 months ago by GordonBGood

Replying to awson:

Weird. No problems with this one either. I tried increasing range to 10000000 (10 million); still no problems.

When it works, the program will output the number of primes up to trillions if you give it enough time, but this simple version works best for ranges up to about 16 billion ;)

And yes, I mean 64-bit GHCs only.

Well, it's not too much of a problem if it is fixed in HEAD as 8.2.1 will take care of it, but we need to verify that 8.0.1 standard has the problem on your Windows machine and I need to verify that HEAD fixes the problem on mine.

Any suggestions on how to get a 64-bit Windows development release that you show as working, without jumping through hoops to compile it on my machine?

comment:53 Changed 7 months ago by awson

Well, I've downloaded the 8.0.1 release and tried it, and all your examples immediately started to segfault. After I replaced the distributed binutils with the corrected one, everything started to work flawlessly: no segfaults at all for any of your examples.

comment:54 in reply to:  53 Changed 7 months ago by GordonBGood

Replying to awson:

Well, I've downloaded the 8.0.1 release and tried it, and all your examples immediately started to segfault. After I replaced the distributed binutils with the corrected one, everything started to work flawlessly: no segfaults at all for any of your examples.

So, yes: I was unable to compile anything that would run without segfaults using -fllvm on 64-bit Windows until I updated MSYS2 with pacman so as to get binutils 2.27-2 and then downgraded LLVM from 3.8 back to 3.7.0-9, which is where I am now with this problem.

I think that the main difference between our setups is that my path finds the MSYS2 bindir before the mingw that comes with GHC 8.0.1. Perhaps some of those GHC mingw files have been patched, or are versions that work with GHC 8.0.1, whereas the up-to-date MSYS2 ones aren't quite compatible. This would explain why 'Integer' has problems, as I notice that MSYS2 has updated the GMP library files. I have moved the LLVM 3.7.0-9 'llc.exe' and 'opt.exe' files to the GHC mingw bindir, replaced all binutils files in the GHC mingw folder with the new 2.27-2 versions, and temporarily removed the MSYS2 bindir from the path, and it works for all of my examples, just as you said.

It seems that this is just another case of incompatibility between GHC and the specific versions of the programs it requires: LLVM, but also others (the 'Integer' GMP problem, and likely something to do with allocation, since the difference between smaller and larger ranges in the paged SoE is that larger ranges need successive culling pages). Hopefully GHC 8.2.1 will fix this by distributing versions that are known to work (ideally the latest) and automatically setting the path to use them.

Thanks for your patch, which does (finally) allow the use of LLVM with GHC on 64-bit Windows.

comment:55 Changed 6 months ago by Phyx-

Differential Rev(s): Phab:D2749
Milestone: 8.0.1 → 8.2.1
Status: new → patch

Updating the bindist for 8.2. There wasn't enough testing to include it in 8.0.2.

comment:56 Changed 6 months ago by Ben Gamari <ben@…>

In 20c06143/ghc:

Update Mingw-w64 bindist for Windows

This updates the binary dists for windows to GCC 6.2.0 and
binutils 2.27.2 which has fixes required for LLVM.

Test Plan: ./validate

Reviewers: simonmar, erikd, austin, bgamari

Reviewed By: simonmar, bgamari

Subscribers: thomie, #ghc_windows_task_force

Differential Revision: https://phabricator.haskell.org/D2749

GHC Trac Issues: #12871, #8974

comment:57 Changed 6 months ago by bgamari

Resolution: → fixed
Status: patch → closed

This should be fixed with the toolchain bump.
