Opened 6 years ago

Closed 5 years ago

#6156 closed bug (fixed)

Optimiser bug on linux-powerpc

Reported by: erikd Owned by:
Priority: high Milestone: 7.6.1
Component: Compiler Version: 7.4.1
Keywords: Cc:
Operating System: Linux Architecture: powerpc
Type of failure: Incorrect result at runtime Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Found a small chunk of code in the cryptocipher package that when compiled and run, produces a difference result when optimised compared to compiling un-optimised.

Note this is only a problem with PowerPC. On x86-64 there is no difference in the output between the optimised version and the un-optimised version.

I have two simple files (Camellia.hs):

module Camellia where

import Data.Bits
import Data.Word

import Debug.Trace

fl :: Word64 -> Word64 -> Word64
fl fin sk =
	let (x1, x2) = w64tow32 fin in
	let (k1, k2) = w64tow32 sk in
	let y2 = x2 `xor` ((x1 .&. k1) `rotateL` 1) in
	let y1 = x1 `xor` (y2 .|. k2) in
	trace (show fin ++ " " ++ show sk ++ " -> " ++ show (w32tow64 (y1, y2))) $ w32tow64 (y1, y2)

w64tow32 :: Word64 -> (Word32, Word32)
w64tow32 w = (fromIntegral (w `shiftR` 32), fromIntegral (w .&. 0xffffffff))

w32tow64 :: (Word32, Word32) -> Word64
w32tow64 (x1, x2) = ((fromIntegral x1) `shiftL` 32) .|. (fromIntegral x2)

and a main program (camellia-test.hs):

import Data.Word
import qualified Camellia as Camellia

a, b :: Word64
a = 1238988323332265734
b = 11185553392205053542

main :: IO ()
main =
    putStrLn $ "Camellia.fl " ++ show a ++ " " ++ show b ++ " -> " ++ show (Camellia.fl a b)

I'm also using this Makefile:

TARGETS = camilla-test-std camilla-test-opt

check : $(TARGETS)
	./camilla-test-std
	./camilla-test-opt

clean :
	make clean-temp-files
	rm -f $(TARGETS)

clean-temp-files :
	rm -f camilla-test.o camilla-test.hi Camellia.o Camellia.hi

camilla-test-opt : camilla-test.hs Camellia.hs
	ghc -Wall -O2 --make -i:Tests $< -o $@
	make clean-temp-files

camilla-test-std : camilla-test.hs Camellia.hs
	ghc -Wall --make -i:Tests $< -o $@
	make clean-temp-files

When I run the two programs I get:

./camilla-test-std
1238988323332265734 11185553392205053542 -> 18360184157246690566
Camellia.fl 1238988323332265734 11185553392205053542 -> 18360184157246690566
./camilla-test-opt
1238988323332265734 11185553392205053542 -> 3698434091925017862
Camellia.fl 38662 15974 -> 3698434091925017862

So there are two problems here:

a) Showing Word64 values is not working correctly in the optimised version.

b) The function Camelia.fl produces the wrong result in the optimised version.

Attachments (1)

0001-Fix-for-optimizer-bug-on-linux-powerpc-6156.patch (1.4 KB) - added by erikd 5 years ago.
Patch to fix this.

Download all attachments as: .zip

Change History (46)

comment:1 Changed 6 years ago by erikd

Just tested this with GHC compiled from git HEAD and the optimised and un-optimised versions give the same results.

comment:2 Changed 6 years ago by erikd

comment:3 Changed 6 years ago by erikd

Hmm, interesting!

The patch from bug #5900 doesn't fix this. Now trying 7.4.2.

comment:4 Changed 6 years ago by erikd

Resolution: fixed
Status: newclosed

Confirmed fixed in ghc 7.4.2.

Closing this bug.

comment:5 Changed 6 years ago by nomeata

Hi,

as Debian cannot upgrade to ghc 7.4.2 at this stage (http://lists.debian.org/debian-haskell/2012/06/msg00038.html) we need to backport the fix to 7.4.1. If it is not the patch from #5900, what else is it?

comment:6 in reply to:  3 Changed 6 years ago by nomeata

Replying to erikd:

Hmm, interesting!

The patch from bug #5900 doesn't fix this. Now trying 7.4.2.

Are you sure it does not fix it? I just read through the diff between 7.4.1 and 7.4.2, and could not find any other related code.

comment:7 Changed 6 years ago by erikd

Not sure what I did wrong the first time, but the fix for #5900 does indeed fix this problem.

I'll update the debian packaging metadata for 7.4.1.

comment:8 in reply to:  7 Changed 6 years ago by nomeata

Replying to erikd:

Not sure what I did wrong the first time, but the fix for #5900 does indeed fix this problem.

I wish you were right, but cryptocipher just failed to build on powerpc, using a patched GHC 7.4.1 that includes your patch (http://anonscm.debian.org/darcs/pkg-haskell/ghc/patches/fix-PPC-right-shift-bug to be precise), here is the build log: https://buildd.debian.org/status/fetch.php?pkg=haskell-cryptocipher&arch=powerpc&ver=0.3.5-1&stamp=1339953314

Any idea?

comment:9 Changed 6 years ago by erikd

Bah!! Failing here as well now.

Let me bash on this some more.

comment:10 Changed 6 years ago by nomeata

Anything new on this front? This is currently preventing Debian wheezy from shipping with yesod.

comment:11 Changed 6 years ago by simonpj

difficulty: Unknown
Resolution: fixed
Status: closednew

At the GHC end I think we are stalled awaiting a reproducible test case. Indeed the ticket is closed Do re-open if anyone can find one.

Mind you, there is no regression test, which is bad. I'll re-open and assign to Paolo to add one. Is that OK Paolo? Need to test both with and without optimisation.

comment:12 Changed 6 years ago by simonpj

Owner: set to pcapriotti

comment:13 Changed 6 years ago by erikd

The two files I posted in the original bug report are the test case.

Should I add the to the test suite and send a patch?

comment:14 Changed 6 years ago by pcapriotti

Milestone: 7.6.1

I added the Camellia example in this ticket as a test case for #5900.

I'll keep this ticket open to track the failure in cryptocipher.

comment:15 Changed 6 years ago by simonpj

I'm confused. Is there a bug in crryptocipher? In that case it doesn't belong on GHC's bug tracker? Or in GHC? In which case how do we reproduce it?

Simon

comment:16 Changed 6 years ago by erikd

@simonpj : This is not a cryptocipher bug, its an GHC optimiser bug which was first found in the cryptocipher package. The code in the original bug report was a snippet of code from cryptocipher. Interestingly, the bug only shows itself on PowerPC.

@pcapriotti : The testsuite file you updated (tests/codeGen/should_run/T5900.hs) probably won't trigger this bug, because in my testing, it would only appear when the function fl appeared in a separate module. Maybe a cross module inlining problem?

Unfortunately I am not in a position to test this at the moment because whenever I try to compile GHC HEAD from git, I hit bug #6167.

comment:17 Changed 6 years ago by simonpj

Thanks. If it was possible to give a reproducible test case (even if power-pc only) that would be v helpful. Sounds as if you are working on #6167; do ask Simon M for help if needed

Simon

comment:18 Changed 6 years ago by pcapriotti

@erikd: Sorry, I haven't actually reproduced the problem here (I don't have access to a ppc machine at the moment). Did I understand correctly that the issue in the Camellia example in this ticket is fixed by the patch in #5900? I'm not sure why splitting it into two modules would make any difference there, though.

comment:19 in reply to:  18 Changed 6 years ago by erikd

@pcapriotti : No this issue is not fixed by the patch in #5900. The Debian Haskell Group hit this issue first with an unpatched (wrt #5900) version of ghc (7.4.1 I believe). I then tested it with 7.4.2 (which is patched wrt #5900) and ghc HEAD at the time. 7.4.2 displayed the problem and ghc HEAD did not. I was not able to figure out why 7.4.2 had a problem and HEAD did not.

comment:20 Changed 6 years ago by erikd

Now that I can compile git HEAD on PowerPC again, I can confirm that this problem is fixed in HEAD, but not fixed in 7.4.2.

I'll see if I can figure out what it is that fixed this so we can patch 7.4.2 in Debian.

comment:21 Changed 6 years ago by erikd

Oops, HEAD is also broken, it just gives a different incorrect result:

./camilla-test-std
Camellia.fl 1238988323332265734 11185553392205053542 -> 18360184157246690566
./camilla-test-opt
Camellia.fl 1238988323332265734 11185553392205053542 -> 3698434091925017862

comment:22 Changed 6 years ago by erikd

Simplified test case for this bug:

import Data.Bits
import Data.Word

w64tow32 :: Word64 -> (Word32, Word32)
w64tow32 w =
    (fromIntegral (w `shiftR` 32), fromIntegral (w .&. 0xffffffff))

main :: IO ()
main =
    if w64tow32 1238988323332265734 == (288474448,3440613126)
        then putStrLn "Pass"
        else putStrLn "Fail"

Compling without optimisation (or with -O0) passes, with -O1 or above it fails.

comment:23 Changed 6 years ago by erikd

I've been playing around with the various optimisation flags that GHC provides and found that my current test fails with an optimisation of -O1 and passes with optimisation of -O1 -fno-enable-rewrite-rules suggesting the problem is likely being triggered by PowerPC specific code generation triggered by something the rewrite rules are doing.

I'm now looking at the various dump outputs.

comment:24 Changed 6 years ago by erikd

Slightly simplified test case (may not be as complete as the original), but which should make debugging easier is:

{-# LANGUAGE MagicHash #-}
import GHC.Base
import GHC.Word
import GHC.IntWord64
import Numeric (showHex)

input :: Word64
input = 1238988323332265734

result :: Word32
result = case input of
	W64# x -> W32# (word64ToWord# x)

main :: IO ()
main = putStrLn $ "Result 0x" ++ showHex result ""

Compiling on PowerPC with -O1 gives the correct result of 0xcd139706 and with -O1 -fno-enable-rewrite-rule gives an incorrect result of 0x9706.

comment:25 Changed 6 years ago by simonpj

Maybe use -ddump-rule-firings to see what rewrites are taking place? I think you are in charge here, erikd, thank you.

Simon

comment:26 Changed 5 years ago by erikd

Have some time to work on this again. The problem revolves around the following function:

word64ToWord32 :: Word64 -> Word32
word64ToWord32 (W64# x) = W32# (word64ToWord# x)

where word64ToWord# is defined in libraries/ghc-prim/GHC/IntWord64.hs as:

foreign import ccall unsafe "hs_word64ToWord" word64ToWord# :: Word64# -> Word#

and hs_word64ToWord is defined libraries/ghc-prim/cbits/longlong.c as:

HsWord   hs_word64ToWord  (HsWord64 w) {return (HsWord)   w;}

comment:27 Changed 5 years ago by erikd

By adding debug printf statements to hs_word64ToWord I was able to determine that hs_word64ToWord was not truncating its input value to the 16 least significant bits, but that the value was already truncated before being passed in.

comment:28 Changed 5 years ago by erikd

The program also gives the different result (optimised vs unoptimised) on powerpc:

import GHC.Word

main :: IO ()
main = putStrLn $ show $ quotRem input 1

input :: Word64
input = 0xa1a2a3a4a5a6a7a8

but gives the correct answer on i386 and x86-64.

In previous tests, the optimised version of the program calls quotRem and the unoptimised version does not.

comment:29 Changed 5 years ago by erikd

Another example program which produces different results optimised vs un-optimised:

import GHC.Word

main :: IO ()
main = print (input, input + 1)

input :: Word64
input = 0xa1a2a3a4a5a6a7a8

In this case word64ToWord does not get called and neither does quotRem.

comment:30 Changed 5 years ago by simonpj

Owner: changed from pcapriotti to igloo

Ian will work with erikd to figure out what is happening here.

comment:31 Changed 5 years ago by erikd

I'll continuue gathering more data. Latest point of interest is this code:

import GHC.Word
import Numeric
main :: IO ()
main = putStrLn (showHex (succ (0xa1a2a3a4ffffffff :: Word64)) "")

Compiled with -O1 -fno-enable-rewrite-rules this produces th correct result of a1a2a3a500000000, but just a plain -O1 this results in:

Enum.succ{Word64}: tried to take `succ' of maxBound

which is rather weird. Its complaining about the least significant Word32 of the Word64 being equal to maxBound (which I'm assuming is the 32 bit maxBound). Something there seems very wrong.

I've been looking at a lot of the dump files as well. In the case of the code above, I've been looking for invocations of succ and get for instance this in dump-prep:

Main.main_w :: GHC.Word.Word64
[GblId, Caf=NoCafRefs, Unf=OtherCon []]
Main.main_w = GHC.Word.W64# (__word64 16)

Main.main3 :: GHC.Word.Word64
[GblId]
Main.main3 = GHC.Word.$w$csucc (__word64 11647051515398455295)

Main.main2 :: GHC.Base.String
[GblId]
Main.main2 =
  Numeric.$wshowIntAtBase
    @ GHC.Word.Word64
    GHC.Word.$fRealWord64
    GHC.Word.$fIntegralWord64_$cquotRem
    GHC.Word.$fEnumWord64_$ctoInteger
    GHC.Word.$fShowWord64
    Main.main_w
    GHC.Show.intToDigit
    Main.main3
    (GHC.Types.[] @ GHC.Types.Char)

Debugging continues.

comment:32 Changed 5 years ago by erikd

Similar to the case of succ in the previous comment pred also misbehaves. This:

putStrLn (showHex (pred (0xa1a2a3a500000000 :: Word64)) "")

results in:

Enum.pred{Word64}: tried to take `pred' of minBound

comment:33 Changed 5 years ago by erikd

Pulling succ out of the Enum typeclass into a standalone program we get:

import GHC.Word
import Numeric

main :: IO ()
main = putStrLn (showHex (succWord64 (0xa1a2a3a3ffffffff :: Word64)) "")

succWord64 :: Word64 -> Word64
succWord64 x =
    if x /= (0xffffffffffffffff::Word64)
        then x + 1
        else error "succWord64"

It still works correctly without optimisation and fails with it.

comment:34 Changed 5 years ago by erikd

New example:

import GHC.Word
import Numeric

main :: IO ()
main = putStrLn (showHex (incr (0xa1a2a3a4a5a6a7a8 :: Word64)) "")

incr :: Word64 -> Word64
incr x = x + 1

results in the following optimised C-- code:

    c1PM:
        I32[Sp - 8] = stg_bh_upd_frame_info;
        I32[Sp - 4] = Hp - 4;
        I64[Sp - 16] = 1 :: W64;
        I64[Sp - 24] = 11647051513882650536 :: W64;
        Sp = Sp - 24;
        jump GHC.Word.$w$c+_info; // []

which gets converted to the following assembler:

_c1PM:
	lis	31, stg_bh_upd_frame_info@ha
	addi	31, 31, stg_bh_upd_frame_info@l
	stw	31, -8(22)
	addi	31, 25, -4
	stw	31, -4(22)
	lis	31, 0
	ori	31, 31, 1
	lis	30, 0
	ori	31, 31, 0
	stw	31, -12(22)
	stw	30, -16(22)
	lis	31, 0
	ori	31, 31, 42920
	lis	30, 0
	ori	31, 31, 0
	stw	31, -20(22)
	stw	30, -24(22)
	addi	22, 22, -24
	b	GHC.Word.$w$c+_info

The assembler looks correct suggesting that the only thing that could be going wrong is the function GHC.Word.$w$c+_info.

comment:35 Changed 5 years ago by erikd

Digging deeper:

{-# LANGUAGE MagicHash #-}
import GHC.Int
import GHC.Word
import GHC.IntWord64
import Numeric (showHex)

main :: IO ()
main = putStrLn (showHex (wordToInt64 input) "")

input :: Word64
input = 0x1a2a3a4a5a6a7a8a

wordToInt64 :: Word64 -> Int64
wordToInt64 (W64# x) = I64# (word64ToInt64# x)

also gives an incorrect result when optimisation is on. The word64ToInt64# is foreign imported as:

foreign import ccall unsafe "hs_word64ToInt64" word64ToInt64# :: Word64# -> Int64#

and hs_word64ToInt64 is defined in C as :

HsInt64  hs_word64ToInt64 (HsWord64 w) {return (HsInt64)  w;}

and testing that function in isolation in a small C program shows no problem.

comment:36 Changed 5 years ago by erikd

The CMM code generated on powerpc is identical to the CMM code generated on i386.

That suggests that the problem is the powerpc specific CMM -> ASM stage.

comment:37 Changed 5 years ago by simonmar

You could make the program even simpler by passing the result to a foreign-imported C function to print it out, instead of using showHex. Then you would be using no library code at all, and you should be able to trace through the assembly to figure out where something has gone wrong (unfortunatey I don't read PPC assembler so I'm no use here).

comment:38 Changed 5 years ago by erikd

Thanks @simonmar, thats a good tip. Using the FFI:

import Foreign.C.Types
import Numeric

foreign import ccall "c_printInt64" printInt64 :: CLLong -> IO ()

main :: IO ()
main = do
    printInt64 input
    putStrLn (showHex input "")

input :: CLLong
input = 0x1a2a3a4a5a6a7a8a

and compiled without optimisation I get:

1a2a3a4a5a6a7a8a 
1a2a3a4a5a6a7a8a

and with optimisation I get:

7a8a 
1a2a3a4a5a6a7a8a

which is weird in that the FFI version is incorrect, but only when optimisation is off.

Modifying this example as:

import Foreign.C.Types
import Numeric

foreign import ccall "c_printInt64" printInt64 :: CLLong -> IO ()

main :: IO ()
main = do
    printInt64 (succ input)
    putStrLn (showHex (succ input) "")

input :: CLLong
input = 0x1a2a3a4a5a6a7a8a

and then both the FFI and the pure Haskell version are incorrect with optimisation.

This suggests that its actually a problem with the way 64 bit values are passed to functions.

comment:39 in reply to:  38 ; Changed 5 years ago by simonmar

Replying to erikd:

foreign import ccall "c_printInt64" printInt64 :: CLLong -> IO ()

Add "unsafe" here to make the generated code simpler.

main :: IO () main = do

printInt64 input putStrLn (showHex input "")

suggest getting rid of the putStrLn, again to make things simpler.

and compiled without optimisation I get:

1a2a3a4a5a6a7a8a 
1a2a3a4a5a6a7a8a

and with optimisation I get:

7a8a 
1a2a3a4a5a6a7a8a

which is weird in that the FFI version is incorrect, but only when optimisation is off.

Did you mean to say on here?

This suggests that its actually a problem with the way 64 bit values are passed to functions.

Yes, probably.

So now you have a very simple example that generates incorrect code and doesn't call any library functions, the bug must be somewhere in the generated code for this module. Compile it with -S and eyeball the assembly code.

comment:40 in reply to:  39 Changed 5 years ago by erikd

Replying to simonmar:

which is weird in that the FFI version is incorrect, but only when optimisation is off.

Did you mean to say on here?

Yes.

Simplifying the haskell code (adding unsafe as suggested):

import Foreign.C.Types
import Numeric

foreign import ccall unsafe "c_printInt64" printInt64 :: CLLong -> IO ()

main :: IO ()
main = printInt64 input

input :: CLLong
input = 0x1a2a3a4a5a6a7a8a

compiles to the following ASM:

Main.main1_info:
_c1ni:
	lis	31, 0            ; r31 = 0
	ori	31, 31, 31370    ; r31 |= 0x7a8a
	lis	30, 0            ; r30 = 0
	ori	31, 31, 0        ; r30 |= 0
	mr	3, 30            ; r3 = r30
	mr	4, 31            ; r4 = r31
	bl	c_printInt64
	lis	14, GHC.Tuple.()_closure+1@ha
	addi	14, 14, GHC.Tuple.()_closure+1@l
	lwz	31, 0(22)
	mtctr	31
	bctr

Obviously the generated code is wrong. Looking at the PPC code generator now.

Changed 5 years ago by erikd

Patch to fix this.

comment:41 Changed 5 years ago by erikd

Status: newpatch

comment:42 Changed 5 years ago by simonmar

Owner: igloo deleted
Priority: normalhigh
Status: patchnew

Nice catch! I'll push and try to get it merged into 7.6.1.

comment:43 Changed 5 years ago by erikd@…

commit b4b78631890a4cd9cde1551de9a4440e7e750372

Author: Erik de Castro Lopo <erikd@mega-nerd.com>
Date:   Thu Aug 23 20:39:47 2012 +1000

    Fix for optimizer bug on linux-powerpc (#6156).

 compiler/nativeGen/PPC/CodeGen.hs |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

comment:44 Changed 5 years ago by simonmar

Status: newmerge

comment:45 Changed 5 years ago by pcapriotti

Resolution: fixed
Status: mergeclosed
Note: See TracTickets for help on using tickets.