Opened 6 years ago
Closed 5 years ago
#6156 closed bug (fixed)
Optimiser bug on linux-powerpc
| Reported by: | erikd | Owned by: | |
|---|---|---|---|
| Priority: | high | Milestone: | 7.6.1 |
| Component: | Compiler | Version: | 7.4.1 |
| Keywords: | Cc: | ||
| Operating System: | Linux | Architecture: | powerpc |
| Type of failure: | Incorrect result at runtime | Test Case: | |
| Blocked By: | Blocking: | ||
| Related Tickets: | Differential Rev(s): | ||
| Wiki Page: |
Description
Found a small chunk of code in the cryptocipher package that when compiled and run, produces a difference result when optimised compared to compiling un-optimised.
Note this is only a problem with PowerPC. On x86-64 there is no difference in the output between the optimised version and the un-optimised version.
I have two simple files (Camellia.hs):
module Camellia where import Data.Bits import Data.Word import Debug.Trace fl :: Word64 -> Word64 -> Word64 fl fin sk = let (x1, x2) = w64tow32 fin in let (k1, k2) = w64tow32 sk in let y2 = x2 `xor` ((x1 .&. k1) `rotateL` 1) in let y1 = x1 `xor` (y2 .|. k2) in trace (show fin ++ " " ++ show sk ++ " -> " ++ show (w32tow64 (y1, y2))) $ w32tow64 (y1, y2) w64tow32 :: Word64 -> (Word32, Word32) w64tow32 w = (fromIntegral (w `shiftR` 32), fromIntegral (w .&. 0xffffffff)) w32tow64 :: (Word32, Word32) -> Word64 w32tow64 (x1, x2) = ((fromIntegral x1) `shiftL` 32) .|. (fromIntegral x2)
and a main program (camellia-test.hs):
import Data.Word
import qualified Camellia as Camellia
a, b :: Word64
a = 1238988323332265734
b = 11185553392205053542
main :: IO ()
main =
putStrLn $ "Camellia.fl " ++ show a ++ " " ++ show b ++ " -> " ++ show (Camellia.fl a b)
I'm also using this Makefile:
TARGETS = camilla-test-std camilla-test-opt check : $(TARGETS) ./camilla-test-std ./camilla-test-opt clean : make clean-temp-files rm -f $(TARGETS) clean-temp-files : rm -f camilla-test.o camilla-test.hi Camellia.o Camellia.hi camilla-test-opt : camilla-test.hs Camellia.hs ghc -Wall -O2 --make -i:Tests $< -o $@ make clean-temp-files camilla-test-std : camilla-test.hs Camellia.hs ghc -Wall --make -i:Tests $< -o $@ make clean-temp-files
When I run the two programs I get:
./camilla-test-std 1238988323332265734 11185553392205053542 -> 18360184157246690566 Camellia.fl 1238988323332265734 11185553392205053542 -> 18360184157246690566 ./camilla-test-opt 1238988323332265734 11185553392205053542 -> 3698434091925017862 Camellia.fl 38662 15974 -> 3698434091925017862
So there are two problems here:
a) Showing Word64 values is not working correctly in the optimised version.
b) The function Camelia.fl produces the wrong result in the optimised version.
Attachments (1)
Change History (46)
comment:1 Changed 6 years ago by
comment:2 Changed 6 years ago by
This was likely fixed in bug #5900 which resulted in this patch:
http://hackage.haskell.org/trac/ghc/changeset/bee6f865d8c747aa821f9d4996ad3300429fd55c
comment:3 follow-up: 6 Changed 6 years ago by
Hmm, interesting!
The patch from bug #5900 doesn't fix this. Now trying 7.4.2.
comment:4 Changed 6 years ago by
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
Confirmed fixed in ghc 7.4.2.
Closing this bug.
comment:5 Changed 6 years ago by
Hi,
as Debian cannot upgrade to ghc 7.4.2 at this stage (http://lists.debian.org/debian-haskell/2012/06/msg00038.html) we need to backport the fix to 7.4.1. If it is not the patch from #5900, what else is it?
comment:6 Changed 6 years ago by
comment:7 follow-up: 8 Changed 6 years ago by
Not sure what I did wrong the first time, but the fix for #5900 does indeed fix this problem.
I'll update the debian packaging metadata for 7.4.1.
comment:8 Changed 6 years ago by
Replying to erikd:
Not sure what I did wrong the first time, but the fix for #5900 does indeed fix this problem.
I wish you were right, but cryptocipher just failed to build on powerpc, using a patched GHC 7.4.1 that includes your patch (http://anonscm.debian.org/darcs/pkg-haskell/ghc/patches/fix-PPC-right-shift-bug to be precise), here is the build log: https://buildd.debian.org/status/fetch.php?pkg=haskell-cryptocipher&arch=powerpc&ver=0.3.5-1&stamp=1339953314
Any idea?
comment:10 Changed 6 years ago by
Anything new on this front? This is currently preventing Debian wheezy from shipping with yesod.
comment:11 Changed 6 years ago by
| difficulty: | → Unknown |
|---|---|
| Resolution: | fixed |
| Status: | closed → new |
At the GHC end I think we are stalled awaiting a reproducible test case. Indeed the ticket is closed Do re-open if anyone can find one.
Mind you, there is no regression test, which is bad. I'll re-open and assign to Paolo to add one. Is that OK Paolo? Need to test both with and without optimisation.
comment:12 Changed 6 years ago by
| Owner: | set to pcapriotti |
|---|
comment:13 Changed 6 years ago by
The two files I posted in the original bug report are the test case.
Should I add the to the test suite and send a patch?
comment:14 Changed 6 years ago by
| Milestone: | → 7.6.1 |
|---|
I added the Camellia example in this ticket as a test case for #5900.
I'll keep this ticket open to track the failure in cryptocipher.
comment:15 Changed 6 years ago by
I'm confused. Is there a bug in crryptocipher? In that case it doesn't belong on GHC's bug tracker? Or in GHC? In which case how do we reproduce it?
Simon
comment:16 Changed 6 years ago by
@simonpj : This is not a cryptocipher bug, its an GHC optimiser bug which was first found in the cryptocipher package. The code in the original bug report was a snippet of code from cryptocipher. Interestingly, the bug only shows itself on PowerPC.
@pcapriotti : The testsuite file you updated (tests/codeGen/should_run/T5900.hs) probably won't trigger this bug, because in my testing, it would only appear when the function fl appeared in a separate module. Maybe a cross module inlining problem?
Unfortunately I am not in a position to test this at the moment because whenever I try to compile GHC HEAD from git, I hit bug #6167.
comment:17 Changed 6 years ago by
Thanks. If it was possible to give a reproducible test case (even if power-pc only) that would be v helpful. Sounds as if you are working on #6167; do ask Simon M for help if needed
Simon
comment:18 follow-up: 19 Changed 6 years ago by
@erikd: Sorry, I haven't actually reproduced the problem here (I don't have access to a ppc machine at the moment). Did I understand correctly that the issue in the Camellia example in this ticket is fixed by the patch in #5900? I'm not sure why splitting it into two modules would make any difference there, though.
comment:19 Changed 6 years ago by
@pcapriotti : No this issue is not fixed by the patch in #5900. The Debian Haskell Group hit this issue first with an unpatched (wrt #5900) version of ghc (7.4.1 I believe). I then tested it with 7.4.2 (which is patched wrt #5900) and ghc HEAD at the time. 7.4.2 displayed the problem and ghc HEAD did not. I was not able to figure out why 7.4.2 had a problem and HEAD did not.
comment:20 Changed 6 years ago by
Now that I can compile git HEAD on PowerPC again, I can confirm that this problem is fixed in HEAD, but not fixed in 7.4.2.
I'll see if I can figure out what it is that fixed this so we can patch 7.4.2 in Debian.
comment:21 Changed 6 years ago by
Oops, HEAD is also broken, it just gives a different incorrect result:
./camilla-test-std Camellia.fl 1238988323332265734 11185553392205053542 -> 18360184157246690566 ./camilla-test-opt Camellia.fl 1238988323332265734 11185553392205053542 -> 3698434091925017862
comment:22 Changed 6 years ago by
Simplified test case for this bug:
import Data.Bits
import Data.Word
w64tow32 :: Word64 -> (Word32, Word32)
w64tow32 w =
(fromIntegral (w `shiftR` 32), fromIntegral (w .&. 0xffffffff))
main :: IO ()
main =
if w64tow32 1238988323332265734 == (288474448,3440613126)
then putStrLn "Pass"
else putStrLn "Fail"
Compling without optimisation (or with -O0) passes, with -O1 or above it fails.
comment:23 Changed 6 years ago by
I've been playing around with the various optimisation flags that GHC provides and found that my current test fails with an optimisation of -O1 and passes with optimisation of -O1 -fno-enable-rewrite-rules suggesting the problem is likely being triggered by PowerPC specific code generation triggered by something the rewrite rules are doing.
I'm now looking at the various dump outputs.
comment:24 Changed 6 years ago by
Slightly simplified test case (may not be as complete as the original), but which should make debugging easier is:
{-# LANGUAGE MagicHash #-}
import GHC.Base
import GHC.Word
import GHC.IntWord64
import Numeric (showHex)
input :: Word64
input = 1238988323332265734
result :: Word32
result = case input of
W64# x -> W32# (word64ToWord# x)
main :: IO ()
main = putStrLn $ "Result 0x" ++ showHex result ""
Compiling on PowerPC with -O1 gives the correct result of 0xcd139706 and with -O1 -fno-enable-rewrite-rule gives an incorrect result of 0x9706.
comment:25 Changed 6 years ago by
Maybe use -ddump-rule-firings to see what rewrites are taking place? I think you are in charge here, erikd, thank you.
Simon
comment:26 Changed 5 years ago by
Have some time to work on this again. The problem revolves around the following function:
word64ToWord32 :: Word64 -> Word32 word64ToWord32 (W64# x) = W32# (word64ToWord# x)
where word64ToWord# is defined in libraries/ghc-prim/GHC/IntWord64.hs as:
foreign import ccall unsafe "hs_word64ToWord" word64ToWord# :: Word64# -> Word#
and hs_word64ToWord is defined libraries/ghc-prim/cbits/longlong.c as:
HsWord hs_word64ToWord (HsWord64 w) {return (HsWord) w;}
comment:27 Changed 5 years ago by
By adding debug printf statements to hs_word64ToWord I was able to determine that hs_word64ToWord was not truncating its input value to the 16 least significant bits, but that the value was already truncated before being passed in.
comment:28 Changed 5 years ago by
The program also gives the different result (optimised vs unoptimised) on powerpc:
import GHC.Word main :: IO () main = putStrLn $ show $ quotRem input 1 input :: Word64 input = 0xa1a2a3a4a5a6a7a8
but gives the correct answer on i386 and x86-64.
In previous tests, the optimised version of the program calls quotRem and the unoptimised version does not.
comment:29 Changed 5 years ago by
Another example program which produces different results optimised vs un-optimised:
import GHC.Word main :: IO () main = print (input, input + 1) input :: Word64 input = 0xa1a2a3a4a5a6a7a8
In this case word64ToWord does not get called and neither does quotRem.
comment:30 Changed 5 years ago by
| Owner: | changed from pcapriotti to igloo |
|---|
Ian will work with erikd to figure out what is happening here.
comment:31 Changed 5 years ago by
I'll continuue gathering more data. Latest point of interest is this code:
import GHC.Word import Numeric main :: IO () main = putStrLn (showHex (succ (0xa1a2a3a4ffffffff :: Word64)) "")
Compiled with -O1 -fno-enable-rewrite-rules this produces th correct result of a1a2a3a500000000, but just a plain -O1 this results in:
Enum.succ{Word64}: tried to take `succ' of maxBound
which is rather weird. Its complaining about the least significant Word32 of the Word64 being equal to maxBound (which I'm assuming is the 32 bit maxBound). Something there seems very wrong.
I've been looking at a lot of the dump files as well. In the case of the code above, I've been looking for invocations of succ and get for instance this in dump-prep:
Main.main_w :: GHC.Word.Word64
[GblId, Caf=NoCafRefs, Unf=OtherCon []]
Main.main_w = GHC.Word.W64# (__word64 16)
Main.main3 :: GHC.Word.Word64
[GblId]
Main.main3 = GHC.Word.$w$csucc (__word64 11647051515398455295)
Main.main2 :: GHC.Base.String
[GblId]
Main.main2 =
Numeric.$wshowIntAtBase
@ GHC.Word.Word64
GHC.Word.$fRealWord64
GHC.Word.$fIntegralWord64_$cquotRem
GHC.Word.$fEnumWord64_$ctoInteger
GHC.Word.$fShowWord64
Main.main_w
GHC.Show.intToDigit
Main.main3
(GHC.Types.[] @ GHC.Types.Char)
Debugging continues.
comment:32 Changed 5 years ago by
Similar to the case of succ in the previous comment pred also misbehaves. This:
putStrLn (showHex (pred (0xa1a2a3a500000000 :: Word64)) "")
results in:
Enum.pred{Word64}: tried to take `pred' of minBound
comment:33 Changed 5 years ago by
Pulling succ out of the Enum typeclass into a standalone program we get:
import GHC.Word
import Numeric
main :: IO ()
main = putStrLn (showHex (succWord64 (0xa1a2a3a3ffffffff :: Word64)) "")
succWord64 :: Word64 -> Word64
succWord64 x =
if x /= (0xffffffffffffffff::Word64)
then x + 1
else error "succWord64"
It still works correctly without optimisation and fails with it.
comment:34 Changed 5 years ago by
New example:
import GHC.Word import Numeric main :: IO () main = putStrLn (showHex (incr (0xa1a2a3a4a5a6a7a8 :: Word64)) "") incr :: Word64 -> Word64 incr x = x + 1
results in the following optimised C-- code:
c1PM:
I32[Sp - 8] = stg_bh_upd_frame_info;
I32[Sp - 4] = Hp - 4;
I64[Sp - 16] = 1 :: W64;
I64[Sp - 24] = 11647051513882650536 :: W64;
Sp = Sp - 24;
jump GHC.Word.$w$c+_info; // []
which gets converted to the following assembler:
_c1PM: lis 31, stg_bh_upd_frame_info@ha addi 31, 31, stg_bh_upd_frame_info@l stw 31, -8(22) addi 31, 25, -4 stw 31, -4(22) lis 31, 0 ori 31, 31, 1 lis 30, 0 ori 31, 31, 0 stw 31, -12(22) stw 30, -16(22) lis 31, 0 ori 31, 31, 42920 lis 30, 0 ori 31, 31, 0 stw 31, -20(22) stw 30, -24(22) addi 22, 22, -24 b GHC.Word.$w$c+_info
The assembler looks correct suggesting that the only thing that could be going wrong is the function GHC.Word.$w$c+_info.
comment:35 Changed 5 years ago by
Digging deeper:
{-# LANGUAGE MagicHash #-}
import GHC.Int
import GHC.Word
import GHC.IntWord64
import Numeric (showHex)
main :: IO ()
main = putStrLn (showHex (wordToInt64 input) "")
input :: Word64
input = 0x1a2a3a4a5a6a7a8a
wordToInt64 :: Word64 -> Int64
wordToInt64 (W64# x) = I64# (word64ToInt64# x)
also gives an incorrect result when optimisation is on. The word64ToInt64# is foreign imported as:
foreign import ccall unsafe "hs_word64ToInt64" word64ToInt64# :: Word64# -> Int64#
and hs_word64ToInt64 is defined in C as :
HsInt64 hs_word64ToInt64 (HsWord64 w) {return (HsInt64) w;}
and testing that function in isolation in a small C program shows no problem.
comment:36 Changed 5 years ago by
The CMM code generated on powerpc is identical to the CMM code generated on i386.
That suggests that the problem is the powerpc specific CMM -> ASM stage.
comment:37 Changed 5 years ago by
You could make the program even simpler by passing the result to a foreign-imported C function to print it out, instead of using showHex. Then you would be using no library code at all, and you should be able to trace through the assembly to figure out where something has gone wrong (unfortunatey I don't read PPC assembler so I'm no use here).
comment:38 follow-up: 39 Changed 5 years ago by
Thanks @simonmar, thats a good tip. Using the FFI:
import Foreign.C.Types
import Numeric
foreign import ccall "c_printInt64" printInt64 :: CLLong -> IO ()
main :: IO ()
main = do
printInt64 input
putStrLn (showHex input "")
input :: CLLong
input = 0x1a2a3a4a5a6a7a8a
and compiled without optimisation I get:
1a2a3a4a5a6a7a8a 1a2a3a4a5a6a7a8a
and with optimisation I get:
7a8a 1a2a3a4a5a6a7a8a
which is weird in that the FFI version is incorrect, but only when optimisation is off.
Modifying this example as:
import Foreign.C.Types
import Numeric
foreign import ccall "c_printInt64" printInt64 :: CLLong -> IO ()
main :: IO ()
main = do
printInt64 (succ input)
putStrLn (showHex (succ input) "")
input :: CLLong
input = 0x1a2a3a4a5a6a7a8a
and then both the FFI and the pure Haskell version are incorrect with optimisation.
This suggests that its actually a problem with the way 64 bit values are passed to functions.
comment:39 follow-up: 40 Changed 5 years ago by
Replying to erikd:
foreign import ccall "c_printInt64" printInt64 :: CLLong -> IO ()
Add "unsafe" here to make the generated code simpler.
main :: IO () main = do
printInt64 input putStrLn (showHex input "")
suggest getting rid of the putStrLn, again to make things simpler.
and compiled without optimisation I get:
1a2a3a4a5a6a7a8a 1a2a3a4a5a6a7a8aand with optimisation I get:
7a8a 1a2a3a4a5a6a7a8awhich is weird in that the FFI version is incorrect, but only when optimisation is off.
Did you mean to say on here?
This suggests that its actually a problem with the way 64 bit values are passed to functions.
Yes, probably.
So now you have a very simple example that generates incorrect code and doesn't call any library functions, the bug must be somewhere in the generated code for this module. Compile it with -S and eyeball the assembly code.
comment:40 Changed 5 years ago by
Replying to simonmar:
which is weird in that the FFI version is incorrect, but only when optimisation is off.
Did you mean to say on here?
Yes.
Simplifying the haskell code (adding unsafe as suggested):
import Foreign.C.Types import Numeric foreign import ccall unsafe "c_printInt64" printInt64 :: CLLong -> IO () main :: IO () main = printInt64 input input :: CLLong input = 0x1a2a3a4a5a6a7a8a
compiles to the following ASM:
Main.main1_info: _c1ni: lis 31, 0 ; r31 = 0 ori 31, 31, 31370 ; r31 |= 0x7a8a lis 30, 0 ; r30 = 0 ori 31, 31, 0 ; r30 |= 0 mr 3, 30 ; r3 = r30 mr 4, 31 ; r4 = r31 bl c_printInt64 lis 14, GHC.Tuple.()_closure+1@ha addi 14, 14, GHC.Tuple.()_closure+1@l lwz 31, 0(22) mtctr 31 bctr
Obviously the generated code is wrong. Looking at the PPC code generator now.
Changed 5 years ago by
| Attachment: | 0001-Fix-for-optimizer-bug-on-linux-powerpc-6156.patch added |
|---|
Patch to fix this.
comment:41 Changed 5 years ago by
| Status: | new → patch |
|---|
comment:42 Changed 5 years ago by
| Owner: | igloo deleted |
|---|---|
| Priority: | normal → high |
| Status: | patch → new |
Nice catch! I'll push and try to get it merged into 7.6.1.
comment:43 Changed 5 years ago by
commit b4b78631890a4cd9cde1551de9a4440e7e750372
Author: Erik de Castro Lopo <erikd@mega-nerd.com>
Date: Thu Aug 23 20:39:47 2012 +1000
Fix for optimizer bug on linux-powerpc (#6156).
compiler/nativeGen/PPC/CodeGen.hs | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
comment:44 Changed 5 years ago by
| Status: | new → merge |
|---|
comment:45 Changed 5 years ago by
| Resolution: | → fixed |
|---|---|
| Status: | merge → closed |
Merged as ef4218994742e8400a48b4d6e1ae7e6b67650dc4.

Just tested this with GHC compiled from git HEAD and the optimised and un-optimised versions give the same results.