Opened 14 months ago

Closed 14 months ago

Last modified 2 months ago

#12585 closed bug (duplicate)

GHC duplicates string literals in rodata section and breaks 'Ptr Addr#' equality

Reported by: slyfox Owned by:
Priority: normal Milestone:
Component: Compiler Version: 8.1
Keywords: strings Cc: duncan
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime Test Case:
Blocked By: Blocking:
Related Tickets: #11292 Differential Rev(s):
Wiki Page:

Description

-- test-sha256.hs:
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Prim (Addr#)
import GHC.Ptr (Ptr(..), minusPtr)

bug :: Addr# -> IO ()
bug a = do
    print ("cmp:", Ptr a == Ptr a)
    print ("delta:", Ptr a `minusPtr` Ptr a)
    print ("values:", Ptr a, Ptr a)

main :: IO ()
main = bug "Assumptions are subtle!"#
$ inplace/bin/ghc-stage2 -fforce-recomp -O1 --make test-sha256.hs && ./test-sha256
[1 of 1] Compiling Main             ( test-sha256.hs, test-sha256.o )
Linking test-sha256 ...
("cmp:",False)
("delta:",-24)
("values:",0x000000000072fdc0,0x000000000072fda8)

Stg shows that literal gets inlined:

$ inplace/bin/ghc-stage2 -fforce-recomp -O1 --make test-sha256 -ddump-stg -dsuppress-all -dsuppress-uniques 2>&1 | grep Assumptions

                                eqAddr# ["Assumptions are subtle!"# "Assumptions are subtle!"#]
                                minusAddr# ["Assumptions are subtle!"# "Assumptions are subtle!"#]
                                                    $w$cshowsPrec "Assumptions are subtle!"# w2
                                                    $w$cshowsPrec "Assumptions are subtle!"# w2
                                eqAddr# ["Assumptions are subtle!"# "Assumptions are subtle!"#]
                                minusAddr# ["Assumptions are subtle!"# "Assumptions are subtle!"#]
                                                    $w$cshowsPrec "Assumptions are subtle!"# w2
                                                    $w$cshowsPrec "Assumptions are subtle!"# w2

I've found this bug as a SIGSEGV on testsuite cryptohash-sha256-0.11.100.1 from hackage.

Bytestring assumes that address does not change and implements loops over Ptrs https://github.com/haskell/bytestring/blob/master/Data/ByteString.hs#L1171 :

filter :: (Word8 -> Bool) -> ByteString -> ByteString
filter k ps@(PS x s l)
    | null ps   = ps
    | otherwise = unsafePerformIO $ createAndTrim l $ \p -> withForeignPtr x $ \f -> do
        t <- go (f `plusPtr` s) p (f `plusPtr` (s + l))
        return $! t `minusPtr` p -- actual length
    where
        go !f !t !end | f == end  = return t
                      | otherwise = do
                          w <- peek f
                          if k w
                            then poke t w >> go (f `plusPtr` 1) (t `plusPtr` 1) end
                            else             go (f `plusPtr` 1) t               end
{-# INLINE filter #-}

In case of cryptohash-sha256-0.11.100.1 t <- go (f plusPtr s) p (f plusPtr (s + l)) for literal inlined righ at 'f' call which caused testsuite failure.

It seems sensible not to emit the literal more than once into .rodata section.

It won't guard against problems where literal is exported as a part of .hi file but might be good enough for common cases like this.

Change History (7)

comment:1 Changed 14 months ago by slyfox

GHC-8.0.1 works as expected:

$ ghc-8.0.1 -fforce-recomp -O1 --make a.hs && ./a

[1 of 1] Compiling Main             ( a.hs, a.o )
Linking a ...
("cmp:",True)
("delta:",0)
("values:",0x00000000004a7888,0x00000000004a7888)

comment:2 Changed 14 months ago by slyfox

Type of failure: None/UnknownIncorrect result at runtime
Version: 8.0.18.1

comment:3 Changed 14 months ago by slyfox

Comment 10 at https://ghc.haskell.org/trac/ghc/ticket/11292#comment:10 suggests to disambiguate Addr# and "literals"# with different types.

comment:4 Changed 14 months ago by mpickering

comment:5 Changed 14 months ago by simonpj

What about Ptr "foo"# == Ptr "foo"#? That is, two textually-distinct string literals. Are they distinct or the same?

It is probably a mistake for string literals to have type Addr#.

  • Addresses of type Addr# certainly should be compared simply by comparing their values
  • But string should not be compared by comparing their addresses.

Still, I suppose that if every string literal in the program gave rise to a single top-level defintion in the program, that might help. See #8472

comment:6 Changed 14 months ago by simonpj

Resolution: duplicate
Status: newclosed

Actually this is a duplicate of #11312, which lacks a volunteer to take it forward.

comment:7 Changed 2 months ago by bgamari

Keywords: strings added
Note: See TracTickets for help on using tickets.