Unboxed Sums Crash
I've made it a little further in my experiments with unboxed tuples in the packed
library. However, I've run into another issue that I strongly suspect is the result of bad behavior of unboxed tuples. To replicate this issue (with GHC 8.4.3), do the following:
git clone https://github.com/andrewthad/packed
cd packed
cabal new-build
We use cabal new-build
for its side effect of creating a .ghc.environment.xyz
file. Now, create a minimal example in the directory called eol.hs
with the following contents:
import Packed.Bytes.Parser (Parser)
import Data.Word
import Packed.Bytes (Bytes)
import GHC.Exts (RealWorld)
import Packed.Bytes.Stream.IO (ByteStream)
import qualified Packed.Bytes as B
import qualified Data.Char
import qualified Packed.Bytes.Parser as P
import qualified Packed.Bytes.Stream.IO as Stream
main :: IO ()
main = do
r <- runExampleParser
( do P.takeBytesUntilEndOfLineConsume
P.takeBytesUntilEndOfLineConsume
P.takeBytesUntilEndOfLineConsume
) (foldMap Stream.singleton (map charToWord8 "the\nemporium\rhas\narrived"))
print r
runExampleParser :: Parser e () a -> ByteStream RealWorld -> IO (Maybe a, Maybe String)
runExampleParser parser stream = do
P.Result mleftovers r _ <- P.parseStreamIO stream () parser
mextra <- case mleftovers of
Nothing -> return Nothing
Just (P.Leftovers chunk remainingStream) -> do
bs <- Stream.unpack remainingStream
return (Just (map word8ToChar (B.unpack chunk ++ bs)))
return (either (const Nothing) Just r,mextra)
word8ToChar :: Word8 -> Char
word8ToChar = Data.Char.chr . fromIntegral
charToWord8 :: Char -> Word8
charToWord8 = fromIntegral . Data.Char.ord
s2b :: String -> Bytes
s2b = B.pack . map charToWord8
c2w :: Char -> Word8
c2w = charToWord8
Finally, build this with ghc -O2 eol.hs
, and then run the executable this produces to get the following:
(Nothing,Just "\rhas\narrived")
eol: internal error: stg_ap_n_ret
(GHC version 8.4.3 for x86_64_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted (core dumped)
Things worth noting:
- I think the program fails in the final GC that runs right before the program terminates. We can see that it produces a correct result of
(Nothing,Just "\rhas\narrived")
, but something on the heap has definitely been corrupted. - This only happens with
-O2
turned on. - This only happens when the parser does not successfully parse its input.
Here's some more context around this. I've been working on a parser that uses unboxed sums instead of continuations. After #15038 (closed) was fixed, everything had been going well. Then, I took the parser type and added two things to it: (1) context and (2) typed errors. Context is basically like throwing StateT
on top and errors are like throwing ExceptT
on top. After this, everything in my test suite kept working except for a single test, which now consistently crashes my test suite. So, I originally had this:
type Bytes# = (# ByteArray#, Int#, Int# #)
type Maybe# (a :: TYPE r) = (# (# #) | a #)
type Leftovers# s = (# Bytes# , ByteStream s #)
type Result# s (r :: RuntimeRep) (a :: TYPE r) =
(# Maybe# (Leftovers# s), Maybe# a #)
newtype ParserLevity (r :: RuntimeRep) (a :: TYPE r) = ParserLevity
{ getParserLevity :: forall s.
Maybe# (Leftovers# s)
-> State# s
-> (# State# s, Result# s r a #)
}
But I changed it to this:
type Bytes# = (# ByteArray#, Int#, Int# #)
type Maybe# (a :: TYPE r) = (# (# #) | a #)
type Either# a (b :: TYPE r) = (# a | b #)
type Leftovers# s = (# Bytes# , ByteStream s #)
type Result# e c s (r :: RuntimeRep) (a :: TYPE r) =
(# Maybe# (Leftovers# s), Either# (Maybe e) a, c #)
newtype ParserLevity e c (r :: RuntimeRep) (a :: TYPE r) = ParserLevity
{ getParserLevity :: forall s.
c
-> Maybe# (Leftovers# s)
-> State# s
-> (# State# s, Result# e c s r a #)
}
Specifically, the function causing trouble is (as currently defined):
{-# NOINLINE takeBytesUntilEndOfLineConsumeUnboxed #-}
takeBytesUntilEndOfLineConsumeUnboxed :: ParserLevity e c BytesRep Bytes#
takeBytesUntilEndOfLineConsumeUnboxed = ParserLevity (go (# (# #) | #)) where
go :: Maybe# Bytes# -> c -> Maybe# (Leftovers# s) -> State# s -> (# State# s, Result# e c s BytesRep Bytes# #)
go !_ c (# (# #) | #) s0 = (# s0, (# (# (# #) | #), (# Nothing | #), c #) #)
go !mbytes c (# | (# bytes0@(# arr0, off0, len0 #), !stream0@(ByteStream streamFunc) #) #) s0 = case BAW.findAnyByte2 (I# off0) (I# len0) 10 13 (ByteArray arr0) of
Nothing -> case streamFunc s0 of
(# s1, r #) -> go (# | appendMaybeBytes mbytes bytes0 #) c r s1
Just (I# ix, W8# theByte) -> case theByte of
10## -> (# s0, (# (# | (# unsafeDrop# ((ix -# off0) +# 1# ) bytes0, stream0 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #) #), c #) #)
-- second case means it was 13
_ -> case ix <# (off0 +# len0 -# 1#) of
1# -> case indexWord8Array# arr0 (ix +# 1# ) of
10## -> (# s0, (# (# | (# unsafeDrop# ((ix -# off0) +# 2# ) bytes0, stream0 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #) #), c #) #)
_ -> (# s0, (# (# | (# unsafeDrop# (ix -# off0) bytes0, stream0 #) #), (# Nothing | #), c #) #)
_ -> case nextNonEmpty stream0 s0 of
(# s1, m #) -> case m of
(# (# #) | #) -> (# s1, (# (# | (# unboxBytes (B.singleton 13), Stream.empty #) #), (# Nothing | #), c #) #)
(# | (# bytes1@(# arr1, _, _ #), stream1 #) #) -> case indexWord8Array# arr1 0# of
10## -> (# s1, (# (# | (# unsafeDrop# 1# bytes1, stream1 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #) #), c #) #)
_ -> (# s1, (# (# | (# unboxBytes (B.cons 13 (boxBytes bytes1)), stream1 #) #), (# Nothing | #), c #) #)
That's all I've got for now. If no one's able to make headway, I'll probably come back to this and try to make a more minimal example at some point. I won't have time to do this soon though.