GHC Defeats Manual Worker Wrapper with Unboxed Sum
Here's the code in question:
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE TypeInType #-}
{-# LANGUAGE UnboxedSums #-}
{-# LANGUAGE UnboxedTuples #-}
{-# OPTIONS_GHC -O2 #-}
module Byte.Array.Window
( findByte
, boxMaybeInt
) where
import Data.Primitive (ByteArray)
import Data.Word (Word8)
import GHC.Types (RuntimeRep,TYPE)
import GHC.Int (Int(I#))
import GHC.Exts (Int#)
import qualified Data.Primitive as PM
type Maybe# (a :: TYPE (r :: RuntimeRep)) = (# (# #) | a #)
boxMaybeInt :: Maybe# Int# -> Maybe Int
boxMaybeInt = \case
(# | a #) -> Just (I# a)
(# (# #) | #) -> Nothing
unboxInt :: Int -> Int#
unboxInt (I# i) = i
-- | Finds the first occurrence of the given byte.
-- TODO: optimize this to search through a whole
-- Word64 at a time if the bytearray is pinned.
findByte :: Int -> Int -> Word8 -> ByteArray -> Maybe Int
findByte !off !len !w !arr = boxMaybeInt (go off) where
go :: Int -> Maybe# Int#
go !ix = if ix < len
then if PM.indexByteArray arr ix == w
then (# | unboxInt ix #)
else go (ix + 1)
else (# (# #) | #)
When compiled with GHC 8.5 with -ddump-simpl -dsuppress-all
, here is the relevant part of the resulting Core:
-- RHS size: {terms: 33, types: 13, coercions: 0, joins: 1/1}
$wfindByte
$wfindByte
= \ ww_s38C ww1_s38G ww2_s38K ww3_s38O ->
joinrec {
$wgo_s38v
$wgo_s38v ww4_s38t
= case <# ww4_s38t ww1_s38G of {
__DEFAULT -> Nothing;
1# ->
case indexWord8Array# ww3_s38O ww4_s38t of wild_a36w { __DEFAULT ->
case eqWord# wild_a36w ww2_s38K of {
__DEFAULT -> jump $wgo_s38v (+# ww4_s38t 1#);
1# -> Just (I# ww4_s38t)
}
}
}; } in
jump $wgo_s38v ww_s38C
-- RHS size: {terms: 21, types: 12, coercions: 0, joins: 0/0}
findByte
findByte
= \ w_s38w w1_s38x w2_s38y w3_s38z ->
case w_s38w of { I# ww1_s38C ->
case w1_s38x of { I# ww3_s38G ->
case w2_s38y of { W8# ww5_s38K ->
case w3_s38z of { ByteArray ww7_s38O ->
$wfindByte ww1_s38C ww3_s38G ww5_s38K ww7_s38O
}
}
}
}
I expected that the tail recursive go
helpful function from my original code would still be a function that returns an unboxed sum when optimized and turned into Core. However, it isn't. The call to boxMaybeInt
gets pushed into go
. This means that when findByte
is called and the result cased on, an allocation is going to happen. I think it would be preferable for boxMaybeInt
to not get pushed into the worker, since boxMaybeInt (go off)
could be inlined and the allocation of Maybe
could be prevented (assuming that it was cased on right afterward).
Trac metadata
Trac field | Value |
---|---|
Version | 8.5 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |