`round :: Double -> Int64` much slower than `fromIntegral @Int @Int64 . round`
Consider this simple code:
{-# LANGUAGE TypeApplications #-}
{-# OPTIONS_GHC -ddump-simpl -ddump-to-file -ddump-rule-firings #-}
module C (f, g) where
import Data.Int (Int64)
f :: Double -> Int64
f = round
g :: Double -> Int64
g = fromIntegral @Int @Int64 . round
There is a rule in GHC.Int
that should translate f
into g
"round/Double->Int64"
round = (fromIntegral :: Int -> Int64) . (round :: Double -> Int)
If I compile the above module, I see
Rule fired: round/Double->Int64 (GHC.Int)
Rule fired: fromIntegral/a->Int64 (GHC.Int)
Rule fired: fromIntegral/Int->Int (GHC.Real)
Rule fired: fromIntegral/a->Int64 (GHC.Int)
Rule fired: fromIntegral/Int->Int (GHC.Real)
Rule fired: round/Double->Int (GHC.Float)
however in Core the functions end up different: f
has an extra dictionary passed around:
-- RHS size: {terms: 11, types: 6, coercions: 0, joins: 0/0}
f :: Double -> Int64
[GblId,
Arity=1,
Str=<S(S),1*U(U)>m,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
Tmpl= \ (x_a1sG [Occ=Once!] :: Double) ->
case x_a1sG of { GHC.Types.D# ww1_a28u [Occ=Once] ->
case GHC.Float.$w$cround @ Int GHC.Real.$fIntegralInt ww1_a28u of
{ GHC.Types.I# x#_a1Rr [Occ=Once] ->
GHC.Int.I64# x#_a1Rr
}
}}]
f = \ (x_a1sG :: Double) ->
case x_a1sG of { GHC.Types.D# ww1_a28u ->
case GHC.Float.$w$cround @ Int GHC.Real.$fIntegralInt ww1_a28u of
{ GHC.Types.I# x#_a1Rr ->
GHC.Int.I64# x#_a1Rr
}
}
-- RHS size: {terms: 12, types: 14, coercions: 0, joins: 0/0}
g :: Double -> Int64
[GblId,
Arity=1,
Caf=NoCafRefs,
Str=<S(S),1*U(U)>m,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
Tmpl= \ (x_X1sX [Occ=Once!] :: Double) ->
case x_X1sX of { GHC.Types.D# ds1_a28C [Occ=Once] ->
case {__pkg_ccall base-4.12.0.0 Double#
-> State# RealWorld -> (# State# RealWorld, Double# #)}_a28B
ds1_a28C GHC.Prim.realWorld#
of
{ (# _ [Occ=Dead], ds3_a28H [Occ=Once] #) ->
GHC.Int.I64# (GHC.Prim.double2Int# ds3_a28H)
}
}}]
g = \ (x_X1sX :: Double) ->
case x_X1sX of { GHC.Types.D# ds1_a28C ->
case {__pkg_ccall base-4.12.0.0 Double#
-> State# RealWorld -> (# State# RealWorld, Double# #)}_a28B
ds1_a28C GHC.Prim.realWorld#
of
{ (# ds2_a28G, ds3_a28H #) ->
GHC.Int.I64# (GHC.Prim.double2Int# ds3_a28H)
}
}
This makes it over an order of magnitude slower than what it should be!
benchmarked f
time 111.1 ns (109.1 ns .. 113.9 ns)
0.997 R² (0.996 R² .. 0.998 R²)
mean 111.1 ns (110.4 ns .. 112.0 ns)
std dev 2.724 ns (2.195 ns .. 3.271 ns)
benchmarked g
time 7.160 ns (7.104 ns .. 7.224 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 7.196 ns (7.171 ns .. 7.231 ns)
std dev 100.5 ps (71.97 ps .. 153.8 ps)
I don't know why the dictionary is present in f
but not in g
.