GHC generates poor code for repeated uses of min/max
Consider the following module, which intends to implement a branchless ray-AABB intersection test:
module SimpleGeom where
data Vec3 = Vec3
{-# UNPACK #-} !Double
{-# UNPACK #-} !Double
{-# UNPACK #-} !Double
data Ray = Ray !Vec3 !Vec3 !Vec3
data AABB = AABB !Vec3 !Vec3
testRayAABBIntersection :: Ray -> AABB -> Bool
testRayAABBIntersection (Ray (Vec3 ox oy oz) _ (Vec3 invDx invDy invDz))
(AABB (Vec3 minX minY minZ) (Vec3 maxX maxY maxZ)) =
let tx1 = (minX - ox) * invDx
tx2 = (maxX - ox) * invDx
ty1 = (minY - oy) * invDy
ty2 = (maxY - oy) * invDy
tz1 = (minZ - oz) * invDz
tz2 = (maxZ - oz) * invDz
tmin = min tx1 tx2 `max` min ty1 ty2 `max` min tz1 tz2
tmax = max tx1 tx2 `min` max ty1 ty2 `min` max tz1 tz2
in tmax >= max 0 tmin
Everything is strict primitive operations, so GHC should generate very simple, fast code, right? But upon compiling with ghc -O -ddump-simpl -ddump-to-file SimpleGeom
, I found a mess of nested local lambdas and similar performance-killing expression forms. (See the attached output file.)
There are two possible issues I can see here.
- GHC is trying to expand out all of the branches recursively (I would presume via case-of-cases transformation), which is a bad idea in this instance compared to just performing the cases sequentially and storing their results.
- GHC is generating branches for floating-point min/max. Instruction sets like SSE2 include non-branching floating-point min/max instructions, which is exactly what this algorithm was designed to exploit, but GHC does not generate code that could take advantage of these instructions.
Trac metadata
Trac field | Value |
---|---|
Version | 7.8.2 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | Windows |
Architecture |