Opened 14 months ago

Last modified 3 months ago

#9251 new task

ghc does not expose branchless max/min operations as primops

Reported by: carter Owned by: osa1
Priority: normal Milestone: 7.12.1
Component: Compiler Version: 7.8.2
Keywords: newcomer Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #9246 Differential Revisions:

Description

It was pointed out in #9246 that GHC currently does not expose primops for Max and Min on various unlifted types such as Float# and Double# and Int# and Word#, but that on most modern CPU architectures there is instruction level support for these operations. (Eg, Float# and #Double min and max can be provided on any 64bit x86 system pretty easily )

This ticket is to track doing that task. I'll probably do it one of the next two weekends to get back in the GHC hacking groove, or maybe this should be used as a "getting started" ticket for a new contributor?

Change History (14)

comment:1 Changed 14 months ago by carter

  • Summary changed from ghc does not expose brancheless max/min operations as primops to ghc does not expose branchless max/min operations as primops

comment:2 Changed 14 months ago by arotenberg

Question: What "should" the behaviors of min and max be for the special cases of negative zero and NaN? Currently GHC does this:

Prelude> min 0.0 (-0.0) :: Double
0.0
Prelude> min (-0.0) 0.0 :: Double
-0.0
Prelude> min 0.0 (0.0/0.0) :: Double
NaN
Prelude> min (0.0/0.0) 0.0 :: Double
0.0

The SSE instructions, at least, work similarly if you get the order right:

If the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. If a value in the second operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

However, according to Wikipedia, IEEE 754-2008 specifies

The min and max operations are defined but leave some leeway for the case where the inputs are equal in value but differ in representation. In particular:

min(+0,−0) or min(−0,+0) must produce something with a value of zero but may always return the first argument.

In order to support operations such as windowing in which a NaN input should be quietly replaced with one of the end points, min and max are defined to select a number, x, in preference to a quiet NaN:

min(x,NaN) = min(NaN,x) = x max(x,NaN) = max(NaN,x) = x

In the current draft, these functions are called minNum and maxNum to indicate their preference for a number over a quiet NaN.

Some further comparisons: Java's Math.min specifies

Returns the smaller of two double values. That is, the result is the value closer to negative infinity. If the arguments have the same value, the result is that same value. If either value is NaN, then the result is NaN. Unlike the numerical comparison operators, this method considers negative zero to be strictly smaller than positive zero. If one argument is positive zero and the other is negative zero, the result is negative zero.

The .NET Framework's Math.Min specifies

Parameter val1 or val2, whichever is smaller. If val1, val2, or both val1 and val2 are equal to NaN, NaN is returned.

comment:3 Changed 14 months ago by carter

i'm working on that stuff, fret not!

There's going to be a IEEE *must* conformance compliant version of every operation, plus a possibly system dependent one.

It looks like IEEE actually specifies TWO versions of min and max.

I've got a local copy of IEEE-754-2008 downloaded after I googled around, I'll spec out the operations later this week.

Wrt the signedness of 0 piece, it says **may** always return the first argument rather than must. Point being, the primops layer will likely have more than 1 variant, and the Num / Ord instances will have some choice of IEEE must condition + H2010 satisfying definition. So I think we'd want to have the haskell default min/max to treat arguments ±0 as ±1 wrt max and min. (ie max will pick +, min will pick -)

Last edited 14 months ago by carter (previous) (diff)

comment:4 Changed 14 months ago by carter

actually, importantly, in section 5.11 of the 2008 standard they say

Four mutually exclusive relations are possible: less than , equal, greater than, and unordered. The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself. Comparisons shall ignore the sign of zero (so +0 = −0). Infinite operands of the same sign shall compare equal.

thus the Ord compare based instance should actually perhaps throw an exception or otherwise fail when either argument is Nan? I'll need read a bit more.

ANYWAY, theres several *distinct* semantics that can be chosen, and adding suitable support to ghc for the different semantics that aren't interexpressible is something I'll spend some time thinking about.

don't worry, you'll have the version you want, but there will be an IEEE compliant one that also satisfies the corresponding haskell standard (BOTH, if they disagree theres a serious problem i'll have to have some discussion about)

comment:5 Changed 14 months ago by carter

Java is full of bad floating point design issues, not looking there for ANY design input. theres a fun rant here http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf

As i'm a very hefty user of ghc for floating point computation, i'm going to be very very thoughtful on the design :)

comment:6 Changed 14 months ago by arotenberg

Yeah, I've seen those slides before. The problem with the arguments presented is that I know there is a class of users who demands "linguistically-legislated exact reproducibility" for floating-point computations, because I'm part of it! For my day job, I use Java to write a tool for modeling safety-critical software systems on desktop hardware. For our purposes, it is far more important that the results of the simulation be exactly identical across different platforms than it is they be numerically accurate or efficiently computable. We use Java's strictfp keyword and StrictMath class ubiquitously in our simulation engine, because it is essential that the output of our program be identical, no matter where or when it is run. When it is necessary to compute different or more numerically-accurate floating-point outputs, we emulate it in software. This is slow but allows us 100% confidence that you will always get the same results.

That said, our application is a rare case. In most situations, faster and more accurate computations that can give slightly different results on different machines are preferable to calculations that are wrong in exactly-reproducible ways.

Which section (if any) of the Haskell 2010 Report specifies the behavior of floating-point operations? I've glanced through it quickly but all I've been able to find is the Prelude definitions of the instances for Float and Double, which just list them as primitives.

comment:7 Changed 14 months ago by carter

To clarify what I said: both the perf and reproducible primops will be added and the one I'll pick for base will the reproducible/ portable one.

We agree. Bring the worry to the code review once I've got a proposed patch. :-). I write tooling for both workloads, I'm well well well aware of reproducibility being valuable.

comment:8 Changed 10 months ago by thomie

  • Keywords newcomer added
  • Type of failure changed from None/Unknown to Runtime performance bug

Adding 'newcomer' keyword by carter's suggestion in the description.

comment:9 Changed 10 months ago by thomie

  • Owner carter deleted

Sorry carter, I'm trying to get this ticket to show up in Newcomers. Reclaim it if you want of course.

comment:10 Changed 10 months ago by carter

sure thing. I think the only newcomer friendly bit would be writing the branchless min/max for Word and Int and friends. its a bit more subtle for the floating point versions

comment:11 Changed 8 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:12 Changed 3 months ago by osa1

I'm trying to understand this ticket; do we need to compile these functions:

minDbl :: Double -> Double -> Double
minDbl !d1 !d2 = if d1 < d2 then d1 else d2

minDbl' :: Double -> Double -> Double
minDbl' !d1 !d2 = if d2 < d1 then d2 else d1

to branchless code(using movsd with right argument ordering, and similarly movss for Float), or are we only interested in branchless Ord.min, Ord.max etc.? E.g. something like this:

mindbl'' :: Double -> Double -> double
mindbl'' (D# d1) (D# d2) = ... use new primop here ...

comment:13 Changed 3 months ago by carter

the idea is not to use it for the normal comparison code, but to make it available for those who need suitable supporting primops. *THOUGH* if we can provide suitable instruction level versions of the Ord based min and max, thats a thing we can think about too.

comment:14 Changed 3 months ago by osa1

  • Owner set to osa1

Just to give a status update: I made some progress on this http://lpaste.net/134455. I'm also assigning this to myself.

Note: See TracTickets for help on using tickets.