Opened 17 months ago
Last modified 2 months ago
#9251 new task
ghc does not expose branchless max/min operations as primops
Reported by: | carter | Owned by: | osa1 |
---|---|---|---|
Priority: | normal | Milestone: | 8.0.1 |
Component: | Compiler | Version: | 7.8.2 |
Keywords: | newcomer | Cc: | wren@… |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Runtime performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | #9246 | Differential Rev(s): | |
Wiki Page: |
Description
It was pointed out in #9246 that GHC currently does not expose primops for Max and Min on various unlifted types such as Float# and Double# and Int# and Word#, but that on most modern CPU architectures there is instruction level support for these operations. (Eg, Float# and #Double min and max can be provided on any 64bit x86 system pretty easily )
This ticket is to track doing that task. I'll probably do it one of the next two weekends to get back in the GHC hacking groove, or maybe this should be used as a "getting started" ticket for a new contributor?
Change History (16)
comment:1 Changed 17 months ago by carter
- Summary changed from ghc does not expose brancheless max/min operations as primops to ghc does not expose branchless max/min operations as primops
comment:2 Changed 17 months ago by arotenberg
comment:3 Changed 17 months ago by carter
i'm working on that stuff, fret not!
There's going to be a IEEE *must* conformance compliant version of every operation, plus a possibly system dependent one.
It looks like IEEE actually specifies TWO versions of min and max.
I've got a local copy of IEEE-754-2008 downloaded after I googled around, I'll spec out the operations later this week.
Wrt the signedness of 0 piece, it says **may** always return the first argument rather than must. Point being, the primops layer will likely have more than 1 variant, and the Num / Ord instances will have some choice of IEEE must condition + H2010 satisfying definition. So I think we'd want to have the haskell default min/max to treat arguments ±0 as ±1 wrt max and min. (ie max will pick +, min will pick -)
comment:4 Changed 17 months ago by carter
actually, importantly, in section 5.11 of the 2008 standard they say
Four mutually exclusive relations are possible: less than , equal, greater than, and unordered. The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself. Comparisons shall ignore the sign of zero (so +0 = −0). Infinite operands of the same sign shall compare equal.
thus the Ord compare based instance should actually perhaps throw an exception or otherwise fail when either argument is Nan? I'll need read a bit more.
ANYWAY, theres several *distinct* semantics that can be chosen, and adding suitable support to ghc for the different semantics that aren't interexpressible is something I'll spend some time thinking about.
don't worry, you'll have the version you want, but there will be an IEEE compliant one that also satisfies the corresponding haskell standard (BOTH, if they disagree theres a serious problem i'll have to have some discussion about)
comment:5 Changed 17 months ago by carter
Java is full of bad floating point design issues, not looking there for ANY design input. theres a fun rant here http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
As i'm a very hefty user of ghc for floating point computation, i'm going to be very very thoughtful on the design :)
comment:6 Changed 17 months ago by arotenberg
Yeah, I've seen those slides before. The problem with the arguments presented is that I know there is a class of users who demands "linguistically-legislated exact reproducibility" for floating-point computations, because I'm part of it! For my day job, I use Java to write a tool for modeling safety-critical software systems on desktop hardware. For our purposes, it is far more important that the results of the simulation be exactly identical across different platforms than it is they be numerically accurate or efficiently computable. We use Java's strictfp keyword and StrictMath class ubiquitously in our simulation engine, because it is essential that the output of our program be identical, no matter where or when it is run. When it is necessary to compute different or more numerically-accurate floating-point outputs, we emulate it in software. This is slow but allows us 100% confidence that you will always get the same results.
That said, our application is a rare case. In most situations, faster and more accurate computations that can give slightly different results on different machines are preferable to calculations that are wrong in exactly-reproducible ways.
Which section (if any) of the Haskell 2010 Report specifies the behavior of floating-point operations? I've glanced through it quickly but all I've been able to find is the Prelude definitions of the instances for Float and Double, which just list them as primitives.
comment:7 Changed 17 months ago by carter
To clarify what I said: both the perf and reproducible primops will be added and the one I'll pick for base will the reproducible/ portable one.
We agree. Bring the worry to the code review once I've got a proposed patch. :-). I write tooling for both workloads, I'm well well well aware of reproducibility being valuable.
comment:8 Changed 13 months ago by thomie
- Keywords newcomer added
- Type of failure changed from None/Unknown to Runtime performance bug
Adding 'newcomer' keyword by carter's suggestion in the description.
comment:9 Changed 13 months ago by thomie
- Owner carter deleted
Sorry carter, I'm trying to get this ticket to show up in Newcomers. Reclaim it if you want of course.
comment:10 Changed 13 months ago by carter
sure thing. I think the only newcomer friendly bit would be writing the branchless min/max for Word and Int and friends. its a bit more subtle for the floating point versions
comment:11 Changed 11 months ago by thoughtpolice
- Milestone changed from 7.10.1 to 7.12.1
Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.
comment:12 Changed 6 months ago by osa1
I'm trying to understand this ticket; do we need to compile these functions:
minDbl :: Double -> Double -> Double minDbl !d1 !d2 = if d1 < d2 then d1 else d2 minDbl' :: Double -> Double -> Double minDbl' !d1 !d2 = if d2 < d1 then d2 else d1
to branchless code(using movsd with right argument ordering, and similarly movss for Float), or are we only interested in branchless Ord.min, Ord.max etc.? E.g. something like this:
mindbl'' :: Double -> Double -> double mindbl'' (D# d1) (D# d2) = ... use new primop here ...
comment:13 Changed 6 months ago by carter
the idea is not to use it for the normal comparison code, but to make it available for those who need suitable supporting primops. *THOUGH* if we can provide suitable instruction level versions of the Ord based min and max, thats a thing we can think about too.
comment:14 Changed 6 months ago by osa1
- Owner set to osa1
Just to give a status update: I made some progress on this http://lpaste.net/134455. I'm also assigning this to myself.
comment:15 Changed 3 months ago by thoughtpolice
- Milestone changed from 7.12.1 to 8.0.1
Milestone renamed
comment:16 Changed 2 months ago by WrenThornton
- Cc wren@… added
Question: What "should" the behaviors of min and max be for the special cases of negative zero and NaN? Currently GHC does this:
The SSE instructions, at least, work similarly if you get the order right:
However, according to Wikipedia, IEEE 754-2008 specifies
Some further comparisons: Java's Math.min specifies
The .NET Framework's Math.Min specifies