Opened 5 years ago
Last modified 4 weeks ago
#9251 new task
ghc does not expose branchless max/min operations as primops
Reported by: | carter | Owned by: | osa1 |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Compiler | Version: | 7.8.2 |
Keywords: | newcomer | Cc: | wren@…, sjakobi, ulysses4ever, AndreasK |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Runtime performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | #9246 | Differential Rev(s): | |
Wiki Page: |
Description
It was pointed out in #9246 that GHC currently does not expose primops for Max and Min on various unlifted types such as Float# and Double# and Int# and Word#, but that on most modern CPU architectures there is instruction level support for these operations. (Eg, Float# and #Double min and max can be provided on any 64bit x86 system pretty easily )
This ticket is to track doing that task. I'll probably do it one of the next two weekends to get back in the GHC hacking groove, or maybe this should be used as a "getting started" ticket for a new contributor?
Change History (26)
comment:1 Changed 5 years ago by
Summary: | ghc does not expose brancheless max/min operations as primops → ghc does not expose branchless max/min operations as primops |
---|
comment:2 Changed 5 years ago by
comment:3 Changed 5 years ago by
i'm working on that stuff, fret not!
There's going to be a IEEE *must* conformance compliant version of every operation, plus a possibly system dependent one.
It looks like IEEE actually specifies TWO versions of min and max.
I've got a local copy of IEEE-754-2008 downloaded after I googled around, I'll spec out the operations later this week.
Wrt the signedness of 0 piece, it says **may** always return the first argument
rather than must. Point being, the primops layer will likely have more than 1 variant, and the Num / Ord instances will have some choice of IEEE must condition + H2010 satisfying definition. So I think we'd want to have the haskell default min/max to treat arguments ±0 as ±1 wrt max and min. (ie max will pick +, min will pick -)
comment:4 Changed 5 years ago by
actually, importantly, in section 5.11 of the 2008 standard they say
Four mutually exclusive relations are possible: less than , equal, greater than, and unordered. The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself. Comparisons shall ignore the sign of zero (so +0 = −0). Infinite operands of the same sign shall compare equal.
thus the Ord compare based instance should actually perhaps throw an exception or otherwise fail when either argument is Nan? I'll need read a bit more.
ANYWAY, theres several *distinct* semantics that can be chosen, and adding suitable support to ghc for the different semantics that aren't interexpressible is something I'll spend some time thinking about.
don't worry, you'll have the version you want, but there will be an IEEE compliant one that also satisfies the corresponding haskell standard (BOTH, if they disagree theres a serious problem i'll have to have some discussion about)
comment:5 Changed 5 years ago by
Java is full of bad floating point design issues, not looking there for ANY design input. theres a fun rant here http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
As i'm a very hefty user of ghc for floating point computation, i'm going to be very very thoughtful on the design :)
comment:6 Changed 5 years ago by
Yeah, I've seen those slides before. The problem with the arguments presented is that I know there is a class of users who demands "linguistically-legislated exact reproducibility" for floating-point computations, because I'm part of it! For my day job, I use Java to write a tool for modeling safety-critical software systems on desktop hardware. For our purposes, it is far more important that the results of the simulation be exactly identical across different platforms than it is they be numerically accurate or efficiently computable. We use Java's strictfp
keyword and StrictMath
class ubiquitously in our simulation engine, because it is essential that the output of our program be identical, no matter where or when it is run. When it is necessary to compute different or more numerically-accurate floating-point outputs, we emulate it in software. This is slow but allows us 100% confidence that you will always get the same results.
That said, our application is a rare case. In most situations, faster and more accurate computations that can give slightly different results on different machines are preferable to calculations that are wrong in exactly-reproducible ways.
Which section (if any) of the Haskell 2010 Report specifies the behavior of floating-point operations? I've glanced through it quickly but all I've been able to find is the Prelude definitions of the instances for Float and Double, which just list them as primitives.
comment:7 Changed 5 years ago by
To clarify what I said: both the perf and reproducible primops will be added and the one I'll pick for base will the reproducible/ portable one.
We agree. Bring the worry to the code review once I've got a proposed patch. :-). I write tooling for both workloads, I'm well well well aware of reproducibility being valuable.
comment:8 Changed 4 years ago by
Keywords: | newcomer added |
---|---|
Type of failure: | None/Unknown → Runtime performance bug |
Adding 'newcomer' keyword by carter's suggestion in the description.
comment:9 Changed 4 years ago by
Owner: | carter deleted |
---|
Sorry carter, I'm trying to get this ticket to show up in Newcomers. Reclaim it if you want of course.
comment:10 Changed 4 years ago by
sure thing. I think the only newcomer friendly bit would be writing the branchless min/max for Word and Int and friends. its a bit more subtle for the floating point versions
comment:11 Changed 4 years ago by
Milestone: | 7.10.1 → 7.12.1 |
---|
Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.
comment:12 Changed 4 years ago by
I'm trying to understand this ticket; do we need to compile these functions:
minDbl :: Double -> Double -> Double minDbl !d1 !d2 = if d1 < d2 then d1 else d2 minDbl' :: Double -> Double -> Double minDbl' !d1 !d2 = if d2 < d1 then d2 else d1
to branchless code(using movsd
with right argument ordering, and similarly
movss
for Float
), or are we only interested in branchless Ord.min
, Ord.max
etc.? E.g. something like this:
mindbl'' :: Double -> Double -> double mindbl'' (D# d1) (D# d2) = ... use new primop here ...
comment:13 Changed 4 years ago by
the idea is not to use it for the normal comparison code, but to make it available for those who need suitable supporting primops. *THOUGH* if we can provide suitable instruction level versions of the Ord based min and max, thats a thing we can think about too.
comment:14 Changed 4 years ago by
Owner: | set to osa1 |
---|
Just to give a status update: I made some progress on this http://lpaste.net/134455. I'm also assigning this to myself.
comment:16 Changed 3 years ago by
Cc: | wren@… added |
---|
comment:17 Changed 3 years ago by
Milestone: | 8.0.1 |
---|
comment:18 Changed 12 months ago by
Cc: | sjakobi added |
---|
comment:20 Changed 5 months ago by
I believe it is just waiting for someone to come along with a patch. Feel free to finish up osa1's patch.
comment:21 Changed 5 months ago by
Ben, I'm interested in this. For osa1's work: I don't see a patch. His link above has a tiny example program using something called min##
which is not defined there.
Is there a clear strategy on how to attack this ticket? Let's say, starting with the most conservative step: provide primop version of integer min
/ max
which, for x86, is implemented in assembly (CMOV
or SSE4's PMINSD
/ PMAXSD
).
comment:22 Changed 5 months ago by
> His link above has a tiny example program using something called min## which is not defined there.
Whoops! Sorry about that; I had just assumed there was a patch sitting there.
As you suggest, I would start by thinking about providing primops; in particular I would start by focusing on Int#
and Word#
. As pointed out above, floating point is a whole can of worms.
However, before this I would first try to show that such a primop would improve real code. Afterall, introducing a primop means that we are forced to maintain that primop for eternity. This carries a cost which we want to ensure would be justified by the benefits it brings.
comment:23 Changed 5 months ago by
I agree with measure-first reasoning. How would one compare the performance of the current code with the one using primop which we haven't defined yet, though?
comment:24 Changed 5 months ago by
Cc: | ulysses4ever added |
---|
comment:25 Changed 5 months ago by
Well, implementing then measuring (and accept the fact that the implementation may be thrown away as a consequence) is one option.
However, another option would be to work incrementally: measuring a best-case speed-up in a C (or similar) program.
comment:26 Changed 4 weeks ago by
Cc: | AndreasK added |
---|
For cases where branch prediction is impossible this makes quite the difference in my experience.
I have some proof of concept code for Int/Word already. I plan to keep the branch [here](https://github.com/AndreasPK/ghc/tree/minmax) updated until I got a patch ready.
If someone feels like taking this over contact me, otherwise I will probably put up a patch for the Int/Word in a month or two depending on how time I can spend on this.
Question: What "should" the behaviors of min and max be for the special cases of negative zero and NaN? Currently GHC does this:
The SSE instructions, at least, work similarly if you get the order right:
However, according to Wikipedia, IEEE 754-2008 specifies
Some further comparisons: Java's
Math.min
specifiesThe .NET Framework's
Math.Min
specifies