Opened 10 years ago

Last modified 10 months ago

#2269 new feature request

Word type to Double or Float conversions are slower than Int conversions

Reported by: dons
Owned by: dons@…
Priority: lowest
Milestone:
Component: Compiler
Version: 6.8.2
Keywords: rules, performance, double, newcomer
Cc: daniel.is.fischer@…, dterei
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

We have int2Double# and int2Float# primops, but no equivalents for Word types. We may need word2Double# and word2Float# as well, for the Word* types to be fully first-class performance-wise.

This means we have to do extra tests in the Integral instances for Word types, which 'fromIntegral' goes through via toInteger:

    toInteger (W# x#)
        | i# >=# 0#             = smallInteger i#
        | otherwise             = wordToInteger x#
        where i# = word2Int# x#
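
To see why the extra test is needed, here is a minimal sketch (not GHC's actual implementation). word2Int# only reinterprets the bits, so a Word with its top bit set comes out as a negative Int and has to take a separate path. The names wordBitsAsInt and wordToIntegerSketch are made up for illustration, and the second branch uses plain Integer arithmetic in place of the wordToInteger helper quoted above.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Bits (finiteBitSize)
import GHC.Exts (Int (I#), Word (W#), word2Int#)

-- Reinterpret the bits of a Word as an Int; no range check is performed.
wordBitsAsInt :: Word -> Int
wordBitsAsInt (W# w) = I# (word2Int# w)

-- Hypothetical restatement of the two-branch toInteger shown above.
wordToIntegerSketch :: Word -> Integer
wordToIntegerSketch w
  | i >= 0    = toInteger i                          -- top bit clear: the Int view is already right
  | otherwise = toInteger i + 2 ^ finiteBitSize w    -- top bit set: undo the two's-complement wrap
  where
    i = wordBitsAsInt w
```

Loaded in GHCi, wordBitsAsInt maxBound shows the wrapped value, while wordToIntegerSketch maxBound agrees with toInteger (maxBound :: Word).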

Now, for some types, we work around this:

"fromIntegral/Int->Word"  fromIntegral = \(I# x#) -> W# (int2Word# x#)
"fromIntegral/Word->Int"  fromIntegral = \(W# x#) -> I# (word2Int# x#)
"fromIntegral/Word->Word" fromIntegral = id :: Word -> Word

and so on for other Word/Int types. And all is fine.

The problem comes up for Float and Double. For Int, we can write:

"fromIntegral/Int->Float"   fromIntegral = int2Float
"fromIntegral/Int->Double"  fromIntegral = int2Double

int2Float :: Int -> Float
int2Float   (I# x) = F# (int2Float# x)

int2Double :: Int -> Double 
int2Double   (I# x) = D# (int2Double#   x)

But we can't write these rules for Word types.
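
For comparison, this is roughly what the Word counterparts could look like once word2Double# and word2Float# primops are available. It is a sketch only: it assumes a GHC whose GHC.Exts exports those two primops (the ones this ticket asks for), and the wrapper names word2Double and word2Float are hypothetical, chosen to mirror int2Double and int2Float.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Double (D#), Float (F#), Word (W#), word2Double#, word2Float#)

-- Hypothetical wrapper, mirroring int2Double above.
word2Double :: Word -> Double
word2Double (W# w) = D# (word2Double# w)

-- Hypothetical wrapper, mirroring int2Float above.
word2Float :: Word -> Float
word2Float (W# w) = F# (word2Float# w)

-- The matching rules would then follow the Int ones (shown as a comment,
-- since whether a user-level rule on fromIntegral fires depends on the
-- GHC version):
--   "fromIntegral/Word->Float"   fromIntegral = word2Float
--   "fromIntegral/Word->Double"  fromIntegral = word2Double
```

With wrappers like these, the conversion is a single primop call with no branch on the sign bit and no detour through Integer.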

The result is a slowdown on Word conversions. Consider this program:

main = print . sumU
             . mapU (fromIntegral::Int->Double)
             $ enumFromToU 0 100000000

When the source type is Int, we get this nice code:

$wfold :: Double# -> Int# -> Double#

$wfold =
  \ (ww_s18k :: Double#) (ww1_s18o :: Int#) ->
    case ># ww1_s18o 100000000 of wild_a14T {
      False ->
        $wfold
          (+## ww_s18k (int2Double# ww1_s18o)) (+# ww1_s18o 1);
      True -> ww_s18k
    }


But for Word types, we get:

$wfold :: Double# -> Word# -> Double#

$wfold =
  \ (ww_s1gN :: Double#) (ww1_s1gR :: Word#) ->
    case gtWord# ww1_s1gR __word 100000000 of wild_a1do {
      False ->
        case case >=# (word2Int# ww1_s1gR) 0 of wild1_a1cS {
               False ->
                 case word2Integer# ww1_s1gR of wild11_a1d9 { (# s_a1db, d_a1dc #) ->
                 case {__ccall __encodeDouble Int#
                        -> ByteArray#
                        -> Int#
                        -> State# RealWorld
                        -> (# State# RealWorld, Double# #)}_a1bT
                        s_a1db d_a1dc 0 realWorld#
                 of wild12_a1bX { (# ds1_a1bZ, ds2_a1c0 #) ->
                 ds2_a1c0
                 }
                 };
               True -> int2Double# (word2Int# ww1_s1gR)
             }
        of wild1_a1bM { __DEFAULT ->
        $wfold
          (+## ww_s1gN wild1_a1bM) (plusWord# ww1_s1gR __word 1)
        };
      True -> ww_s1gN
    }

Which is to be expected, and the running time goes from:

$ time ./henning  
5.00000000067109e17
./henning  1.53s user 0.00s system 99% cpu 1.534 total

To:

$ time ./henning  
5.00000000067109e17
./henning  4.57s user 0.00s system 99% cpu 4.571 total

So not too bad, but still, the principle of least surprise says Word and Int should behave the same.

Should we have a word2Double# primop?

Change History (22)

comment:1 Changed 10 years ago by igloo

difficulty: Unknown
Milestone: 6.10 branch

I'd like word2Double# and word2Float# for integer-simple too, so it sounds good to me!

comment:2 Changed 9 years ago by simonmar

Architecture: Unknown → Unknown/Multiple

comment:3 Changed 9 years ago by simonmar

Operating System: Unknown → Unknown/Multiple

comment:4 Changed 9 years ago by igloo

Milestone: 6.10 branch → 6.12 branch

comment:5 Changed 8 years ago by igloo

Milestone: 6.12 branch → 6.12.3

comment:6 Changed 7 years ago by igloo

Milestone: 6.12.3 → 6.14.1
Priority: normal → low

comment:7 Changed 7 years ago by daniel.is.fischer

Cc: daniel.is.fischer@… added
Type of failure: None/Unknown

I'd like primops too, but I don't know how one would do that. What I can offer is adding Word -> Float and Word -> Double conversions to primFloat.c.

comment:8 Changed 7 years ago by daniel.is.fischer

Type of failure: None/Unknown → Runtime performance bug

comment:9 Changed 7 years ago by igloo

Milestone: 7.0.1 → 7.0.2

comment:10 Changed 7 years ago by igloo

Milestone: 7.0.2 → 7.2.1

comment:11 Changed 7 years ago by dterei

Cc: dterei added

comment:12 Changed 6 years ago by igloo

Milestone: 7.2.1 → 7.4.1

comment:13 Changed 6 years ago by igloo

Milestone: 7.4.1 → 7.6.1
Priority: low → lowest

comment:14 Changed 5 years ago by igloo

Milestone: 7.6.1 → 7.6.2

comment:15 Changed 3 years ago by thoughtpolice

Milestone: 7.6.2 → 7.10.1

Moving to 7.10.1.

comment:16 Changed 3 years ago by thomie

Primops for word2Double and word2Float were added 2 years ago.

commit 2e8c769422740c001e0a247bfec61d4f78598582

Author: Johan Tibell <>
Date:   Wed Dec 5 19:08:48 2012 -0800

    Implement word2Float# and word2Double#

commit cd01e48fbc548ff8d81ab547108bfdde8a113cd7

Author: Johan Tibell <>
Date:   Thu Dec 13 12:03:40 2012 -0800

    Add test for word2Double# and word2Float#

commit a18cf9cbdfec08732f5b7e0c886a5d899a6a5998

Author: Johan Tibell <>
Date:   Thu Dec 13 14:49:58 2012 -0800

    Add fromIntegral/Word->Double and fromIntegral/Word-Float rules

commit 8cd4ced57dccc1f4f54d242982209ec61e145700

Author: Johan Tibell <>
Date:   Tue Dec 18 14:40:02 2012 +0100

    perf test for Word->Float/Double conversion

commit 6d5f25f5e0b33173fb2e7983cab40808c723f220

Author: Geoffrey Mainland <>
Date:   Thu Jan 3 16:59:03 2013 +0000

    Fix LLVM code generated for word2Float# and word2Double#.

commit 744035fdd4b882c17ef7c6e4439b9e7099e7ec3d

Author: Johan Tibell <>
Date:   Mon Jan 7 21:35:07 2013 -0800

    Fix Word2Float# test on 32-bit

The resulting core of the example from the description now looks the same for Word->Double as for Int->Double.

$ cabal install vector

$ cat test.hs
{-# LANGUAGE CPP #-}
import Data.Vector as V
import Data.Word

main = print . V.sum
#ifdef WORD
             . V.map (fromIntegral::Word->Double)
#else
             . V.map (fromIntegral::Int->Double)
#endif
             $ V.enumFromTo 0 100000000

$ ghc -ddump-simpl -dsuppress-all -O2 -fforce-recomp -DWORD test.hs -o testWord
...
main_$s$wfoldlM'_loop
main_$s$wfoldlM'_loop =
  \ sc_s5G6 sc1_s5G7 ->
    case tagToEnum# (leWord# sc1_s5G7 (__word 100000000)) of _ {
      False -> sc_s5G6;
      True ->
        main_$s$wfoldlM'_loop
          (+## sc_s5G6 (word2Double# sc1_s5G7))
          (plusWord# sc1_s5G7 (__word 1))
    }
...

$ ghc -ddump-simpl -dsuppress-all -O2 -fforce-recomp -DINT test.hs -o testInt
...
main_$s$wfoldlM'_loop
main_$s$wfoldlM'_loop =
  \ sc_s5GQ sc1_s5GR ->
    case tagToEnum# (<=# sc1_s5GR 100000000) of _ {
      False -> sc_s5GQ;
      True ->
        main_$s$wfoldlM'_loop
          (+## sc_s5GQ (int2Double# sc1_s5GR)) (+# sc1_s5GR 1)
    }

But testWord is still 3 times slower than testInt.

$ time ./testWord
5.00000005e15

real	0m0.579s
user	0m0.575s
sys	0m0.003s

$ time ./testInt
5.00000005e15

real	0m0.196s
user	0m0.191s
sys	0m0.004s

As I can not easily explain this difference, I'll leave this ticket open for now.
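
For anyone wanting to reproduce the comparison without installing vector, a base-only loop with the same shape as the worker in the Core above might look like this. It is a sketch, not the benchmark used for the timings; sumWord and sumInt are illustrative names.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Sum of fromIntegral i for i in [0 .. n], using a Word counter.
-- Note: n must be below maxBound, otherwise i + 1 wraps and the loop never ends.
sumWord :: Word -> Double
sumWord n = go 0 0
  where
    go :: Double -> Word -> Double
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + fromIntegral i) (i + 1)

-- The same loop with an Int counter, for comparison.
sumInt :: Int -> Double
sumInt n = go 0 0
  where
    go :: Double -> Int -> Double
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + fromIntegral i) (i + 1)
```

Wrapping each function in a main such as print (sumWord 100000000) and building with -O2 should allow the same kind of timing comparison; the generated Core may differ from the vector version depending on the GHC release.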

comment:17 Changed 3 years ago by thoughtpolice

Milestone: 7.10.1 → 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:19 Changed 2 years ago by thoughtpolice

Milestone: 7.12.1 → 8.0.1

Milestone renamed

comment:20 Changed 22 months ago by thomie

Milestone: 8.0.1

comment:21 in reply to:  16 Changed 10 months ago by rwbarton

Replying to thomie:

> But testWord is still 3 times slower than testInt.
>
> $ time ./testWord
> 5.00000005e15
>
> real	0m0.579s
> user	0m0.575s
> sys	0m0.003s
>
> $ time ./testInt
> 5.00000005e15
>
> real	0m0.196s
> user	0m0.191s
> sys	0m0.004s
>
> As I can not easily explain this difference, I'll leave this ticket open for now.

It's because the x86 NCG implements the new MO_UF_Conv as a call to a C function, rather than generating code inline like MO_SF_Conv (cvtsi2sdq).

Unfortunately there's no corresponding instruction for converting an unsigned 64-bit integer to float or double, but for converting to double the code generated by clang -O is pretty small and simple and probably worth inlining. It will still be somewhat slower than Int though.

comment:22 Changed 10 months ago by rwbarton

Keywords: newcomer added

Namely,

double f(unsigned long x)
{
  return x;
}

/*
0000000000000000 <f>:
   0:	66 48 0f 6e cf       	movq   %rdi,%xmm1
   5:	66 0f 62 0d 00 00 00 	punpckldq 0x0(%rip),%xmm1        # d <f+0xd>
   c:	00 
            9: R_X86_64_PC32	.LCPI0_0-0x4
   d:	66 0f 5c 0d 00 00 00 	subpd  0x0(%rip),%xmm1        # 15 <f+0x15>
  14:	00 
            11: R_X86_64_PC32	.LCPI0_1-0x4
  15:	66 0f 70 c1 4e       	pshufd $0x4e,%xmm1,%xmm0
  1a:	66 0f 58 c1          	addpd  %xmm1,%xmm0
  1e:	c3                   	retq   
*/

Compiling the test program here with LLVM there's no measurable difference between the int and word versions, so I guess doing this is worthwhile.
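
The clang sequence above (movq, punpckldq, subpd, pshufd, addpd) looks like the usual split-into-halves trick: place each 32-bit half of the input in the mantissa of a large power of two, subtract that power of two again, and add the two results. Here is the same idea sketched in Haskell. The constants at .LCPI0_0 and .LCPI0_1 are not shown in the listing, so the bit patterns for 2^52 and 2^84 used below are an assumption, and word64ToDoubleSketch is an illustrative name, not something the NCG provides.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word64)
import GHC.Float (castWord64ToDouble)

-- Convert an unsigned 64-bit value to Double without an unsigned
-- conversion instruction (assumed constants, see the text above).
word64ToDoubleSketch :: Word64 -> Double
word64ToDoubleSketch x = (hi - two84) + (lo - two52)
  where
    -- Bit pattern of 2^52 with the low 32 bits of x in the mantissa:
    -- the resulting Double is exactly 2^52 + (x mod 2^32).
    lo = castWord64ToDouble (0x4330000000000000 .|. (x .&. 0xFFFFFFFF))
    -- Bit pattern of 2^84 with the high 32 bits of x in the mantissa:
    -- the resulting Double is exactly 2^84 + (x div 2^32) * 2^32.
    hi = castWord64ToDouble (0x4530000000000000 .|. (x `shiftR` 32))
    two52, two84 :: Double
    two52 = encodeFloat 1 52
    two84 = encodeFloat 1 84
```

Both subtractions are exact, so the only rounding happens in the final addition. That matches the branch-free shape of the listing and suggests why it could be worth emitting inline rather than calling out to C.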
