Opened 7 years ago

Last modified 2 months ago

#2387 new bug

Optimizer misses unboxing opportunity

Reported by: dolio Owned by:
Priority: lowest Milestone: 7.12.1
Component: Compiler Version: 6.8.2
Keywords: optimizer unbox box Cc: jan.stolarek@…, mail@…
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #2289 Differential Revisions:

Description

In my studying of the fannkuch benchmark, I've discovered (I think) another missed optimization. A scaled down illustration goes as follows:

{-# LANGUAGE TypeOperators, BangPatterns #-}

module Main (main) where

import Control.Monad.ST

import System.Environment

data (:*:) a b = !a :*: !b

whileLoop :: Int -> ST s Int
whileLoop = go 0
 where
 go !n k
   | k == 0    = return n
   | otherwise = go (n+1) (k-1)
{-# INLINE whileLoop #-}

iter :: Int -> Int -> ST s (Bool :*: Int)
iter k n = do
  k' <- whileLoop 40 >>= \k' -> return $! max k k'
  b <- return (n == 0)
  
  return $! b :*: k'
{-# INLINE iter #-}

mainLoop :: Int -> Int -> ST s Int
mainLoop k n = do
  done :*: k' <- iter k n
  
  if done
    then return k'
    else mainLoop k' (n - 1)

main = print =<< stToIO . mainLoop 0 . read . head =<< getArgs

If we look at the core for whileLoop's worker, we see:

$wpoly_go_r1aE :: forall s_aem.
                  Int# -> Int# -> STRep s_aem Int

$wpoly_go_r1aE =
  \ (@ s_aem)
    (ww_s18Y :: Int#)
    (ww1_s192 :: Int#)
    (eta_s19w :: State# s_aem) ->
    case ww1_s192 of wild_XF {
      __DEFAULT ->
        $wpoly_go_r1aE @ s_aem (+# ww_s18Y 1) (-# wild_XF 1) eta_s19w;
      0 -> (# eta_s19w, I# ww_s18Y #)
    }

Note, the return type is a boxed Int. The function is only used once, like so:

    ...
    case $wpoly_go_r1aE @ s_aem 0 40 w_s19f of wild_aFw { (# new_s_aFB, r_aFC #) ->
    case r_aFC of wild1_aG9 { I# y_aGb ->
    case <=# ww_s199 y_aGb of wild2_aGd {
      False ->
    ...

In other words, go boxes its results at the end of the loop, and the function that uses it immediately looks inside the box for the value. In this particular micro-benchmark, the boxed value (wild1_aG9 above) is actually used in the case where mainLoop returns the boxed value. However, in the larger benchmark I pulled this from, that is not the case in several areas (only the unboxed value is used, but it still goes through a box). Either way, the boxed value is only used at the end of the loop (and not every time there), and on every other iteration, this results in superfluous allocation.

I'll attach a manually unboxed version I wrote (it also has iter and max manually inlined to mainLoop, since that makes it easier to write; the core for the above shows that they are getting inlined properly, so I assume that isn't the issue). Using +RTS -sstderr shows that (running 100 million iterations here), the manually unboxed version allocates 50 kilobytes on the heap, and runs in around 15 seconds, whereas the version above that doesn't get unboxed does 1.6 gigabytes of heap allocation, and takes 18 seconds (in the larger benchmark, such extra boxing would happen perhaps 40 million times over the course of the program).

Thanks for your help.

Attachments (1)

Unboxed.hs (1020 bytes) - added by dolio 7 years ago.
Manually unboxed version of the benchmark

Download all attachments as: .zip

Change History (20)

Changed 7 years ago by dolio

Manually unboxed version of the benchmark

comment:1 Changed 7 years ago by dolio

Here's an even simpler test case:

module Main (main) where

import System.Environment

loop :: Int -> IO Int
loop n = go 0
 where
 go i
   | i < n     = go (i+1)
   | otherwise = return i

main = loop 10 >>= print

This turns into:

$wgo :: Int#
             -> State# RealWorld
             -> (# State# RealWorld, Int #)

$wgo =
  \ (ww_sxz :: Int#) (eta_sxO :: State# RealWorld) ->
    case <# ww_sxz 10 of wild_B1 {
      False -> (# eta_sxO, I# ww_sxz #);
      True -> $wgo (+# ww_sxz 1) eta_sxO
    }

a :: State# RealWorld
          -> (# State# RealWorld, () #)

a =
  \ (eta_apt :: State# RealWorld) ->
    case $wgo 0 eta_apt of wild_avR { (# new_s_avT, a103_avU #) ->
    case a24
           stdout
           (case a103_avU of w_avB { I# ww_avD ->
            $wshowSignedInt 0 ww_avD ([] @ Char)
            })
           new_s_avT
    of wild1_avo { (# new_s1_avq, a1031_avr #) ->
    $wa6 stdout '\n' new_s1_avq
    }
    }

Similarly:

{-# LANGUAGE UnboxedTuples #-}

module Main (main) where

import System.Environment

loop :: Int -> (# Int, Int #)
loop n = go 0
 where
 go i
   | i < n     = go (i+1)
   | otherwise = (# i, i #)

main = case loop 10 of
         (# i, _ #) -> print i

Yields:

$wgo :: Int# -> (# Int, Int #)

$wgo =
  \ (ww_sw8 :: Int#) ->
    case <# ww_sw8 10 of wild_B1 {
      False ->
        let {
          wild1_swr :: Int
          []
          wild1_swr = I# ww_sw8 } in
        (# wild1_swr, wild1_swr #);
      True -> $wgo (+# ww_sw8 1)
    }

To get the level of unboxing desired, we must:

module Main (main) where

import System.Environment

data Pair = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int

loop :: Int -> Pair
loop n = go 0
 where
 go i
   | i < n     = go (i+1)
   | otherwise = P i i

main = case loop 10 of
         P i _ -> print i

Which becomes:

$wgo :: Int# -> (# Int#, Int# #)

$wgo =
  \ (ww_swB :: Int#) ->
    case <# ww_swB 10 of wild_B1 {
      False -> (# ww_swB, ww_swB #);
      True -> $wgo (+# ww_swB 1)
    }

main :: IO ()

main =
  case $wgo 0 of ww_swH { (# ww1_swJ, ww2_swK #) ->
  (\ (eta1_aoZ :: State# RealWorld) ->
     case a24
            stdout
            ($wshowSignedInt 0 ww1_swJ ([] @ Char))
            eta1_aoZ
     of wild_auM { (# new_s_auO, a103_auP #) ->
     $wa6 stdout '\n' new_s_auO
     })
  `cast` (sym ((:CoIO) ())
          :: State# RealWorld
             -> (# State# RealWorld, () #)
               ~
             IO ())
  }

However, that isn't an option for ST, obviously. A custom, strict, polymorphic pair doesn't unpack, either (obviously, perhaps), so there appears to be no way with ST/IO to get the level of unpacking that one would be able to get writing with explicit State# s and unboxed tuples. Or, more generally, there appears to be no way to return multiple unboxed results from a recursive function without explicitly dealing with unboxed values/strict monomorphic products.

comment:2 Changed 7 years ago by simonpj

  • difficulty set to Unknown

Good example. I believe that it's the same phenomenon as #2289. I was going to close this, but on reflection I'll leave them both open and cross-linked.

Simon

comment:3 Changed 7 years ago by igloo

  • Milestone set to 6.10 branch

comment:4 Changed 7 years ago by simonmar

  • Type changed from bug to run-time performance bug

comment:5 Changed 6 years ago by igloo

  • Milestone changed from 6.10 branch to 6.12 branch

comment:6 Changed 5 years ago by simonmar

  • Type of failure set to Runtime performance bug

comment:7 Changed 5 years ago by igloo

  • Milestone changed from 6.12 branch to 6.12.3

comment:8 Changed 5 years ago by igloo

  • Milestone changed from 6.12.3 to 6.14.1
  • Priority changed from normal to low

comment:9 Changed 4 years ago by igloo

  • Milestone changed from 7.0.1 to 7.0.2

comment:10 Changed 4 years ago by igloo

  • Milestone changed from 7.0.2 to 7.2.1

comment:11 Changed 3 years ago by igloo

  • Milestone changed from 7.2.1 to 7.4.1

comment:12 Changed 3 years ago by igloo

  • Milestone changed from 7.4.1 to 7.6.1
  • Priority changed from low to lowest

comment:13 Changed 2 years ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:14 Changed 2 years ago by morabbin

comment:15 Changed 21 months ago by jstolarek

  • Cc jan.stolarek@… added

comment:16 Changed 16 months ago by nomeata

  • Cc mail@… added

comment:17 Changed 8 months ago by thoughtpolice

  • Milestone changed from 7.6.2 to 7.10.1

Moving to 7.10.1.

comment:18 Changed 2 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:19 Changed 2 months ago by thoughtpolice

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

Note: See TracTickets for help on using tickets.