Opened 7 years ago

Last modified 18 months ago

#1349 new task

Generalise the ! and UNPACK mechanism for data types, to unpack function arguments

Reported by: simonpj Owned by:
Priority: normal Milestone: 7.6.2
Component: Compiler Version: 6.6.1
Keywords: Cc: id@…, pho@…, bos@…, johan.tibell@…, kolmodin@…, slyfox@…, coreyoconnor@…, illissius@…, hackage.haskell.org@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description (last modified by simonpj)

See this thread:

http://www.nabble.com/More-speed-please!-t3411977.html

Briefly the idea is to allow

  data T = MkT (!Int -> Bool)

to make a MkT hold strict functions only. Anyone unpacking a MkT can assume the function is strict; and anyone building a MkT gets a strictness wrapper aound whatever function they supply, so even if they supply a lazy function, it's made strict.

Seems like a natural generalisation of the existing strictness and UNPACK mechanism for data types.

Lots of details in the thread above.

Simon

Change History (39)

comment:1 Changed 7 years ago by igloo

  • Milestone set to 6.1

It's probably optimistic to hope for this in 6.8.

comment:2 Changed 7 years ago by guest

comment:3 follow-up: Changed 7 years ago by Isaac Dupree

  • Cc id@… added

seems tricky. I hope it can be done in a way that makes the language feel no more complicated. e.g.

type F = !Int -> Foo
data Foo where C :: F

seems fine and similar to something I've wanted to do (but didn't try) (when there were multiple similar signatures in my GADT). The explicit signature on a thing (of course, explicit signatures are required for data declarations) is what can coerce it to be strict, I guess.

comment:4 Changed 7 years ago by Isaac Dupree

Strictness annotations in function results don't make sense (for the same reason id is strict -- function results are "already WHNF-strict" as much as it makes sense), and they're already lifted (in the sense that they can possibly produce _|_) even for unboxed result types (e.g. bot :: () -> Int#; bot x = bot x). But unboxing a result-tuple can result in a loss of sharing, so in

newtype State s a
        = State
        { runState' :: s -> (# a, s #) }

the unboxedness of the tuple really is significant for efficiency reasons, to tell GHC that it doesn't need to preserve sharing. See http://www.nabble.com/there-isn%27t-any-difference%2C-is-there%2C-with-unboxed-tuples--to14629139.html

So I also want a function-result-syntax of UNPACK, presumably without an intervening meaningless strictness annotation:

newtype State s a
        = State
        { runState' :: s -> {-# UNPACK #-} ( a, s ) }

comment:5 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by duncan

Replying to Isaac Dupree:

seems tricky. I hope it can be done in a way that makes the language feel no more complicated. e.g.

type F = !Int -> Foo
data Foo where C :: F

This feature request is not about distinguishing strictness in types as the above would indicate. It's about changing the strictness and data representation of data type constructors that have higher order components.

comment:6 in reply to: ↑ 5 Changed 7 years ago by Isaac Dupree

Replying to duncan:

This feature request is not about distinguishing strictness in types as the above would indicate. It's about changing the strictness and data representation of data type constructors that have higher order components.

I don't know what this feature request is "about". But I'll note that some people do want strictness in types other than data declarations, for example, http://article.gmane.org/gmane.comp.lang.haskell.cafe/34857

also my comment about GADT syntax is #1768

comment:7 Changed 6 years ago by PHO

  • Cc pho@… added

comment:8 Changed 6 years ago by simonmar

  • Architecture changed from Unknown to Unknown/Multiple

comment:9 Changed 6 years ago by simonmar

  • Operating System changed from Unknown to Unknown/Multiple

comment:10 Changed 5 years ago by igloo

  • Milestone changed from 6.10 branch to 6.12 branch

comment:11 Changed 4 years ago by bos

  • Cc bos@… added
  • Type of failure set to None/Unknown

comment:12 Changed 4 years ago by tibbe

  • Cc johan.tibell@… added

comment:13 Changed 4 years ago by igloo

  • Milestone changed from 6.12 branch to 6.12.3

comment:14 Changed 4 years ago by kolmodin

  • Cc kolmodin@… added

comment:15 Changed 4 years ago by tibbe

While working on a builder monoid for Data.Text I ran in to a problem related to lack of unpacking for continuation arguments. Using these types:

data Buffer s = Buffer {-# UNPACK #-} !(A.MArray s Word16)
                       {-# UNPACK #-} !Int  -- offset
                       {-# UNPACK #-} !Int  -- used units
                       {-# UNPACK #-} !Int  -- length left

newtype Builder = Builder {
     runBuilder :: forall s. (Buffer s -> ST s [S.Text])
                -> Buffer s
                -> ST s [S.Text]
   }

the following module generates bad code with lots of boxing and closures:

module Loop (loop) where

import Data.Monoid (mappend, mempty)
import Data.Text.Lazy.Builder

loop  :: Int -> Builder
loop n = go 0 mempty
  where
    go i acc
      | i < n     = go (i+1) (acc `mappend` singleton 'a')
      | otherwise = acc

comment:16 Changed 4 years ago by tibbe

I tried to manually unpack the arguments like so:

newtype Builder = Builder {
     runBuilder :: forall s. (A.MArray s Word16 -> Int# -> Int# -> Int#
                              -> ST s [S.Text])
                -> A.MArray s Word16 -> Int# -> Int# -> Int#
                -> ST s [S.Text]
   }

However, GHC still still generates a new closure each time around the loop so perhaps unpacking will not help in my case.

comment:17 Changed 4 years ago by simonpj

Can you make a standalone test case? I got

T1349.hs:7:41: Not in scope: type constructor or class `A.MArray'

T1349.hs:7:52: Not in scope: type constructor or class `Word16'

T1349.hs:13:43: Not in scope: type constructor or class `ST'

T1349.hs:13:49: Not in scope: type constructor or class `S.Text'

T1349.hs:15:20: Not in scope: type constructor or class `ST'

T1349.hs:15:26: Not in scope: type constructor or class `S.Text'

T1349.hs:22:45: Not in scope: `singleton'

Simon

comment:18 Changed 4 years ago by tibbe

Here are the instructions (one of the files is too big to attach):

wget http://johantibell.com/files/text-spj.tar.gz
tar zxf text-spj.tar.gz
cd text-spj
cabal configure && cabal build
wget http://johantibell.com/files/Loop.hs
ghc -c -ddump-simpl -O2 Loop.hs -idist/build -package-name text-0.7.1.0 | less

comment:19 Changed 4 years ago by tibbe

I may have spoken too soon. Switching the loop to use foldr improves the generated core a lot. It still suffers from the extra boxing/unboxing mentioned in the ticket description.

comment:20 Changed 4 years ago by simonpj

  • Description modified (diff)

Now I'm unsure what to do. Maybe the rero instructions you give above have been overtaken by your change to foldr? Maybe the extra boxing/unboxing that you are nw seeing is the same as in the description of the ticket, or maybe not?

I'm always keen to have well-characterised examples of missed optimisation opportunities. If you think you have one, is it possible for you to produce a reproducible standalone test case of the situation as you now see it?

Or maybe its water under the bridge now.

Thanks

Simon

comment:21 Changed 4 years ago by tibbe

It's a bit tricky to come up with a good example as inlining seems to remove the problem for simple cases. Here's an attempt. Given the module:

module Cont (run) where

type Cont = (Int -> Int) -> Int -> Int

g :: Cont
g k n = let n' = n + 1 in n' `seq` k n'

loop  :: Int -> Cont
loop 0 = id
loop n = g . loop (n - 1)

run = loop 10 id

we get the Core

Cont.$wloop :: Int#
               -> (Int -> Int)
               -> Int
               -> Int
Cont.$wloop =
  \ (ww_shX :: Int#) ->
    case ww_shX of ds_Xhe {
      __DEFAULT ->
        let {
          g_si5 :: (Int -> Int)
                   -> Int
                   -> Int
          g_si5 = Cont.$wloop (-# ds_Xhe 1) } in
        \ (x_ahg :: Int -> Int) ->
          let {
            k_si7 [ALWAYS Just L] :: Int -> Int
            k_si7 = g_si5 x_ahg } in
          \ (n_adm :: Int) ->
            case n_adm of _ { I# x1_ahv ->
            k_si7 (I# (+# x1_ahv 1))
            };
      0 -> id @ (Int -> Int)
    }

Cont.run :: Int -> Int
Cont.run = Cont.$wloop 10 (id @ Int)

However, manual unboxing

{-# LANGUAGE MagicHash #-}

module Cont2 (run) where

import GHC.Prim
import GHC.Base

type Cont = (Int# -> Int) -> Int# -> Int

g :: Cont
g k n = k (n +# 1#)

loop  :: Int -> Cont
loop 0 = id
loop n = g . loop (n - 1)

run = loop 10 (\i -> (I# i))

gives

Cont2.$wloop :: Int#
                -> (Int# -> Int)
                -> Int#
                -> Int
Cont2.$wloop =
  \ (ww_shs :: Int#) ->
    case ww_shs of ds_XgX {
      __DEFAULT ->
        let {
          g_shA :: (Int# -> Int)
                   -> Int#
                   -> Int
          g_shA = Cont2.$wloop (-# ds_XgX 1) } in
        \ (x_agX :: Int# -> Int) ->
          let {
            k_shC [ALWAYS Just L] :: Int# -> Int
            k_shC = g_shA x_agX } in
          \ (n_adn :: Int#) -> k_shC (+# n_adn 1);
      0 -> id @ (Int# -> Int)
    }

Cont2.run :: Int# -> Int
Cont2.run = Cont2.$wloop 10 I#

I'd like to be able to express that Cont's first parameter (of type Int -> Int) is strict in its first (and only) arguments and that I want it unboxed.

comment:22 Changed 4 years ago by simonpj

OK I get the idea. But there are a number of corners. For example:

  • If I have a function of type (!Int -> Int) can I pass it to a function expecting an argument of type (Int -> Int). Perhaps yes, with some automatic impedence matching?
  • If there is automatic impedence matching might I lose sharing? Example: suppose
    f :: Int -> !Int -> Int
    g :: (Int -> Int -> Int) -> Int
    
    If GHC sees (g f) in source code, should it transform to (g (\xy. case y of I# y# -> g x y#))? Tha might lose sharing if (f x) did a lot of computation before returning a function.
  • Does the type !a -> a make sense? It does for data constructors.


  • An implementation question is this: how should these bang types be represented? Is there a new type constructor for "!"? Is (!Int -> Int) different to (Int# -> Int) or not?

A good framework to think of this in is Max's paper Types are calling conventions.

I doubt I'm going to move fast on this, I'm afraid.

Simon

comment:23 Changed 4 years ago by igloo

  • Milestone changed from 6.12.3 to 6.14.1
  • Priority changed from normal to low

comment:24 Changed 4 years ago by kolmodin

I thought I might share my story too, as I ran into this as well, and got aware of this ticket.
3 years after Duncan wrote his mail about the need for speed in binary I ran into the same trouble, again with the binary package.

Duncan wrote about the Put module, which creates lazy Bytestrings from an DSL.
The Get module reads such Bytestrings with a similar DSL.

I've rewritten the Get module to use CPS, which allows some useful tricks when parsing.

The Get data type from the non-CPS Get module looks like this:

-- | The parse state
data S = S {-# UNPACK #-} !B.ByteString  -- current chunk
           L.ByteString                  -- the rest of the input
           {-# UNPACK #-} !Int64         -- bytes read

-- | The Get monad is just a State monad carrying around the input ByteString
newtype Get a = Get { unGet :: S -> (# a, S #) }

While the CPS version looks like this (heavily influenced by attoparsec):

data S = S { input      :: !B.ByteString
           , next_input :: !B.ByteString
           , read_all   :: !Bool
           } deriving Show

data Result a = Fail S [String] String
              | Partial (Maybe B.ByteString -> Result a)
              | Done S a

type Failure   r = S -> [String] -> String -> Result r
type Success a r = S -> a -> Result r


-- unrolled codensity/state monad
newtype Get a = C { runCont :: forall r.
                               S ->  
                               Failure   r ->
                               Success a r ->
                               Result    r } 

Unfortunately, this yields terrible performance.
The old Get gives me about 26mb/s (Core 2 Duo, 1.6Ghz) when reading chunks of 1 byte at a time, while the CPS version only gives up to 1mb/s! Much time seems to be spent doing GC, as the bigger Bytestring I read the worse performance I get.

After reasoning a bit about what GHC actually does, I unpacked the state into the Get data type:

type Failure   r = B.ByteString -> B.ByteString -> Bool -> [String] -> String -> Result r
type Success a r = B.ByteString -> B.ByteString -> Bool -> a -> Result r

-- unrolled codensity/state monad
newtype Get a = C { runCont :: forall r.
                               B.ByteString ->
                               B.ByteString ->
                               Bool ->
                               Failure   r ->
                               Success a r ->
                               Result    r }

This at least gave me 12mb/s (I think I've even been able to get the same 26mb/s, when further trying to optimize). The price to pay is that the code is now not as maintainable, it truly looks horrible.

So it still seems this ticket is important in some cases.

comment:25 Changed 4 years ago by igloo

  • Milestone changed from 7.0.1 to 7.0.2

comment:26 Changed 4 years ago by slyfox

  • Cc slyfox@… added

comment:27 Changed 4 years ago by CoreyOConnor

  • Cc coreyoconnor@… added

comment:28 Changed 3 years ago by igloo

  • Milestone changed from 7.0.2 to 7.2.1

comment:29 Changed 3 years ago by illissius

  • Cc illissius@… added

comment:30 Changed 3 years ago by ezyang

Original Nabble link is dead, I think this is it? http://haskell.1045720.n5.nabble.com/More-speed-please-td3182083.html

comment:31 Changed 3 years ago by ezyang

I strongly suspect Hoopl would also benefit from an optimization like this.

comment:32 Changed 3 years ago by ezyang

Some answers for spj's questions:

  • Yes, this should be allowed, by analogy to data constructors: if I write a function \c x -> c x, we don't mind if the constructor c is strict or lazy in x. Now, if we want to be able to give assertions about strictness, this is less clear, but right now these sorts of bang-patterns change semantics, so that's a different tree to bark up.
  • I don't know how feasible this is, but one way to work around the sharing issue is to use the arity analyzer to figure out if we need to insert intermediate lambdas in the case that f x of f x y does work. It's kind of hacky though...
  • Yes, definitely. It doesn't make too much sense for a polymorphic variable, but !() -> () does in fact give more information than () -> ().
  • If we think the current design for bang patterns in data constructors is right, we use HsBangTy, erase it for any typechecking, and make sure it gets added to the right operational places. So they probably should be treated as different.

comment:33 Changed 3 years ago by ezyang

Here is some more food for thought: if I write:

seq :: !a -> b -> b
seq _ y = y

Should this be equivalent to:

seq :: a -> b -> b
seq !_ y = y

I personally think yes.

comment:34 Changed 3 years ago by simonpj

At least in the monomorphic case for data constructors I think I can see how to make this work. Let's review ordinary UNPACK pragmas as they are today, and let's just consider single-constructor types for now. Then when I have, say,

data T = C {-# UNPACK #-} !(Int,Int) 

the UNPACK affects (a) construction of C and (b) pattern matching against C. I'll express it like this:

data T = Cw Int Int

makeC :: (Int,Int) -> T
makeC x = case x of (a,b) -> Cw a b

matchC :: T -> ((Int,Int) -> r) -> r
matchC x k = case x of { Cw a b -> k (a,b) }

Here

  • The data type T is ultimately represented by a data type with a data constructor Cw, with two fields.
  • A use of C to construct a value is tranformed to a call of makeC. The function makeC constructs a value of type T; its signature is what the programmer expects for his constructor C.
  • A use of C in pattern matching is transformed to a call of matchC. The function matchC matches against a value of type T, and applies its continuatoin k to an (Int,Int) pair, just as the programmer expects.

So the makeC/matchC pair completely express what it means to build and pattern match against constructor C. The constructor C is not really a constructor at all; it's a view on the underlying data constructor Cw.

Now for kolmodin's type

newtype Get a = C { runCont :: forall r.
                               S ->  
                               Failure   r ->
                               Success a r ->
                               Result    r } 

I think you would really like to say

newtype Get a = C { runCont :: forall r.
                               {-# UNPACK #-} !S ->  
                               Failure   r ->
                               Success a r ->
                               Result    r } 

And what might that mean? Presumably it means the function stored in the data type has its arguments unpacked. So the translation is

newtype Get a = Cw { runCont :: forall r.
                               B.ByteString ->
                               B.ByteString ->
                               Bool ->
                               Failure   r ->
                               Success a r ->
                               Result    r }
makeC :: (forall r. S -> Failure r -> Success a r -> Result r)
      -> Get a
makeC f = Get (\i ni ra -> f (S i ni ra))

matchC :: Get a -> ((forall r. S -> Failure r -> Success a r -> Result r) -> r') -> r'
matchC (C f) = k (\(S i ni ra) -> f i ni ra)

Or something like that. The point is that the UNPACK and ! pragmas control the generation of makeC and matchC, in a way that we can pin down quite precisely.

Things to notice

  • The UNPACK stuff might be recursive. For example, the ByteString arguments of S are themselves given UNPACK pragmas, so probably the real of C is more like this:
    newtype Get a = Cw { runCont :: forall r.
                                   Addr# -> Int# -> Int# ->
                                   Addr# -> Int# -> Int# ->
                                   Bool ->
                                   Failure   r ->
                                   Success a r ->
                                   Result    r }
    
    Now, whether you really want this I'm not sure, but that is the natural thing.
  • I have no idea what to do in the polymorphic case, when you can't "see" the type of the argument to be UNPACKed. But that is just as now.
  • As always with UNPACK, there's a danger that we'll do unnecessary unboxing and reboxing. For example, with the data type T above, every type we match against C we'll build a pair (a,b) (just look at matchC). If the body of the pattern match uses the pair entire, rather than deconstructing it, there's a danger that we'll build the pair lots of times (once for each match) rather than once (when we build the C value). And its no different for this function-UNPACK stuff.
  • All of this is for constructor arguments. I'm vastly less sure of the merits of this kind of stuff for ordinary functions.

comment:35 Changed 3 years ago by batterseapower

Simon: if the UNPACKed thing was not the first argument e.g.:

newtype Get a = C { runCont :: forall r.
                               Failure   r ->
                               {-# UNPACK #-} !S ->  
                               Success a r ->
                               Result    r } 

We might naively expand to this:

makeC :: (forall r. Failure r -> S -> Success a r -> Result r)
      -> Get a
makeC f = Get (\fail i ni ra -> f fail (S i ni ra))

Could we not avoid a loss of sharing by just inserting a lot of intermediate bindings:

newtype Get a = Cw { runCont :: forall r.
                               Failure   r ->
                               B.ByteString ->
                               B.ByteString ->
                               Bool ->
                               Success a r ->
                               Result    r }
makeC :: (forall r. Failure r -> S -> Success a r -> Result r)
      -> Get a
makeC f = Get (\i ni ra -> f (S i ni ra))

makeC f = Get (\fail -> let ffail = f fail in \i ni ra -> ffail (S i ni ra))

The simplifier will inline the wrapper and clean up any intermediate bindings that can be proved to share no expensive work. This seems to solve the work sharing issue?

comment:36 Changed 3 years ago by igloo

  • Milestone changed from 7.2.1 to 7.4.1

comment:37 Changed 2 years ago by igloo

  • Milestone changed from 7.4.1 to 7.6.1
  • Priority changed from low to normal

comment:38 Changed 22 months ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:39 Changed 18 months ago by liyang

  • Cc hackage.haskell.org@… added
Note: See TracTickets for help on using tickets.