GHC: Ticket Query
http://ghc.haskell.org/trac/ghc/query?status=!closed&reporter=dolio&order=priority
The Glasgow Haskell Compileren-USGHChttp://ghc.haskell.org/trac/ghc/chrome/site/ghc_logo.png
http://ghc.haskell.org/trac/ghc/query?status=!closed&reporter=dolio&order=priority
Trac 1.0.1
http://ghc.haskell.org/trac/ghc/ticket/8556
http://ghc.haskell.org/trac/ghc/ticket/8556#8556: Invalid constructor names are accepted in data declarationsFri, 22 Nov 2013 21:08:37 GMTdolio<p>
Earlier today, someone was asking on #haskell why the constructor name <tt>(^)</tt> wouldn't work in GADT definitions. My response was that <tt>(^)</tt> isn't a constructor name, but much to my surprise, GHC accepts such names in a regular data declaration:
</p>
<pre class="wiki">data Foo = F | (^^^^) Int Int
</pre><p>
This creates a <tt>Foo</tt> type and value constructor <tt>F</tt>, but no value constructor <tt>(^^^^)</tt>. However, in 7.6.3, if DataKinds are enabled, both constructors appear at the type level.
</p>
<p>
In HEAD, the same definition is accepted, with only <tt>F</tt> existing at the value level, as before. But at the type level, both <tt>F</tt> and <tt>(^^^^)</tt> just generate errors that <tt>Foo</tt> is not a promotable type. At that point, I think there's no question that the declaration should just be ruled out.
</p>
Resultshttp://ghc.haskell.org/trac/ghc/ticket/8556#changelog
http://ghc.haskell.org/trac/ghc/ticket/9320
http://ghc.haskell.org/trac/ghc/ticket/9320#9320: Inlining regression/strangeness in 7.8Wed, 16 Jul 2014 07:17:25 GMTdolio<p>
A couple days ago, it was reported to me that vector-algorithms had a significant performance regression (~20x) on GHC 7.8.2. The problem stems from a lack of inlining and specialization of some of the functions that were previously handled in 7.6 and earlier. The following is a reduced test case (the vector and primitive packages are required):
</p>
<pre class="wiki">module A (test) where
import Control.Monad.ST
import Control.Monad
import Control.Monad.Primitive
import Data.Vector.Generic.Mutable as U
test :: (PrimMonad m, MVector v a, Num a) => Int -> v (PrimState m) a -> m a
-- test :: (MVector v a, Num a) => Int -> v s a -> ST s a
test 0 v = liftM (+1) $ unsafeRead v 0
test n v = do
long1 v
test (n-1) v
{-# INLINABLE test #-}
long1, long2, long3, long4 :: (PrimMonad m, MVector v a) => v (PrimState m) a -> m ()
long1 v = long2 v >> long2 v >> long2 v >> long2 v
long2 v = long3 v >> long3 v >> long3 v >> long3 v
long3 v = long4 v >> long4 v >> long4 v >> long4 v
long4 v = unsafeRead v 0 >>= unsafeWrite v 0
{-# INLINE long1 #-}
{-# INLINE long2 #-}
{-# INLINE long3 #-}
{-# INLINE long4 #-}
</pre><pre class="wiki">module Main (main) where
import Control.Monad.ST
import Data.Vector.Unboxed.Mutable as U hiding (read)
import System.Environment
import Unsafe.Coerce
import GHC.Prim
import A
test0 :: Int -> MVector s Int -> ST s Int
test0 n v = test n v
{-# NOINLINE test0 #-}
test1' :: Int -> MVector Any Int -> ST Any Int
test1' n v = test n v
{-# NOINLINE test1 #-}
test1 :: Int -> MVector a Int -> ST a Int
test1 = unsafeCoerce test1'
main = getArgs >>= \(n:b:_) ->
print $ runST $ do
v <- new 1
write v 0 0
(if read b then test0 else test1) (read n) v
</pre><p>
Module <tt>A</tt> exports a single function, <tt>test</tt>. This function is engineered to be quite large, by inlining several other functions into it, and it is itself marked INLINABLE. Then the <tt>Main</tt> module uses this function in two different ways:
</p>
<ul><li><tt>test0</tt> uses <tt>test</tt> at a type that is compatible with <tt>runST</tt>
</li><li><tt>test1'</tt> uses <tt>test</tt> at a completely monomorphic type, which is then coerced to a <tt>runST</tt> compatible type in <tt>test1</tt>
</li></ul><p>
On 7.6 I believe (though have not checked) that there will be little or no performance difference between <tt>test0</tt> and <tt>test1</tt>. However, on 7.8.2 (and, I have been assured, 7.8.3), there is a massive speed pentalty for <tt>test0</tt>; about 70x on my machine. This seems to be due to no inining or specialization for its use of <tt>test</tt>, which can be seen from <tt>-ddump-simpl</tt>.
</p>
<p>
However, if one changes the type of <tt>test</tt> in <tt>A</tt> to be specific to <tt>ST s</tt> rather than using <tt>PrimMonad</tt>, there is no performance difference, even on 7.8.2. So, the choice to inline and specialize seems to hinge on the instantiation of all the class constraints to monomorphic types containing no variables, rather than just types that resolve all overloading. I myself did not notice this problem, because my benchmark suite uses <tt>IO</tt>, which is a concrete instantiation of the type, and doesn't exhibit this problem.
</p>
<p>
I have temporarily 'fixed' vector-algorithms by moving back to <tt>INLINE</tt> pragmas, but <tt>INLINABLE</tt> is actually preferable in that case, because it generates faster code than <tt>INLINE</tt> when the optimizations actually fire. My test case here does not illustrate that well, though.
</p>
<p>
Is it safe to assume that this was not an intentional change? It's a rather weird rule (to me) if it was.
</p>
Resultshttp://ghc.haskell.org/trac/ghc/ticket/9320#changelog
http://ghc.haskell.org/trac/ghc/ticket/9509
http://ghc.haskell.org/trac/ghc/ticket/9509#9509: No automatic specialization of inlinable imports in 7.8Sat, 23 Aug 2014 05:50:00 GMTdolio<p>
According to the GHC manual, any uses of an imported definition that is overloaded and INLINABLE will automatically cause a SPECIALIZE to be generated for those uses, if appropriate. However, this no longer appears to be the case in 7.8(.3). Here is a simple test case:
</p>
<div class="code"><pre><span class="kr">module</span> <span class="nn">A</span> <span class="p">(</span><span class="nf">foo</span><span class="p">)</span> <span class="kr">where</span>
<span class="kr">import</span> <span class="nn">Data.IORef</span>
<span class="nf">foo</span> <span class="ow">::</span> <span class="kt">Ord</span> a <span class="ow">=></span> a <span class="ow">-></span> <span class="kt">IO</span> a
<span class="nf">foo</span> x <span class="ow">=</span> newIORef x <span class="o">>>=</span> readIORef <span class="o">>>=</span> <span class="nf">\</span>y <span class="ow">-></span> <span class="kr">case</span> compare x y <span class="kr">of</span> <span class="kt">LT</span> <span class="ow">-></span> return x <span class="p">;</span> <span class="kr">_</span> <span class="ow">-></span> return y
<span class="cm">{-# INLINABLE foo #-}</span>
</pre></div><div class="code"><pre><span class="kr">module</span> <span class="nn">Main</span> <span class="p">(</span><span class="nf">main</span><span class="p">)</span> <span class="kr">where</span>
<span class="kr">import</span> <span class="nn">A</span>
<span class="nf">main</span> <span class="ow">=</span> foo <span class="p">(</span><span class="mi">5</span> <span class="ow">::</span> <span class="kt">Int</span><span class="p">)</span> <span class="o">>>=</span> print
</pre></div><p>
<tt>foo</tt> is constructed to be long enough that GHC 7.8.3 will elect to not inline it.
</p>
<p>
When compiling with 7.6.3, the core contains the following:
</p>
<pre class="wiki">Main.$sfoo1
:: GHC.Types.Int
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Types.Int #)
[GblId,
Arity=2,
Caf=NoCafRefs,
Str=DmdType LL,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
ConLike=True, WorkFree=True, Expandable=True,
Guidance=IF_ARGS [20 0] 55 30}]
Main.$sfoo1 =
\ (x_XkE :: GHC.Types.Int)
(eta_B1 :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.Prim.newMutVar#
@ GHC.Types.Int @ GHC.Prim.RealWorld x_XkE eta_B1
of _ { (# ipv_amT, ipv1_amU #) ->
case GHC.Prim.readMutVar#
@ GHC.Prim.RealWorld @ GHC.Types.Int ipv1_amU ipv_amT
of ds1_amJ { (# ipv2_Xn8, ipv3_Xna #) ->
case x_XkE of wild_axu { GHC.Types.I# x#_axw ->
case ipv3_Xna of _ { GHC.Types.I# y#_axA ->
case GHC.Prim.<# x#_axw y#_axA of _ {
GHC.Types.False -> ds1_amJ;
GHC.Types.True -> (# ipv2_Xn8, wild_axu #)
}
}
}
}
}
Main.$sfoo :: GHC.Types.Int -> GHC.Types.IO GHC.Types.Int
[GblId,
Arity=2,
Caf=NoCafRefs,
Str=DmdType LL,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
ConLike=True, WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(unsat_ok=True,boring_ok=True)}]
Main.$sfoo =
Main.$sfoo1
`cast` (<GHC.Types.Int>
-> Sym <(GHC.Types.NTCo:IO <GHC.Types.Int>)>
:: (GHC.Types.Int
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Types.Int #))
~#
(GHC.Types.Int -> GHC.Types.IO GHC.Types.Int))
...
------ Local rules for imported ids --------
"SPEC A.foo [GHC.Types.Int]" [ALWAYS]
forall ($dOrd_sxj :: GHC.Classes.Ord GHC.Types.Int).
A.foo @ GHC.Types.Int $dOrd_sxj
= Main.$sfoo
</pre><p>
However, in 7.8.3 and newer (I'm actually using 7.9.20140725 for this particular output, but 7.8.3 is similar), we see the following:
</p>
<pre class="wiki">Main.main1
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
[GblId,
Arity=1,
Str=DmdType <L,U>,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
ConLike=True, WorkFree=True, Expandable=True,
Guidance=IF_ARGS [0] 100 0}]
Main.main1 =
\ (s_aIb :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case ((A.foo @ GHC.Types.Int GHC.Classes.$fOrdInt Main.main2)
`cast` (GHC.Types.NTCo:IO[0] <GHC.Types.Int>_R
:: GHC.Types.IO GHC.Types.Int
~R# (GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Types.Int #))))
s_aIb
of _ [Occ=Dead] { (# ipv_aIe, ipv1_aIf #) ->
GHC.IO.Handle.Text.hPutStr2
GHC.IO.Handle.FD.stdout
(GHC.Show.$fShowInt_$cshow ipv1_aIf)
GHC.Types.True
ipv_aIe
}
</pre><p>
There is no local rules section in 7.8.3. Putting a manual SPECIALIZE pragma in the main module generates comparable code to 7.6.3, so it does not appear to be a failure at that level. Rather, GHC seems to just not be generating the SPECIALIZE equivalent.
</p>
<p>
Presumably this is a regression, but even if not, the manual should be revised.
</p>
Resultshttp://ghc.haskell.org/trac/ghc/ticket/9509#changelog
http://ghc.haskell.org/trac/ghc/ticket/5041
http://ghc.haskell.org/trac/ghc/ticket/5041#5041: Incorrect Read deriving for MagicHash constructorsWed, 23 Mar 2011 01:37:34 GMTdolio<p>
Greetings,
</p>
<p>
A fellow in #haskell on freenode just discovered the following bug:
</p>
<pre class="wiki">{-# LANGUAGE MagicHash #-}
data Note = A# deriving (Show, Read)
</pre><p>
show works fine, but read doesn't:
</p>
<pre class="wiki">*Main> show A#
"A#"
*Main> read (show A#)
*** Exception: Prelude.read: no parse
</pre><p>
Probably not a surprising omission, and hopefully not difficult to fix.
</p>
Resultshttp://ghc.haskell.org/trac/ghc/ticket/5041#changelog
http://ghc.haskell.org/trac/ghc/ticket/8313
http://ghc.haskell.org/trac/ghc/ticket/8313#8313: Poor performance of higher-order functions with unboxingTue, 17 Sep 2013 01:39:39 GMTdolio<p>
I was testing out some code to see how GHC handled some unboxed higher-order functions, and was suprised by the results I found. Here is some sample code:
</p>
<pre class="wiki">{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MagicHash #-}
import GHC.Exts
import System.Environment
rel# :: Int# -> Int# -> Int# -> Bool
rel# i# j# k# = (i# +# j# +# k#) ># 100000000#
rel :: Int -> Int -> Int -> Bool
rel (I# i#) (I# j#) (I# k#) = rel# i# j# k#
manual :: (Int# -> Int# -> Int# -> Bool) -> (Int, Int, Int)
manual r# = go 0# 0# 0#
where
go i# j# k# | r# i# j# k# = (I# i#, I# j#, I# k#)
| otherwise = go j# k# (i# +# 1#)
{-# NOINLINE manual #-}
auto :: (Int -> Int -> Int -> Bool) -> (Int, Int, Int)
auto r = go 0 0 0
where
go !i !j !k | r i j k = (i, j, k)
| otherwise = go j k (i+1)
{-# NOINLINE auto #-}
main = getArgs >>= \case
"manual" : _ -> print $ manual rel# -- This case is significantly slower.
"auto" : _ -> print $ auto rel -- Why?
</pre><p>
A loop that has to box its loop parameters to call a predicate turns out to be significantly faster than one that uses a predicate that takes unboxed values directly.
</p>
<p>
The answer turns out to be (I believe) in ghc/utils/genapply/GenApply.hs. applyTypes has an entry [P,P,P], but only [N]. This means that the manual loop has to use a slower calling convention than the boxed loop.
</p>
<p>
I'm not sure whether this should be 'fixed,' as my encounter with it was experimental in nature, and I don't have a real use case. The comment on applyTypes says its cases cover 99% of uses, and mine is not a real use. This ticket may serve as documentation at least, though.
</p>
Resultshttp://ghc.haskell.org/trac/ghc/ticket/8313#changelog
http://ghc.haskell.org/trac/ghc/ticket/2374
http://ghc.haskell.org/trac/ghc/ticket/2374#2374: MutableByteArray# is slower than Addr#Wed, 18 Jun 2008 06:55:58 GMTdolio<p>
I've filed this against the native code generator, even thought it still occurs in my experience with the via-C path. It is significantly more pronounced with the NCG path, and so, I imagine, is likely to be fixed by working on that. However, feel free to reclassify if appropriate.
</p>
<p>
When attempting to rewrite the fannkuch benchmark of the computer language benchmarks game to use more realistic code, I discovered that I am currently unable to do so. At least one piece of the puzzle is that the current entry uses Ptr and Addr# for arrays, while the current mutable array libraries (ST arrays and uvector) use MutableByteArray# underneath, and it happens that reads/writes on the latter are slower than the former.
</p>
<p>
After some discussion on the glasgow-haskell-users list, I believe I've worked the kinks out of my (even more micro) benchmark programs, and will attach them here. They accept two numbers, k and n, and reverse an array of size-n k times. The larger fannkuch benchmark works on small arrays (reversing, shifting and copying), so using small arrays with a very large number of iterations is more representative of it. However, my benchmark doesn't show terribly much variation when one varies the parameters by corresponding factors (although there is a drop in performance when the size of the array nears the size of the CPU cache, naturally).
</p>
<p>
I've test machine is an AMD Athlon 64 3000+ running in 64-bit mode with 2 GB of ram. My operating system is Linux, kernel 2.6.24.
</p>
<p>
I've mainly done runs where n * k = 2.5 billion, as that takes approximately as much time as the MutableByteArray# version of the fannkuch benchmark, around 11 seconds for 250 million iterations with an array size of 10. The Addr# version of the benchmark runs with the same parameters in around 7 seconds (which is faster than the actual fannkuch benchmark, but I digress).
</p>
<p>
I'll attach my two micro benchmarks. ByteArr.hs is the MutableByteArray# reversal benchmark. Ptr.hs is the Addr#/Ptr version. I may also attach full fannkuch implementations as another data point (the original is already Addr#/Ptr, and I've written a MutableByteArray# implementation; I just need to do some proofreading of the latter to make sure it doesn't have any discrepancies).
</p>
<p>
I've also not done much investigation as to the source of the problem, besides looking at the core of various higher level programs to track things down this far. I'll attempt to delve further into the issue, investigating the C-- and assembly if my abilities allow, and update here as I find things.
</p>
Resultshttp://ghc.haskell.org/trac/ghc/ticket/2374#changelog
http://ghc.haskell.org/trac/ghc/ticket/2387
http://ghc.haskell.org/trac/ghc/ticket/2387#2387: Optimizer misses unboxing opportunityThu, 19 Jun 2008 22:08:30 GMTdolio<p>
In my studying of the fannkuch benchmark, I've discovered (I think) another missed optimization. A scaled down illustration goes as follows:
</p>
<pre class="wiki">{-# LANGUAGE TypeOperators, BangPatterns #-}
module Main (main) where
import Control.Monad.ST
import System.Environment
data (:*:) a b = !a :*: !b
whileLoop :: Int -> ST s Int
whileLoop = go 0
where
go !n k
| k == 0 = return n
| otherwise = go (n+1) (k-1)
{-# INLINE whileLoop #-}
iter :: Int -> Int -> ST s (Bool :*: Int)
iter k n = do
k' <- whileLoop 40 >>= \k' -> return $! max k k'
b <- return (n == 0)
return $! b :*: k'
{-# INLINE iter #-}
mainLoop :: Int -> Int -> ST s Int
mainLoop k n = do
done :*: k' <- iter k n
if done
then return k'
else mainLoop k' (n - 1)
main = print =<< stToIO . mainLoop 0 . read . head =<< getArgs
</pre><p>
If we look at the core for whileLoop's worker, we see:
</p>
<pre class="wiki">$wpoly_go_r1aE :: forall s_aem.
Int# -> Int# -> STRep s_aem Int
$wpoly_go_r1aE =
\ (@ s_aem)
(ww_s18Y :: Int#)
(ww1_s192 :: Int#)
(eta_s19w :: State# s_aem) ->
case ww1_s192 of wild_XF {
__DEFAULT ->
$wpoly_go_r1aE @ s_aem (+# ww_s18Y 1) (-# wild_XF 1) eta_s19w;
0 -> (# eta_s19w, I# ww_s18Y #)
}
</pre><p>
Note, the return type is a boxed Int. The function is only used once, like so:
</p>
<pre class="wiki"> ...
case $wpoly_go_r1aE @ s_aem 0 40 w_s19f of wild_aFw { (# new_s_aFB, r_aFC #) ->
case r_aFC of wild1_aG9 { I# y_aGb ->
case <=# ww_s199 y_aGb of wild2_aGd {
False ->
...
</pre><p>
In other words, go boxes its results at the end of the loop, and the function that uses it immediately looks inside the box for the value. In this particular micro-benchmark, the boxed value (wild1_aG9 above) is actually used in the case where mainLoop returns the boxed value. However, in the larger benchmark I pulled this from, that is not the case in several areas (only the unboxed value is used, but it still goes through a box). Either way, the boxed value is only used at the end of the loop (and not every time there), and on every other iteration, this results in superfluous allocation.
</p>
<p>
I'll attach a manually unboxed version I wrote (it also has iter and max manually inlined to mainLoop, since that makes it easier to write; the core for the above shows that they are getting inlined properly, so I assume that isn't the issue). Using +RTS -sstderr shows that (running 100 million iterations here), the manually unboxed version allocates 50 kilobytes on the heap, and runs in around 15 seconds, whereas the version above that doesn't get unboxed does 1.6 gigabytes of heap allocation, and takes 18 seconds (in the larger benchmark, such extra boxing would happen perhaps 40 million times over the course of the program).
</p>
<p>
Thanks for your help.
</p>
Resultshttp://ghc.haskell.org/trac/ghc/ticket/2387#changelog