Opened 3 years ago

Last modified 4 months ago

#5539 new bug

GHC panic - Simplifier ticks exhausted

Reported by: hvr Owned by: simonpj
Priority: high Milestone: 7.6.2
Component: Compiler Version: 7.3
Keywords: Cc: johan.tibell@…, chowells79@…, bgamari@…, pho@…, rl, bos@…, alexey.skladnoy@…, choener@…, aruiz@…, idhameed@…, hackage.haskell.org@…, ekmett@…, gregmainland@…
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Compile-time crash Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

When trying to install blaze-builder with a freshly installed GHC 7.3.20111007 linux/amd64 build, the following panic occurs:

$ cabal install  blaze-builder
Resolving dependencies...
Configuring blaze-builder-0.3.0.1...
Building blaze-builder-0.3.0.1...
Preprocessing library blaze-builder-0.3.0.1...
...
[10 of 13] Compiling Blaze.ByteString.Builder.HTTP ( Blaze/ByteString/Builder/HTTP.hs, dist/build/Blaze/ByteString/Builder/HTTP.o )
[11 of 13] Compiling Blaze.ByteString.Builder.Int ( Blaze/ByteString/Builder/Int.hs, dist/build/Blaze/ByteString/Builder/Int.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.3.20111007 for x86_64-unknown-linux):
	Simplifier ticks exhausted
    When trying UnfoldingDone a_s9oE{v} [lid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 5123

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

Attachments (5)

tick_panic_fpr.2 (2.3 KB) - added by daniel.is.fischer 2 years ago.
no_tick_panic_fpr (2.3 KB) - added by daniel.is.fischer 2 years ago.
tc095_comp_stderr.tar.gz (125.8 KB) - added by daniel.is.fischer 2 years ago.
first part of compilation output
tc095_comp_stderr_2.tar.gz (131.9 KB) - added by daniel.is.fischer 2 years ago.
second part of compilation output
ghc-7.3.20111030-vector-algorithms-0.5.3-dumpl-simpl-stats.log.gz (35.3 KB) - added by hvr 2 years ago.
output of -ddumpl-simpl-stats for compilation of vector-algorithms-0.5.3

Download all attachments as: .zip

Change History (59)

comment:1 Changed 3 years ago by simonpj

Thanks. Can you do as the error message asks, and see what limit does work?

Simon

comment:2 Changed 3 years ago by hvr

sure, cabal install --ghc-option=-fsimpl-tick-factor=130 blaze-builder was the smallest simpl-tick-factor value that worked for me...

comment:3 Changed 3 years ago by simonpj@…

commit 2a9f42095cb2d9aace991b11bf052d12ca654ef8

Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Wed Oct 12 11:50:55 2011 +0100

    Increase max simplifier tick count magic number (Trac #5539)

 compiler/simplCore/SimplMonad.lhs |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

comment:4 Changed 3 years ago by simonpj

  • Resolution set to fixed
  • Status changed from new to closed

I've bumped up the limit; it is supposed to be generous. Thanks

Simon

comment:5 Changed 2 years ago by daniel.is.fischer

  • Resolution fixed deleted
  • Status changed from closed to new

ghc-7.3.20110121 fails tc095(optasm):

=====> tc095(optasm) 159 of 3047 [0, 1, 0]
cd ./typecheck/should_compile && '/home/dafis/GHC/bghc/bindisttest/install dir/bin/ghc' -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-conf -rtsopts -fno-ghci-history  -c tc095.hs -O -fasm  -fno-warn-incomplete-patterns >tc095.comp.stderr 2>&1
Compile failed (status 256) errors were:
ghc: panic! (the 'impossible' happened)
  (GHC version 7.3.20111021 for x86_64-unknown-linux):
	Simplifier ticks exhausted
    When trying UnfoldingDone a_sYH{v} [lid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 90320

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug


*** unexpected failure for tc095(optasm)

No panic with ghc-7.3.20111019 (fingerprints attached).

Changed 2 years ago by daniel.is.fischer

Changed 2 years ago by daniel.is.fischer

comment:6 Changed 2 years ago by simonpj

  • Status changed from new to infoneeded

Well that is most odd. I cannot reproduce the failure with tc095. If it happens again, can you do -dverbose-core2core -ddump-simpl-stats -ddump-occur-anal?

Simon

comment:7 Changed 2 years ago by daniel.is.fischer

I cannot reproduce it with recent HEADs, so whatever it was seems fixed. I rebuilt 20111021 (didn't quite build with the fingerprint, I needed to use latest versions of ghc-prim and haddock, due to git/fingerprint not being able to restore the exact state of the repo when branches have been merged since) and that panics again. Produces a lot of output with the flags.

Changed 2 years ago by daniel.is.fischer

first part of compilation output

Changed 2 years ago by daniel.is.fischer

second part of compilation output

comment:8 Changed 2 years ago by simonpj

  • Resolution set to fixed
  • Status changed from infoneeded to closed

I think if it seems ok now I'll let sleeping dogs lie :-)

comment:9 Changed 2 years ago by hvr

fyi, I just tried the latest binary ghc build for linux/amd64, and got a Simplifier ticks exhausted error when cabal installing vector-algorithms-0.5.3:

$ cabal-dev install vector-algorithm

Building primitive-0.4.0.1...
...
Building vector-0.9...
...
Configuring vector-algorithms-0.5.3...
Building vector-algorithms-0.5.3...
Preprocessing library vector-algorithms-0.5.3...
[1 of 9] Compiling Data.Vector.Algorithms.Common ( Data/Vector/Algorithms/Common.hs, dist/build/Data/Vector/Algorithms/Common.o )
[2 of 9] Compiling Data.Vector.Algorithms.Radix ( Data/Vector/Algorithms/Radix.hs, dist/build/Data/Vector/Algorithms/Radix.o )
[3 of 9] Compiling Data.Vector.Algorithms.Search ( Data/Vector/Algorithms/Search.hs, dist/build/Data/Vector/Algorithms/Search.o )
[4 of 9] Compiling Data.Vector.Algorithms.Optimal ( Data/Vector/Algorithms/Optimal.hs, dist/build/Data/Vector/Algorithms/Optimal.o )
[5 of 9] Compiling Data.Vector.Algorithms.Insertion ( Data/Vector/Algorithms/Insertion.hs, dist/build/Data/Vector/Algorithms/Insertion.o )
[6 of 9] Compiling Data.Vector.Algorithms.AmericanFlag ( Data/Vector/Algorithms/AmericanFlag.hs, dist/build/Data/Vector/Algorithms/AmericanFlag.o )
[7 of 9] Compiling Data.Vector.Algorithms.Heap ( Data/Vector/Algorithms/Heap.hs, dist/build/Data/Vector/Algorithms/Heap.o )
[8 of 9] Compiling Data.Vector.Algorithms.Intro ( Data/Vector/Algorithms/Intro.hs, dist/build/Data/Vector/Algorithms/Intro.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.3.20111030 for x86_64-unknown-linux):
	Simplifier ticks exhausted
    When trying UnfoldingDone ghc-prim:GHC.Classes.not{v r1d} [gid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 42922

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

cabal: Error: some packages failed to install:
vector-algorithms-0.5.3 failed during the building phase. The exception was:
ExitFailure 1

-fsimpl-tick-factor=167 was the minimum value required to let the compilation succeed

Changed 2 years ago by hvr

output of -ddumpl-simpl-stats for compilation of vector-algorithms-0.5.3

comment:10 Changed 2 years ago by hvr

  • Resolution fixed deleted
  • Status changed from closed to new

comment:11 Changed 2 years ago by tibbe

  • Cc johan.tibell@… added

comment:12 Changed 2 years ago by carlhowells

The statistics library is an additional case where this error pops up:

This is with -fsimpl-tick-factor=313

Preprocessing library statistics-0.9.0.0...
Building statistics-0.9.0.0...
[ 2 of 22] Compiling Statistics.Function ( Statistics/Function.hs, dist/build/Statistics/Function.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.3.20111101 for x86_64-apple-darwin):
	Simplifier ticks exhausted
    When trying UnfoldingDone base:GHC.Base.plusInt{v rU} [gid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 40738

At -fsimpl-tick-factor=314, I get this:

Preprocessing library statistics-0.9.0.0...
Building statistics-0.9.0.0...
[ 2 of 22] Compiling Statistics.Function ( Statistics/Function.hs, dist/build/Statistics/Function.o )
WARNING: file compiler/simplCore/SimplCore.lhs line 578
Simplifier baling out after 4 iterations [494, 253, 20, 2] Size = 39959
Interesting!  A join var that isn't let-no-escaped
    [$j{v sbHe} [lid]]

In the same library, another module:

At -fsimpl-tick-factor=291, I get this:

Preprocessing library statistics-0.9.0.0...
Building statistics-0.9.0.0...
[ 8 of 22] Compiling Statistics.Quantile ( Statistics/Quantile.hs, dist/build/Statistics/Quantile.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.3.20111101 for x86_64-apple-darwin):
	Simplifier ticks exhausted
    When trying UnfoldingDone ghc-prim:GHC.Classes.||{v rx} [gid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 66349

At -fsimpl-tick-factor=292, I get this:

Building statistics-0.9.0.0...
[ 8 of 22] Compiling Statistics.Quantile ( Statistics/Quantile.hs, dist/build/Statistics/Quantile.o )
WARNING: file compiler/simplCore/SimplCore.lhs line 578
Simplifier baling out after 4 iterations [66581, 5389, 242, 7] Size = 101356
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on j_s8zy
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on j_s8yO

comment:13 Changed 2 years ago by carlhowells

  • Cc chowells79@… added

Posting again because of how crazy this first case is...

In criterion:

At -fsimpl-tick-factor=614, I get this:

Building criterion-0.5.1.0...
[ 9 of 12] Compiling Criterion.Analysis ( Criterion/Analysis.hs, dist/build/Criterion/Analysis.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.3.20111101 for x86_64-apple-darwin):
	Simplifier ticks exhausted
    When trying UnfoldingDone base:GHC.Base.$fMonadIO_$c>>={v rvx} [gid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 203602

At -fsimpl-tick-factor=615, I get this:

Building criterion-0.5.1.0...
[ 9 of 12] Compiling Criterion.Analysis ( Criterion/Analysis.hs, dist/build/Criterion/Analysis.o )
WARNING: file compiler/simplCore/SimplCore.lhs line 578
Simplifier baling out after 4 iterations [203871, 45952, 981, 5] Size = 92928
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on j_a5Zj
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on n_icU3
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on n_XcV0
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on j_a5Zj
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on n_icU3
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on n_XcV2
WARNING: file compiler/simplCore/SimplCore.lhs line 578
Simplifier baling out after 4 iterations [2261, 32, 16, 10] Size = 144862
WARNING: file compiler/stgSyn/CoreToStg.lhs line 222
lvl7_rjav True False
Interesting!  A join var that isn't let-no-escaped
    [$j{v spjP} [lid]]
Interesting!  A join var that isn't let-no-escaped
    [$j{v sp6S} [lid]]
Interesting!  A join var that isn't let-no-escaped
    [$j{v sjFG} [lid]]

And, a couple modules later:

At -fsimpl-tick-factor=122, I get this:

Preprocessing library criterion-0.5.1.0...
Building criterion-0.5.1.0...
[11 of 12] Compiling Criterion        ( Criterion.hs, dist/build/Criterion.o )
WARNING: file compiler/simplCore/SimplCore.lhs line 578
Simplifier baling out after 4 iterations [2752, 252, 55, 60] Size = 2470
WARNING: file compiler/simplCore/SimplCore.lhs line 578
Simplifier baling out after 4 iterations [1166, 95, 21, 5] Size = 2529
ghc: panic! (the 'impossible' happened)
  (GHC version 7.3.20111101 for x86_64-apple-darwin):
	Simplifier ticks exhausted
    When trying RuleFired Class op basicLength
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 123423

At -fsimpl-tick-factor=123, I get this:

Building criterion-0.5.1.0...
[11 of 12] Compiling Criterion        ( Criterion.hs, dist/build/Criterion.o )
WARNING: file compiler/simplCore/SimplCore.lhs line 578
Simplifier baling out after 4 iterations [2752, 252, 55, 60] Size = 2470
WARNING: file compiler/simplCore/SimplCore.lhs line 578
Simplifier baling out after 4 iterations [1166, 95, 21, 5] Size = 2529
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on n_i3Yc
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on j_i5Zx
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on n_ikeB
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on n_Xkgj
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on j_ibHU
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on n_ikeB
WARNING: file compiler/simplCore/SetLevels.lhs line 1022
absVarsOf: discarding info on n_Xkgl
Interesting!  A join var that isn't let-no-escaped
    [$j{v sngJ} [lid]]

comment:14 Changed 2 years ago by bgamari

  • Cc bgamari@… added

I also see this with vector-algorithms on today's master,

[8 of 9] Compiling Data.Vector.Algorithms.Intro ( Data/Vector/Algorithms/Intro.hs, dist/build/Data/Vector/Algorithms/Intro.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.3.20111107 for x86_64-unknown-linux):
	Simplifier ticks exhausted
    When trying UnfoldingDone ghc-prim:GHC.Classes.||{v r1c} [gid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 46921

Things work with -fsimpl-tick-factor=168, however.

comment:15 Changed 2 years ago by simonpj

Thanks. I'm puzzled about why there are so many ticks needed, so I'm not going to increase the baseline factor yet. Having concrete examples is really helpful, so keep adding them if you come across new ones.

Simon

comment:16 Changed 2 years ago by igloo

  • Milestone set to 7.4.1
  • Owner set to simonpj

comment:17 Changed 2 years ago by hvr

Jfyi, this issue still persists with ghc-7.3.20111130

comment:18 Changed 2 years ago by tibbe

Could we please bump the priority of this ticket? Large swats of packages don't build due to vector not building.

comment:19 Changed 2 years ago by tibbe

See also #5703

comment:20 Changed 2 years ago by simonmar

  • Difficulty set to Unknown
  • Priority changed from normal to high

comment:21 Changed 2 years ago by hvr

Here's an example which needs a rather high -fsimpl-tick-factor to compile:

$ git clone https://github.com/bos/pronk && cd pronk/

$ cabal build --ghc-option=-fsimpl-tick-factor=1500
Building pronk-0.1.0...
Preprocessing library pronk-0.1.0...
[1 of 6] Compiling Paths_pronk      ( dist/build/autogen/Paths_pronk.hs, dist/build/Paths_pronk.o )
[2 of 6] Compiling Network.HTTP.LoadTest.Environment ( lib/Network/HTTP/LoadTest/Environment.hs, dist/build/Network/HTTP/LoadTest/Environment.o )
[3 of 6] Compiling Network.HTTP.LoadTest.Types ( lib/Network/HTTP/LoadTest/Types.hs, dist/build/Network/HTTP/LoadTest/Types.o )
[4 of 6] Compiling Network.HTTP.LoadTest.Analysis ( lib/Network/HTTP/LoadTest/Analysis.hs, dist/build/Network/HTTP/LoadTest/Analysis.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.4.0.20111215 for x86_64-unknown-linux):
	Simplifier ticks exhausted
    When trying UnfoldingDone ( base:GHC.ST.ST{v r4bE} [gid[DataConWrapper]] :: forall ( s{tv a4ef} [tv] :: ghc-prim:GHC.Prim.*{(w) tc 34d} )
                                                                                       ( a{tv a4eg} [tv] :: ghc-prim:GHC.Prim.*{(w) tc 34d} ).
                                                                                base:GHC.ST.STRep{tc rdm}
                                                                                  ( s{tv a4ef} [tv] :: ghc-prim:GHC.Prim.*{(w) tc 34d} )
                                                                                  ( a{tv a4eg} [tv] :: ghc-prim:GHC.Prim.*{(w) tc 34d} )
                                                                                -> base:GHC.ST.ST{tc rdp}
                                                                                     ( s{tv a4ef} [tv] :: ghc-prim:GHC.Prim.*{(w) tc 34d} )
                                                                                     ( a{tv a4eg} [tv] :: ghc-prim:GHC.Prim.*{(w) tc 34d} ) )
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 420601

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

When using -fsimpl-tick-factor=1550 the compilation works

comment:22 Changed 2 years ago by hvr

btw, there seems to be some correlation with libraries making aggressive use of {-# INLINE ... #-} when using functions from Data.Vector.Algorithms.Intro (which seems to be involved in the cases where packages would require the -fsimpl-tick-factor`-workaround)

Btw, removing all {-# INLINE ... #-} pragmas from Data.Vector.Algorithms.Intro reduces the ticks required below the threshold. More specifically, removing the introsort inline annotation already reduces the ticks required below threshold:

  • Data/Vector/Algorithms/Intro.

    old new  
    8484-- Internal version of the introsort loop which allows partial 
    8585-- sort functions to call with a specified bound on iterations. 
    8686introsort :: (PrimMonad m, MVector v e) 
    8787          => Comparison e -> v (PrimState m) e -> Int -> Int -> Int -> m () 
    8888introsort cmp a i l u = sort i l u >> I.sortByBounds cmp a l u 
    8989 where 
    9090 sort 0 l u = H.sortByBounds cmp a l u 
    9191 sort d l u 
    9292   | len < threshold = return () 
    9393   | otherwise = do O.sort3ByIndex cmp a c l (u-1) -- sort the median into the lowest position 
    9494                    p <- unsafeRead a l 
    9595                    mid <- partitionBy cmp a p (l+1) u 
    9696                    unsafeSwap a l (mid - 1) 
    9797                    sort (d-1) mid u 
    9898                    sort (d-1) l   (mid - 1) 
    9999  where 
    100100  len = u - l 
    101101  c   = (u + l) `div` 2 
    102 {-# INLINE introsort #-} 

comment:23 follow-up: Changed 2 years ago by ezyang

If someone has the free time to generate a minimal test case, that would probably be *very* useful. The less files, the better. The smaller, the better.

comment:24 in reply to: ↑ 23 Changed 2 years ago by hvr

Replying to ezyang:

If someone has the free time to generate a minimal test case, that would probably be *very* useful. The less files, the better. The smaller, the better.

Does the following test-case help?

{-# LANGUAGE CPP #-}

module Bug5539 where

bar0 v = undefined
{-# INLINE bar0 #-}

bar1 v = v
{-# INLINE bar1 #-}

bar2 v = v + v
{-# INLINE bar2 #-}

bar3 v = v + v + v
{-# INLINE bar3 #-}

barn n v | n == 0     = bar0 v
         | n == 1     = bar1 v
         | n == 2     = bar2 v
         | n == 3     = bar3 v
         | otherwise  = v + barn (n-1) v
{-# INLINE barn #-}

foo 0 v = bar0 v
foo 1 v = bar1 v
foo 2 v = bar2 v
foo 3 v = bar3 v
foo n v = barn n v
{-# INLINE foo #-}

#define GEN2(a,b)             \
a 0 v  = b 0 v + b 1 v      ; \
a 1 v  = b 1 v + b 2 v      ; \
a 2 v  = b 2 v + b 3 v      ; \
a 3 v  = b 3 v + b 4 v      ; \
a 4 v  = b 4 v + b 5 v      ; \
a n v  = b n v + b (n+1) v  ; \
{-# INLINE a #-}            ; \


GEN2(baz0, foo)
GEN2(baz1, baz0)
GEN2(baz2, baz1)
GEN2(baz3, baz2)

-- EOF
$ ghc-7.4.0.20111215 -fforce-recomp -O Bug5539.hs 
[1 of 1] Compiling Bug5539          ( Bug5539.hs, Bug5539.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.4.0.20111215 for x86_64-unknown-linux):
	Simplifier ticks exhausted
    When trying UnfoldingDone ( main:Bug5539.bar3{v r9P} [lidx] :: forall ( a{tv ahs} [sk] :: ghc-prim:GHC.Prim.*{(w) tc 34d} ).
                                                                   base:GHC.Num.Num{tc 2b}
                                                                     ( a{tv ahs} [sk] :: ghc-prim:GHC.Prim.*{(w) tc 34d} ) =>
                                                                   ( a{tv ahs} [sk] :: ghc-prim:GHC.Prim.*{(w) tc 34d} )
                                                                   -> ( a{tv ahs} [sk] :: ghc-prim:GHC.Prim.*{(w) tc 34d} ) )
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 46322

comment:25 Changed 2 years ago by PHO

  • Cc pho@… added

comment:26 Changed 2 years ago by igloo

This is now a warning in the 7.4 branch:

commit 92c4406542235bd9930dbdbe265c22925d5d6889

Author: Ian Lynagh <igloo@earth.li>
Date:   Mon Dec 19 14:09:14 2011 +0000

    Make "Simplifier ticks exhausted" a warning in the 7.4 branch

    This works around the problems reported in #5539, where lots of people
    are running into the limit.

comment:27 Changed 2 years ago by simonpj

Dear hvr: thank you, but I'm afraid I don't think your test case helps. You have set up a situation in which

  • baz0 makes 12 calls to foo.
  • baz1 makes 12 calls to baz0, and hence 12*12 calls to foo.
  • baz2 makes 12 calls to baz1, and hence 12*12*12 calls to foo.

and so on. So you are specifying exponential program size in your INLINE pragmas, and GHC obediently obeys you.

I don't think this is what is happening in the other cases.

But maybe it is... the common factor seems to be vector-algorithms. Maybe that has a crazy-large inlining in it.

A repro case would be welcome. Meanwhile I'll try to follow up myself too.

comment:28 Changed 2 years ago by simonpj

  • Cc rl added

I have gotten as far as compiling vector-algorithms, and indeed I get

[8 of 9] Compiling Data.Vector.Algorithms.Intro ( Data/Vector/Algorithms/Intro.hs, dist/build/Data/Vector/Algorithms/Intro.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.3.20111030 for x86_64-unknown-linux):
	Simplifier ticks exhausted
    When trying UnfoldingDone ghc-prim:GHC.Classes.not{v r1d} [gid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 42922

So what is going on? To take one example, there's a call of Data.Vector.Algorithms.Optimal.sort3ByIndex, and that inlines a giant function. Why giant? Well, sort3ByIndex has three calls of

UNSAFE_CHECK(checkIndex) "sort3ByIndex" i (length a)

Moreover, sort3ByIndex calls the INLINE function Data.Vector.Generic.Mutable.unsafeRead 3 times, and unsafeWrite 12 times. Each of those functions has another use of UNSAFE_CHECK. This UNSAFE_CHECK thing is a CPP macro which expands thus

UNSAFE_CHECK(checkIndex) "sort3ByIndex" i (length a) x

--> (by cpp)
checkIndex "Foo.hs" 45 Unsafe "sort3ByIndex" i (length a) x

--> (INLINE checkIndex)
let n = length a in
check "Foo.hs" 45 Unsafe "sort3ByIndex" (checkIndex_msg i n) (i >= 0 && i < n) x

--> (INLINE check)
let n = length a in
if (not (doChecks Unsafe) || (i >= 0 && i < n))
then x
else error "Foo.hs" 45 Unsafe "sort3ByIndex" (checkIndex_msg i n)

Now imagine all of that inlined 18 times, and you get a lot of code. And that's only sort3ByIndex, which itself has an INLINE pragma.

Let's just focus on the number of calls to (||). In the module Intro.hs:

  • sortByBounds calls sort2ByOffset, sort3ByOffset, and sort4ByOffset (and introsort which I'll temporarily ignore)
  • Those three calls are inlined, resulting in 110 calls to (||).
  • But sortByBounds is itself an INLINE function, and is inlined into sortBy; result = 110 more calls to (||).
  • And sortBy is INLINE, and is inlined into sort, giving another 110 calls; total so far 330 calls.
  • Along with those 330 calls to (||) are 330 calls to not. All those are finally inlined independently and simplified away, but it uses up ticks!
  • Moreover all these sort3ByIndex functions work in an arbitrary monad, so it's all stitched together will calls to the overloaded (>>) and dictionary-passing.

And I did all this measurement with Intro.introsort commented out; it does more crazy stuff. I think this is why the code is so bloated.

Now, in fact vector is compiled in such a way that (doChecks Unsafe) evaluates to False. So in the bloated unfolding for UNSAFE_CHECK we see this:

let n = length a in
case (not False || (i >= 0 && i < n)) of
  True  -> a
  False -> error "Foo.hs" 45 Unsafe "sort3ByIndex" (checkIndex_msg i n)

Look at that (not False)! If GHC had been clever enough to inline not and || in the unfolding of sort3ByIndex, all this code would have collapsed away as dead code -- and indeed it does so in the eventual client module.

It's hard for GHC to know how clever to be.

  • The basic idea of an INLINE pragma on a definition f x = <expr>, then we'll inline precisely <expr> at every call site. GHC is faithfully doing this.
  • However GHC does some gentle simplification on <expr> before stowing it away ready for inlining:
    • It does simple case-of-constructor stuff
    • It inlines functions that are themselves marked INLINE, or calls that use the inline pseudo-function.

The current defn of Data.Vector.Internal.check is

check file line kind loc msg cond x
  | not (doChecks kind) || cond = x
  | otherwise = error file line kind loc msg

Here are two ways of making the not and (||) inline even inside check's inlining. First, do it by hand:

check file line kind loc msg cond x
  = case doChecks kind of
       False            -> x
       True | cond      -> x
            | otherwise -> error file line kind loc msg

Second, use inline:

check file line kind loc msg cond x
  | inline (||) (inline not (doChecks kind)) cond = x
  | otherwise = error file line kind loc msg

Then the size of Intro is more reasonable, and compiles within the current tick bounds without problem.

I have taken the trouble to write this up so that everyone gets a clearer idea of how inlining works.

I will make the above change to check in package vector (at least in GHC's tree). BUT if you change the macros so that full bounds-checking is on, all the bloat will return. So that really needs attention from Roman. For example, when bounds checking is on, the indexing fuctions should probably not inline.

comment:29 Changed 2 years ago by bos

  • Cc bos@… added

That Intro module has for a long time taken an exceptionally long time to compile, and (because it's inlined) so does every module that uses it by transitive closure. It's not unusual for a small package that depends on statistics (which uses vector and vector-algorithms) to take 20+ minutes to build on a fast machine if I tweak the upstream statistics package to trigger a rebuild.

Simon or Roman, do you suppose there are other good rules of thumb that people who rely on the inliner for performance could be using to try to avoid these explosions in compilation time?

comment:30 Changed 2 years ago by rl

Thanks for the explanation, Simon. I (once again) forgot that GHC will only inline functions that are explicitly marked INLINE into unfoldings. IMO, if there is such a significant difference between INLINE functions and very small functions like not which we basically expect to always be inlined, then all those functions should be marked as INLINE in the libraries. If would be even better if there was no difference, though!

I also don't understand why GHC should just stop simplifying at some point. Yes, it might take a while but it *will* finish. I'm don't really understand the motivation behind this simplifier ticks business. Could we perhaps have a way to turn it off just like we can turn off SpecConstr thresholds?

In the end, it seems that I'll have to switch to using the preprocessor rather than relying on GHC to eliminate if False which really is a pity. In C, I can write if(0) and a reasonable compiler will eliminate the code without any problems.

Note that those are only internal checks so they would never be turned on in end-user code. However, there is also proper bounds checking which might lead to similar problems in user-defined loops. Here, not inlining indexing functions isn't an option as it will to huge slowdowns. I'm not sure what to do there except for recommending to users to avoid large loops with explicit indexing. This really seems extremely unfortunate.

comment:31 Changed 2 years ago by simonpj

Bryan: I tried to give the rule above:

  • If you give an INLINE pragma on a definition f x = <expr>, then GHC inlines precisely <expr> at every call site.

So GHC simply treats INLINE very much like cpp treats a macro: it simply inlines the code on the RHS. If you inline a big thing, many times, you'll get a big thing. That seems reasonable to me. As you say, any reasonable compiler will eliminate dead code, and GHC certainly does so, but the intermediate happens to be big in this case.

Roman, it's hard for GHC to anticipate exactly what you want. Better IMHO to provide simple rules, and give you the hooks to say what you want. Why did I introduce it? Because it provides an excellent indication that something is going wrong, as indeed it has here. Maybe there should be a way to switch it off. But you can just sent the factor to a large number instead.

In this case a simple patch to vector suffices (I gave two versions). Would you like to apply it to vector?

If you really want the bounds checks, you are going to pay with lots of tests. For that case, making the test out-of-line makes a lot of sense.

comment:32 Changed 2 years ago by rl

I completely agree about having simple rules. However, the rule about what gets inlined into unfoldings is, IMO, not simple as this example shows. I've argued before that if a function becomes small enough to inline in any phase (rather than just at the very end) GHC should just freeze its unfolding and mark it as INLINABLE in that phase. This is essentially what happens now within a single module but not across modules. Then, small functions like not or (||) would always be INLINABLE. GHC would be able to inline them into unfoldings (I'm not sure if it would do so now but there doesn't seem to be a reason not to). This would also remove the (often rather surprising) difference in inlining behaviour depending on whether things are defined in the same module or in different ones.

Alternatively, we should go through the standard libraries and mark small functions such as not as INLINE or INLINABLE. I agree that GHC allows us to say what we want but the libraries often don't do so. Do we want not to be inlined into unfoldings? Most probably, as this example shows. But then we should say so!

I will fix vector, the change has to be made in the upstream repo, not in GHC's local version.

I can't make the bounds check out-of-line because that absolutely kills optimisations and would make the cost of bounds checking far too high. Just as an example, here is a small function:

f :: Vector Int -> Int
f xs = (xs ! 0) + (xs ! 1)

Here is what GHC generates for this now:

$wf :: Vector Int -> Int#
$wf = \xs -> case xs `cast` ... of { Vector a b c ->
             case 0 #< b of { False -> <error>
                              True ->
             case 1 #< b of { False -> <error>
                              True ->
             indexIntArray# c a +# indexIntArray# c (a +# 1)
             } } }

Here is what it would generate if (!) wasn't inlined:

$wf = \xs -> case xs ! 0 of { I# x ->
             case ys ! 1 of { I# y ->
             x +# y } }

That would be really bad. It might be possible to improve this but at the cost of significant increase in vector's code complexity which I'd rather avoid.

comment:33 Changed 2 years ago by rl

Simon, I took a closer looks at the the unfoldings generated for various vector modules and it seems that the rule you describe above isn't applied consistently. Let's take Data.Vector.Generic as an example. In simple functions like unsafeIndex, both not and (||) seem to be inlined into the unfolding and the dead bounds checking code (UNSAFE_CHECK in this case) is completely eliminated. But in more complex functions like unsafeBackpermute which uses the exact same UNSAFE_CHECK, this inlining doesn't happen, as you say. So it seems that the unfolding of unsafeIndex is optimised too aggressively?

comment:34 Changed 2 years ago by rl

I changed things around in vector and now vector-algorithms compiles (I'll release a new version of vector shortly). This little program, however, still produces the simplifier ticks exhausted message after 0.4s for me:

import qualified Data.Vector.Algorithms.Intro as I
import qualified Data.Vector.Generic as G

partialSort :: (G.Vector v e, Ord e) => Int -> v e -> v e
partialSort k = G.modify (\a -> I.partialSort a k)

This is taken straight from the statistics package which still doesn't compile for me because of this. I don't think I can do anything else in vector, perhaps vector-algorithms can be changed somehow to make GHC happy. The unfolding for partialSort is enormous. However, I'm not convinced this isn't GHC's fault, it generates a 400KB interface file from a 7KB source file by duplicating vast amounts of code in the unfoldings. This seems wrong. BTW, increasing the tick count isn't really a solution because it would have to be done in every package that uses partialSort rather than in vector-algorithms.

In any case, the bottom line is that we still have broken packages which were working perfectly fine before. I propose to turn the simplifier ticks error into a warning in the head and to turn it off completely in 7.4.1 until this is resolved.

comment:35 Changed 2 years ago by simonpj

Well, I'm not sure it's fair to say theywere "working perfectly fine before". After all, the warning has unconvered the fact that vector was generating giant intermediates, and could readily be fixed not to do so, as you have done. That's good news.

Moreover, I believe that it is a warning now, and in 7.4.

I will check the behaviour of unsafeIndex.

If you think GHC is mis-behaving on partialSort it would be great if you could narrow it down a bit (as you are so good at doing).

comment:36 Changed 2 years ago by rl

The statistics failure can be cured by this simple patch to Data/Vector/Algorithms/Intro.hs in vector-algorithms:

159,160d158
<  {-# INLINE [1] isort #-}
<  isort = introsort cmp a
170c168
<                       GT -> do isort (n-1) l (mid - 1)
---
>                       GT -> do introsort cmp a (n-1) l (mid - 1)
172c170
<                       EQ -> isort (n-1) l m
---
>                       EQ -> introsort cmp a (n-1) l m

This prevents introsort from being inlined into the unfolding. I'll send Dan a link to this thread, I don't think he is cc'd.

In fact, isort should probably be NOINLINE since there is no benefit to inlining introsort twice. But NOINLINE will also prevent worker/wrapper and I'm not sure what the performance impact of that would be.

I think we need a mechanism which lets us say that a function should always be specialised on certain dictionaries and arguments but not necessarily inlined. For example, introsort has this signature:

introsort :: (PrimMonad m, MVector v e) => Comparison e -> v (PrimState m) e -> Int -> Int -> Int -> m ()

We want to automatically specialise it for the PrimMonad, MVectror and Comparison arguments in all clients and then treat it like a normal function. This is quite similar to INLINABLE but we want the specialisation to happen automatically. Just like C++ templates, in fact. Unless I'm mistaken, INLINE is the only way to achieve this at the moment but that duplicates too much code when we want to specialise large algorithms.

comment:37 Changed 2 years ago by simonpj

Actually INLINALBE things already are auto-specialised in calling modules. Moreover, the RULE for that specialiation is exported so that importing modules can take advantage of it rather than create their own specialisations.
Example:

module T5539a where
  {-# INLINABLE g #-}
  g :: (Eq a, Num a) => a -> a
  g 0 = 0
  g n = 1 + g (n-1)

------------------
module T5536 where
  import T5539a

  f :: Int -> Int
  f x = g x + 1

Mouule T5539 automatically creates a version of g specialised at Int, and exports the RULE

------ Local rules for imported ids --------
"SPEC T5539a.g [GHC.Types.Int]" [ALWAYS]
    forall ($dEq_shF :: GHC.Classes.Eq GHC.Types.Int)
           ($dNum_shE :: GHC.Num.Num GHC.Types.Int).
      T5539a.g @ GHC.Types.Int $dEq_shF $dNum_shE
      = T5536.$sg

Is that what you want?

Simon

comment:38 Changed 2 years ago by rl

Oh, that's nice, I didn't know that. The docs say something quite different.

Alas, I suspect it's not enough for vector-algorithms (although it's Dan's code so he will know for sure). The problem is that, to use a simple example, INLINABLE would work for sort but not for sortBy. The latter would have to be specialised based on its first argument rather than on a dictionary. And, of course, it wouldn't work for sort if that latter was defined as sortBy compare. But I believe that this is essentially what vector-algorithms (quite sensibly) does throughout the library. So it isn't really clear to me how the current version of INLINABLE could be used here without compromising generality.

comment:39 Changed 2 years ago by dolio

| Alas, I suspect it's not enough for vector-algorithms (although it's Dan's code so he will know for sure).

I haven't done extensive testing, but I think I fiddled a bit a while back, and got the impression it wasn't.

And I suspect the reason is exactly what you mention. Specialization to the comparison function is huge, because it's in every inner loop. Inlining the whole sort should allow the comparison to be turned into an unpacked version.

Making sort INLINABLE might work, if that specializes to the dictionary, resulting in a statically known comparison that can be unboxed. But I suspect sortBy needs to be INLINE unless there's a good way to specialize to arbitrary statically known arguments.

Speaking more generally, the performance of vector-algorithms tends to rely a lot on getting GHC to optimize all my code down to using stuff like Int# (Word#, ...) everywhere. When most of the code was written, the only reliable way I could find to do this was inlining everything. Even splitting out auxiliary functions and not inlining them tended to result in superfluous boxing, which kills performance.

I'll do some experiments with INLINABLE to see if I can get it to generate specialized verisons without code bloat. But I suspect a lot of cases will see the same problems as sortBy: most of the core logic needs to be specialized to particular parameters (comparisons, radix functions, etc.), and dictionaries are only used as a convenient wrapper.

comment:40 Changed 2 years ago by dolio

Apologies. The beginning of my comment was meant to refer to:

"Alas, I suspect it's not enough for vector-algorithms (although it's Dan's code so he will know for sure)."

But then I deleted the quote without revising, making my statements a bit cryptic.

comment:41 Changed 2 years ago by rl

Right, so ideally we would like to specialise on value arguments as well. There is even a ticket for it: #5059. However, specialising on function arguments is very tricky because they can change so much during optimisation. So it isn't clear if this could be made to work at all.

To clarify, one problem that leads to code explosion in vector-algorithms is essentially this:

introsort f x = <large rhs>

partialSortBy f x = ... introsort f y ... introsort f z ...

When the user calls partialSortBy f for a particular function f, we want to specialise introsort for that f but we don't want to duplicate introsort's rhs. We also want worker/wrapper etc. to happen for the specialised introsort. How can we do that?

comment:42 Changed 2 years ago by igloo

  • Milestone changed from 7.4.1 to 7.6.1

comment:43 Changed 2 years ago by Khudyakov

  • Cc alexey.skladnoy@… added

comment:44 Changed 2 years ago by choenerzs

  • Cc choener@… added

comment:45 Changed 2 years ago by pcapriotti

Here is a simple example (by kosmikus) which blows up the simplifier without any INLINE pragma:

module TestCase where

import Control.Applicative

data X = X
  (Maybe String)
  (Maybe String)
  (Maybe String)
  (Maybe String)
  (Maybe String)
  (Maybe String)
  (Maybe String)
  (Maybe String)
  (Maybe String)

mb :: (String -> Maybe a) -> String -> Maybe (Maybe a)
mb _ ""  = Just Nothing
mb _ "-" = Just Nothing
mb p xs  = Just <$> p xs

run :: [String] -> Maybe X
run
  [ x1
  , x2
  , x3
  , x4
  , x5
  , x6
  , x7
  , x8
  , x9
  ] = X
  <$> mb pure x1
  <*> mb pure x2
  <*> mb pure x3
  <*> mb pure x4
  <*> mb pure x5
  <*> mb pure x6
  <*> mb pure x7
  <*> mb pure x8
  <*> mb pure x9

Unless mb is marked as NOINLINE, it gets expanded in the body of run a number of times that seems to grow exponentially with the size of the list (9 in the example).

comment:46 Changed 20 months ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:47 Changed 19 months ago by aruiz

  • Cc aruiz@… added

Here is another small test case:

{-# LANGUAGE ForeignFunctionInterface #-}

module Test(test) where

import Foreign.C.Types
import Foreign.Ptr
import Foreign.Marshal

--------------------------------------------------------------------------------

foreign import ccall "c_fun" c_fun
    :: Ptr Double -> Ptr Double -> CInt
    -> Ptr Double -> Ptr Double -> CInt
    -> CInt
    -> Ptr (Ptr Double) -> Ptr (Ptr Double)
    -> Ptr (Ptr CInt) -> Ptr (CInt) -> Ptr (CInt)
    -> Ptr (CInt)
    -> Ptr (Ptr CInt)
    -> Ptr (Ptr CInt) -> Ptr (Ptr CInt)
    -> Ptr (Ptr CInt) -> Ptr (Ptr CInt)
    -> Ptr (Ptr Double) -> Ptr (Ptr Double)
    -> IO CInt

--------------------------------------------------------------------------------

test :: IO ()
test = do
    ppxs <- malloc
    ppys <- malloc
    ppos <- malloc
    ppl  <- malloc
    pn   <- malloc
    pnp  <- malloc
    pins <- malloc
    
    ppInd0A <- malloc
    ppInd1A <- malloc
    ppInd0B <- malloc
    ppInd1B <- malloc
    ppAlphaA <- malloc
    ppAlphaB <- malloc

    cx <- newArray [1..10]
    cy <- newArray [1..10]
    sx <- newArray [1..10]
    sy <- newArray [1..10]

    _ok <- c_fun cx cy 10 sx sy 10 1
                 ppxs ppys ppl pn pnp pins
                 ppos
                 ppInd0A ppInd1A ppInd0B ppInd1B
                 ppAlphaA ppAlphaB

    return ()

$ ghc -c bug.hs -O1
ghc: panic! (the 'impossible' happened)
  (GHC version 7.6.1 for i386-unknown-linux):
	Simplifier ticks exhausted
    When trying UnfoldingDone base:GHC.Storable.readDoubleOffPtr{v r1ch} [gid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 15244

comment:48 Changed 18 months ago by ihameed

  • Cc idhameed@… added

comment:49 Changed 13 months ago by liyang

  • Cc hackage.haskell.org@… added

comment:50 Changed 8 months ago by ekmett

  • Cc ekmett@… added

comment:51 Changed 8 months ago by ghorn

Here is another small test case:

{-# OPTIONS_GHC -Wall #-}

module Main ( main ) where

import Foreign.Ptr ( Ptr )
import Foreign.Marshal ( malloc, mallocArray )

main :: IO ()
main = do
  _ <- malloc         :: IO (Ptr Int)
  _ <- malloc         :: IO (Ptr Int)
  _ <- malloc         :: IO (Ptr Int)
  _ <- malloc         :: IO (Ptr Int)
  _ <- malloc         :: IO (Ptr Int)
  _ <- malloc         :: IO (Ptr Int)
  _ <- mallocArray 10 :: IO (Ptr Int)
  _ <- mallocArray 10 :: IO (Ptr Int)
  _ <- mallocArray 10 :: IO (Ptr Int)
  return ()

This will fail as a bare-bones cabal project, but ghc --make will successfully compile it. I'm on GHC 7.6.3 on 64-bit Debian.

comment:52 Changed 8 months ago by ghorn

Ignore the comment above about cabal. Ghc will reproduce the bug with -O1

$ ghc -O1 Bug.hs
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.6.3 for x86_64-unknown-linux):
	Simplifier ticks exhausted
    When trying UnfoldingDone base:Foreign.Storable.$fStorableInt_$csizeOf{v r3z} [gid]
    To increase the limit, use -fsimpl-tick-factor=N (default 100)
    If you need to do this, let GHC HQ know, and what factor you needed
    To see detailed counts use -ddump-simpl-stats
    Total ticks: 7926

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

comment:53 Changed 8 months ago by ghorn

sorry for the spam, I found a smaller example:

{-# OPTIONS_GHC -Wall #-}

module Main ( main ) where

import Foreign.Ptr ( Ptr )
import Foreign.Marshal ( malloc )

main :: IO ()
main = do
  _ <- malloc :: IO (Ptr Int)
  _ <- malloc :: IO (Ptr Int)
  _ <- malloc :: IO (Ptr Int)
  _ <- malloc :: IO (Ptr Int)
  _ <- malloc :: IO (Ptr Int)
  _ <- malloc :: IO (Ptr Int)
  _ <- malloc :: IO (Ptr Int)
  _ <- malloc :: IO (Ptr Int)
  _ <- malloc :: IO (Ptr Int)
  return ()
Last edited 8 months ago by hvr (previous) (diff)

comment:54 Changed 4 months ago by ghorn

  • Cc gregmainland@… added
Note: See TracTickets for help on using tickets.