Opened 2 years ago

Closed 19 months ago

Last modified 19 months ago

#10852 closed bug (duplicate)

ghc 7.8.4 on arm - panic: Simplifier ticks exhausted

Reported by: andrewufrank Owned by:
Priority: high Milestone:
Component: Compiler Version: 7.10.2
Keywords: Generics Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #5642, #9675 Differential Rev(s):
Wiki Page:

Description (last modified by bgamari)

I tried to compile (cabal install) chatter-0.5.2.0 on Armbian (Debian jessie, with ghc 7.10.2-1 from testing, running on a Cubietruck, ARMHF Cortex A20) and get a panic in the file NLP.Corpora.Conll:

[ 9 of 23] Compiling NLP.Corpora.Conll ( src/NLP/Corpora/Conll.hs, dist/build/NLP/Corpora/Conll.o )
ghc: panic! (the 'impossible' happened)
  (GHC version 7.10.2 for arm-unknown-linux):
	Simplifier ticks exhausted
  When trying UnfoldingDone $fGSerialize:*:_$s<$>
  To increase the limit, use -fsimpl-tick-factor=N (default 100)
  If you need to do this, let GHC HQ know, and what factor you needed
  To see detailed counts use -ddump-simpl-stats
  Total ticks: 2748218

I do not notice anything unusual in the code, except a very long data type with enumerated values. I tried a higher value for the tick factor (1000), with no improvement.

I will now try to run 7.10.2-2 from experimental.

Attachments (2)

InstanceSerialize.hs (4.3 KB) - added by thomie 2 years ago.
InstanceSerialize.log (6.5 KB) - added by thomie 2 years ago.


Change History (16)

comment:1 Changed 2 years ago by andrewufrank

With 7.10.2-2 from Debian experimental I get the same result. The code works on 7.10.1 on amd64, so it appears to be a problem with the armhf compiler. Thank you for advancing the ARM GHC compiler.

Changed 2 years ago by thomie

Attachment: InstanceSerialize.hs added

Changed 2 years ago by thomie

Attachment: InstanceSerialize.log added

comment:2 Changed 2 years ago by thomie

I can't reproduce that panic on x86-64 Linux with ghc-7.10.2, but do notice GHC needs an awful lot of memory to compile chatter. My laptop starts swapping.

I extracted an example. Run cabal install cereal-0.4.1.0 first (cereal doesn't have dependencies itself, and doesn't take long to build).

$ ghc-7.10.2 -fforce-recomp InstanceSerialize.hs -dshow-passes -O +RTS -s

 105,749,837,048 bytes allocated in the heap
  34,168,091,184 bytes copied during GC
   1,094,766,416 bytes maximum residency (31 sample(s))
      13,390,536 bytes maximum slop
            2797 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2770 colls,     0 par   109.295s  117.115s     0.0423s    2.0924s
  Gen  1        31 colls,     0 par   124.908s  198.248s     6.3951s    25.1279s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time  284.506s  (326.497s elapsed)
  GC      time  234.203s  (315.363s elapsed)
  EXIT    time    0.001s  (  0.001s elapsed)
  Total   time  518.747s  (641.862s elapsed)

  Alloc rate    371,696,208 bytes per MUT second

  Productivity  54.9% of total user, 44.3% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

The full log is attached.

I notice the number of terms increases by a factor of 8 after the first simplifier pass. I don't know if that is normal.

comment:3 Changed 2 years ago by thomie

Keywords: arm added

comment:4 Changed 2 years ago by thomie

Architecture: Unknown/Multiple → arm

comment:5 Changed 2 years ago by erikd

Cc: erikd added

comment:6 Changed 2 years ago by andrewufrank

A simpler example, pureMD5, gives the same result (again on armhf, with ghc 7.10.2.20151030):

Glasgow Haskell Compiler, Version 7.10.2.20151030, stage 2 booted by GHC version 7.10.2.20151030
Using binary package database: /usr/lib/ghc/package.conf.d/package.cache
Using binary package database: /home/frank/.ghc/arm-linux-7.10.2.20151030/package.conf.d/package.cache
wired-in package ghc-prim mapped to ghc-prim-0.4.0.0-087c70fa92d0fbd8056dbc5853433ed1
wired-in package integer-gmp mapped to integer-gmp-1.0.0.0-c6adc6639382c14317a5926d860e24d8
wired-in package base mapped to base-4.8.2.0-6d4788b4fac4bef0277f64a0a9ece0ab
wired-in package rts mapped to builtin_rts
wired-in package template-haskell mapped to template-haskell-2.10.0.0-96406e1c046e7d3794950273552e30d5
wired-in package ghc mapped to ghc-7.10.2.20151030-28f5d1fe7d4210612a8dc6732b57dced
wired-in package dph-seq not found.
wired-in package dph-par not found.
Hsc static flags: 
*** Deleting temp files:
Deleting: 
*** Deleting temp dirs:
Deleting: 
ghc: no input files

comment:7 Changed 2 years ago by andrewufrank

The problem in pureMD5 seems to be in this part:

#ifdef FastWordExtract
getNthWord n b = inlinePerformIO (unsafeUseAsCString b (flip peekElemOff n . castPtr))
#else
getNthWord :: Int -> B.ByteString -> Word32
getNthWord n = right . G.runGet G.getWord32le . B.drop (n * sizeOf (undefined :: Word32))
  where
  right x = case x of Right y -> y
#endif
-- {-# INLINE getNthWord #-}

I have removed the INLINE pragma and it compiles. The armhf build does not define FastWordExtract (at least it is not set in the cabal file) and thus uses the #else branch, which is typically not used on Intel processors.

I hope this helps to fix the problem. Thank you!

comment:8 Changed 2 years ago by bgamari

Description: modified (diff)

comment:9 Changed 2 years ago by bgamari

I would say that this really isn't a GHC issue so much as it is a pureMD5 issue. While it's pretty reasonable to request that GHC inline the unsafeUseAsCString form of getNthWord, the cereal version produces far too much code to be worth inlining. Despite this, the library forces GHC to inline in both cases, hence the explosion during simplification. I have opened a pull request fixing this upstream.

I haven't yet looked at the chatter issue.
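
A minimal sketch of the idea described above: keep the INLINE pragma on the cheap variant only and let GHC decide for the portable one. This is an illustration under assumptions, not the actual upstream patch. The portable branch assembles the word by hand with B.index instead of using cereal, and unsafePerformIO stands in for inlinePerformIO, so that the sketch needs only base and bytestring.

```haskell
{-# LANGUAGE CPP #-}

import qualified Data.ByteString as B
import Data.Word (Word32)
#ifdef FastWordExtract
import Data.ByteString.Unsafe (unsafeUseAsCString)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peekElemOff)
import System.IO.Unsafe (unsafePerformIO)
#else
import Data.Bits (shiftL, (.|.))
#endif

#ifdef FastWordExtract
-- Cheap variant: a single native-endian read. Small enough that asking
-- GHC to inline it is reasonable.
getNthWord :: Int -> B.ByteString -> Word32
getNthWord n b = unsafePerformIO (unsafeUseAsCString b (flip peekElemOff n . castPtr))
{-# INLINE getNthWord #-}
#else
-- Portable variant: read four bytes and combine them as a little-endian
-- word. No INLINE pragma here, so the inlining decision is left to GHC.
getNthWord :: Int -> B.ByteString -> Word32
getNthWord n b = foldr step 0 [0 .. 3]
  where
    base = n * 4
    step i acc = (acc `shiftL` 8) .|. fromIntegral (B.index b (base + i))
#endif

main :: IO ()
main = print (getNthWord 1 (B.pack [0, 0, 0, 0, 0x78, 0x56, 0x34, 0x12]))
```

Compiled without -DFastWordExtract, this takes the portable branch, which is the situation reported for armhf.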

comment:10 Changed 2 years ago by bgamari

While I have no problem compiling Conll, the Brown module is indeed quite problematic, even on my laptop. This module produces an absolutely absurd amount of Core which appears to originate from the generic Serialize instances for the quite large Tag type.

The attached testcase reproduces the blow-up. While the code size starts large but not insane,

Result size of Desugar (after optimization)
  = {terms: 25,522, types: 972,788, coercions: 376,020}

Things quickly balloon during float-out,

*** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}):
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 254,872, types: 3,162,281, coercions: 634,615}

which the simplifier, through a great deal of effort, manages to reduce down to,

Result size of CorePrep
  = {terms: 49,840, types: 1,752,631, coercions: 442,468}
Last edited 2 years ago by bgamari

comment:11 Changed 23 months ago by rwbarton

Architecture: arm → Unknown/Multiple
Cc: erikd removed
Keywords: Generics added; arm removed
Operating System: Linux → Unknown/Multiple
Type of failure: Compile-time crash → Compile-time performance bug

bgamari: you seem to have neglected to attach anything?

But I'm going to guess the issue is similar to #11415. Not an issue with GHC's deriving Generic, directly, but with a Generic-based default definition for another class (here Serialize).
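
To illustrate what "a Generic-based default definition for another class" looks like, here is a minimal sketch using only GHC.Generics from base. The class names Encode and GEncode and the five-constructor Tag type are hypothetical stand-ins, not cereal's actual Serialize and GSerialize definitions. The point is that each value is processed by walking a nested tree of `:+:` nodes, with one instance dictionary per node, so a very large enumeration gives the simplifier a correspondingly large tree to work through.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE FlexibleContexts  #-}
{-# LANGUAGE TypeOperators     #-}

import GHC.Generics

-- Hypothetical stand-in for a serialization class: encode a value as the
-- path of left/right choices through its generic sum representation.
class Encode a where
  encode :: a -> [Bool]
  default encode :: (Generic a, GEncode (Rep a)) => a -> [Bool]
  encode = gencode . from

-- Generic worker class with one instance per representation type.
class GEncode f where
  gencode :: f p -> [Bool]

-- A constructor without fields contributes nothing further.
instance GEncode U1 where
  gencode U1 = []

-- A choice between alternatives: record which side was taken.
instance (GEncode f, GEncode g) => GEncode (f :+: g) where
  gencode (L1 x) = False : gencode x
  gencode (R1 y) = True  : gencode y

-- Metadata wrappers are skipped.
instance GEncode f => GEncode (M1 i c f) where
  gencode (M1 x) = gencode x

-- A small enumeration; the type in the ticket is far larger.
data Tag = NN | VB | JJ | RB | DT
  deriving (Show, Generic)

-- The empty instance body picks up the Generic-based default.
instance Encode Tag

main :: IO ()
main = mapM_ (print . encode) [NN, VB, JJ, RB, DT]
```

Each constructor gets a distinct path; the exact paths depend on how GHC nests the `:+:` nodes for the derived Generic instance.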

comment:12 Changed 23 months ago by bgamari

rwbarton, the testcase is already attached (attachment:InstanceSerialize.hs).

Your assessment sounds about right.

comment:13 Changed 19 months ago by thomie

Priority: normal → high

I made two changes to my test setup, to compile InstanceSerialize.hs:

This includes a pull request that @thoughtpolice made recently to split the GSerialize class in two (the same trick as ticket:9630#comment:23). I had hoped this would reduce memory consumption and compile time.

  • use ghc-8.0.1 instead of ghc-7.10.3.

Result:

Simplifier ticks exhausted
  When trying UnfoldingDone unGet

-fsimpl-tick-factor=400 made it go through (and 300 did not), but it still uses 2GB RAM and takes 4-5 minutes to complete.

Edit: I can no longer reproduce this with cereal-0.5.2.0. InstanceSerialize.hs now compiles in ~1 minute.
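
For reference, the -fsimpl-tick-factor workaround mentioned above can also be scoped to a single module with an OPTIONS_GHC pragma instead of being passed on the command line. The sketch below is only an illustration: the Tag type is a hypothetical placeholder, and the value 400 is the one that happened to work in this comment, so a different module or GHC version may need a different factor or none at all.

```haskell
{-# OPTIONS_GHC -fsimpl-tick-factor=400 #-}
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)

-- Hypothetical placeholder for the large enumeration whose generic
-- instances exhaust the default simplifier tick budget.
data Tag = NN | VB | JJ
  deriving (Show, Eq, Enum, Bounded, Generic)

-- List every constructor of the enumeration.
allTags :: [Tag]
allTags = [minBound .. maxBound]

main :: IO ()
main = print allTags
```

Raising the factor only lifts the limit for that module; it does not reduce the memory or time the simplifier needs.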

Last edited 19 months ago by thomie

comment:14 Changed 19 months ago by thomie

Resolution: duplicate
Status: new → closed

Moving to #12148, since this issue is not ARM specific, and this ticket is getting confusing. I'll copy the relevant parts from the comments above.
