Opened 5 years ago

Last modified 3 weeks ago

#5642 new bug

Deriving Generic of a big type takes a long time and lots of space

Reported by: basvandijk Owned by: bgamari
Priority: normal Milestone:
Component: Compiler Version: 7.3
Keywords: deriving-perf, Generics Cc: v.dijk.bas@…, dimitris@…, hackage.haskell.org@…, carter.schonwald@…, gregmainland@…, vhaisman@…, gidyn, RyanGlScott
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: T5642
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

If I load the following module into ghci my system will run out of memory after about 15 minutes:

{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics

data BigSum = 
    C0   |  C1  | C2   | C3   | C4   | C5   | C6   | C7   | C8   | C9 
  | C10  | C11  | C12  | C13  | C14  | C15  | C16  | C17  | C18  | C19 
  | C20  | C21  | C22  | C23  | C24  | C25  | C26  | C27  | C28  | C29
  | C30  | C31  | C32  | C33  | C34  | C35  | C36  | C37  | C38  | C39
  | C40  | C41  | C42  | C43  | C44  | C45  | C46  | C47  | C48  | C49
  | C50  | C51  | C52  | C53  | C54  | C55  | C56  | C57  | C58  | C59
  | C60  | C61  | C62  | C63  | C64  | C65  | C66  | C67  | C68  | C69
  | C70  | C71  | C72  | C73  | C74  | C75  | C76  | C77  | C78  | C79
  | C80  | C81  | C82  | C83  | C84  | C85  | C86  | C87  | C88  | C89
  | C90  | C91  | C92  | C93  | C94  | C95  | C96  | C97  | C98  | C99
  | C100 | C101 | C102 | C103 | C104 | C105 | C106 | C107 | C108 | C109
  | C110 | C111 | C112 | C113 | C114 | C115 | C116 | C117 | C118 | C119
  | C120 | C121 | C122 | C123 | C124 | C125 | C126 | C127 | C128 | C129
  | C130 | C131 | C132 | C133 | C134 | C135 | C136 | C137 | C138 | C139
  | C140 | C141 | C142 | C143 | C144 | C145 | C146 | C147 | C148 | C149
  | C150 | C151 | C152 | C153 | C154 | C155 | C156 | C157 | C158 | C159
  | C160 | C161 | C162 | C163 | C164 | C165 | C166 | C167 | C168 | C169
  | C170 | C171 | C172 | C173 | C174 | C175 | C176 | C177 | C178 | C179
  | C180 | C181 | C182 | C183 | C184 | C185 | C186 | C187 | C188 | C189
  | C190 | C191 | C192 | C193 | C194 | C195 | C196 | C197 | C198 | C199
  | C200 | C201 | C202 | C203 | C204 | C205 | C206 | C207 | C208 | C209
  | C210 | C211 | C212 | C213 | C214 | C215 | C216 | C217 | C218 | C219
  | C220 | C221 | C222 | C223 | C224 | C225 | C226 | C227 | C228 | C229
  | C230 | C231 | C232 | C233 | C234 | C235 | C236 | C237 | C238 | C239
  | C240 | C241 | C242 | C243 | C244 | C245 | C246 | C247 | C248 | C249
  | C250 | C251 | C252 | C253 | C254 | C255 | C256 | C257 | C258 | C259
  | C260 | C261 | C262 | C263 | C264 | C265 | C266 | C267 | C268 | C269
  | C270 | C271 | C272 | C273 | C274 | C275 | C276 | C277 | C278 | C279
  | C280 | C281 | C282 | C283 | C284 | C285 | C286 | C287 | C288 | C289
  | C290 | C291 | C292 | C293 | C294 | C295 | C296 | C297 | C298 | C299
   deriving Generic

Big products have the same problem:

data BigProduct = C 
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    deriving Generic

Change History (36)

comment:1 Changed 5 years ago by basvandijk

  • Cc v.dijk.bas@… added

comment:2 follow-ups: Changed 5 years ago by clintm

Came to report this sort of thing too. I can't actually get this code to completely run as my machine runs out of memory (12g) almost immediately.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE UndecidableInstances #-}

module Main where

import GHC.Generics

data User = User { name :: String
                 , password :: String }
            deriving Generic

instance (Show User) => Show (User)

main :: IO ()
main = do
  let a = User "gurn" "secret"
  putStrLn $ show a
  return ()

comment:3 in reply to: ↑ 2 Changed 5 years ago by clintm

Replying to clintm:

Came to report this sort of thing too. I can't actually get this code to completely run ...

Forgot to mention, this is ghc 7.2.2

comment:4 Changed 5 years ago by dreixel

  • Cc dreixel removed
  • Owner set to dreixel

Thanks for reporting, I will look into it.

comment:5 in reply to: ↑ 2 Changed 5 years ago by basvandijk

Replying to clintm:

Came to report this sort of thing too. I can't actually get this code to completely run as my machine runs out of memory (12g) almost immediately. ...

This doesn't have anything to do with generics. You just forgot to give a definition for either show or showsPrec. Since these methods have default implementations which are defined in terms of each other you get an infinite loop which apparently overflows the stack.

Also note that the Show type class does not use DefaultSignatures and so does not have default generic implementations.

comment:6 Changed 5 years ago by dreixel

  • Owner changed from dreixel to dimitris
  • Version changed from 7.2.1 to 7.3

I thought the problem was the generation of Generic instances, but it isn't. If you write the instance by hand, it still takes long. So the bottleneck is the performance of the constraint solver TcSimplify. Things have improved with the new coercion solver; see the following figures for compiling derive Generic on a datatype with 100/200/300 constructors:

ghc-7.2.1
100     0m30.079s
200     3m44.419s
300     12m52.306s

HEAD
100     0m8.302s
200     1m1.631s
300     3m32.507s

But they could probably be much better, since the coercions introduced are not that complicated. I've notified Dimitrios, he'll look further into it.

comment:7 Changed 5 years ago by igloo

  • Milestone set to 7.4.1
  • Priority changed from normal to high

comment:8 follow-up: Changed 4 years ago by simonpj

  • difficulty set to Unknown

I think this is going to be a difficult one to solve. The underlying problem is that the types grow non-linearly with the program size. Why? Look at Section 2.3 of Scrap your type applications.

This is a fundamental problem with System F, so it's not easy for GHC to get around it. It shows up especially with deeply-nested sums and products, which is exactly what is generated by the generic stuff. I'm not sure what to do here.

Simon

comment:9 Changed 4 years ago by dreixel

I had the impression Dimitrios had already made some progress on this, and we were just waiting to turn it into a perf test for the testsuite before closing this ticket. I think the resulting performance is not great, but it's good enough for most cases, including ASTs with 100 or so constructors.

comment:10 in reply to: ↑ 8 ; follow-up: Changed 4 years ago by basvandijk

Replying to simonpj:

I think this is going to be a difficult one to solve. The underlying problem is that the types grow non-linearly with the program size. Why? Look at Section 2.3 of Scrap your type applications.

Got it: N-ary data constructors like (C e1 e2 e3 e4 e5) are translated to nested pairs (Pair : ∀a,b. a → b → (a, b)). This causes a quadratic blow-up in size:

Pair        σ1 (σ2,(σ3,(σ4,σ5))) e1
  (Pair     σ2     (σ3,(σ4,σ5))  e2
    (Pair   σ3         (σ4,σ5)   e3 
      (Pair σ4             σ5    e4 
                                 e5)))

This is a fundamental problem with System F, so it's not easy for GHC to get around it. It shows up especially with deeply-nested sums and products, which is exactly what is generated by the generic stuff. I'm not sure what to do here.

So if I understand the paper correctly System IF would solve this by removing the redundant type applications using:

Pair ψ τ s t → Pair ψ s τ t by (ξ2)
             → Pair   s τ t by (ξ1)
             → Pair   s   t by (ξ1)

Of course the question is: is it worth implementing System IF for only solving this problem? (I understand System IF did not significantly improve compile times for the base library).

comment:11 in reply to: ↑ 10 Changed 4 years ago by dreixel

Replying to basvandijk:

Got it: N-ary data constructors like (C e1 e2 e3 e4 e5) are translated to nested pairs (Pair : ∀a,b. a → b → (a, b)). This causes a quadratic blow-up in size:

Pair        σ1 (σ2,(σ3,(σ4,σ5))) e1
  (Pair     σ2     (σ3,(σ4,σ5))  e2
    (Pair   σ3         (σ4,σ5)   e3 
      (Pair σ4             σ5    e4 
                                 e5)))

No; we balance the sums and the products, so it grows with logarithmic complexity. See the second column of the 4th page of the original paper.

comment:12 Changed 4 years ago by igloo

  • Test Case set to T5642

comment:13 Changed 4 years ago by igloo

  • Milestone changed from 7.4.1 to 7.4.2

comment:14 Changed 4 years ago by simonpj

  • Cc dimitris@… added
  • Milestone changed from 7.4.2 to 7.6.1

There is still something odd here. Here's the output for HEAD, doing ghc T5642.hs -dshow-passes +RTS -s, for various sizes of input data type. If you do it yourself you'll see that 99% of the time is spent in the typechecker. The rest of compilation is fast.

So not only is it non-linear but the constant factor is terrible. It just can't take that long to typecheck this stuff!

I'm punting for 7.4, but we need to look again for 7.6.

Simon


100 constructors

simonpj@cam-05-unx:~/tmp$ ~/5builds/HEAD/inplace/bin/ghc-stage2 -c T5642.hs -dshow-passes +RTS -s
Glasgow Haskell Compiler, Version 7.5, stage 2 booted by GHC version 7.4.1
*** Checking old interface for main:T5642:
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 3,498, types: 160,413, coercions: 1,305}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 3,298, types: 110,113, coercions: 51,906}
Result size of Simplifier
  = {terms: 3,298, types: 110,113, coercions: 51,805}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 3,298, types: 110,113, coercions: 51,805}
*** CorePrep:
Result size of CorePrep
  = {terms: 4,642, types: 159,413, coercions: 51,805}
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc46808_0/ghc46808_0.c
*** Deleting temp dirs:
   3,189,810,752 bytes allocated in the heap
   1,176,054,736 bytes copied during GC
      70,837,720 bytes maximum residency (19 sample(s))
       1,225,552 bytes maximum slop
             158 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6039 colls,     0 par    1.99s    1.99s     0.0003s    0.0038s
  Gen  1        19 colls,     0 par    1.45s    1.45s     0.0761s    0.2469s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    5.86s  (  5.95s elapsed)
  GC      time    3.44s  (  3.43s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    9.29s  (  9.38s elapsed)

  Alloc rate    544,723,338 bytes per MUT second

  Productivity  63.0% of total user, 62.4% of total elapsed

150 constructors

simonpj@cam-05-unx:~/tmp$ ~/5builds/HEAD/inplace/bin/ghc-stage2 -c T5642.hs -dshow-passes +RTS -s
Glasgow Haskell Compiler, Version 7.5, stage 2 booted by GHC version 7.4.1
*** Checking old interface for main:T5642:
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 5,690, types: 401,601, coercions: 2,85}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 5,370, types: 273,121, coercions: 131,46}
Result size of Simplifier
  = {terms: 5,370, types: 273,121, coercions: 130,885}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 5,370, types: 273,121, coercions: 130,885}
*** CorePrep:
Result size of CorePrep
  = {terms: 7,738, types: 400,1, coercions: 130,885}
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc47478_0/ghc47478_0.c
*** Deleting temp dirs:
   8,336,602,152 bytes allocated in the heap
   4,066,619,088 bytes copied during GC
     218,323,536 bytes maximum residency (33 sample(s))
       3,689,368 bytes maximum slop
             479 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     15893 colls,     0 par    6.45s    6.45s     0.0004s    0.0033s
  Gen  1        33 colls,     0 par    4.78s    4.78s     0.1448s    0.7436s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   24.45s  ( 24.60s elapsed)
  GC      time   11.23s  ( 11.23s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   35.68s  ( 35.82s elapsed)

  Alloc rate    340,990,013 bytes per MUT second

  Productivity  68.5% of total user, 68.3% of total elapsed

200 constructors

*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 7,170, types: 622,561, coercions: 2,605}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 6,770, types: 421,961, coercions: 203,806}
Result size of Simplifier
  = {terms: 6,770, types: 421,961, coercions: 203,605}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 6,770, types: 421,961, coercions: 203,605}
*** CorePrep:
Result size of CorePrep
  = {terms: 9,858, types: 620,561, coercions: 203,605}
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc47501_0/ghc47501_0.c
*** Deleting temp dirs:
  13,262,925,720 bytes allocated in the heap
   7,286,910,712 bytes copied during GC
     298,657,992 bytes maximum residency (45 sample(s))
       5,548,736 bytes maximum slop
             640 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     25322 colls,     0 par   11.98s   11.97s     0.0005s    0.0034s
  Gen  1        45 colls,     0 par    8.58s    8.59s     0.1908s    0.9460s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   50.07s  ( 50.22s elapsed)
  GC      time   20.56s  ( 20.55s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   70.63s  ( 70.78s elapsed)

  Alloc rate    264,903,443 bytes per MUT second

  Productivity  70.9% of total user, 70.7% of total elapsed

comment:15 Changed 4 years ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:16 Changed 3 years ago by liyang

  • Cc hackage.haskell.org@… added

comment:17 Changed 3 years ago by carter

  • Cc carter.schonwald@… added

comment:18 Changed 23 months ago by thoughtpolice

  • Priority changed from high to normal

Lowering priority (these tickets are assigned to older versions, so they're getting bumped as they've been around for a while).

comment:19 Changed 23 months ago by thoughtpolice

  • Milestone changed from 7.6.2 to 7.10.1

Moving to 7.10.1.

comment:20 Changed 17 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:21 Changed 9 months ago by thoughtpolice

  • Milestone changed from 7.12.1 to 8.0.1

Milestone renamed

comment:22 Changed 8 months ago by basvandijk

Just a heads-ups that a user of aeson just ran into this.

comment:23 Changed 8 months ago by bgamari

  • Owner changed from dimitris to bgamari

I'll be looking into this.

comment:24 Changed 8 months ago by bgamari

For the record this sounds quite similar to #8095 in that the compiler seems to be creating long strings of coercions (although here the simplifier doesn't even seem capable of eliminating them).

comment:25 Changed 7 months ago by basvandijk

Another user of aeson also ran into this (or something which looks like this). However, instead of a big sum type he has a big product type.

The other interesting thing is that the issue only seems to appear with aeson-0.10.0.0.

comment:26 Changed 7 months ago by ghorn

  • Cc gregmainland@… added

comment:27 Changed 5 months ago by ezyang

  • Keywords deriving-perf added

comment:28 Changed 5 months ago by wilx

  • Cc vhaisman@… added

comment:29 Changed 5 months ago by gidyn

  • Cc gidyn added

comment:30 Changed 4 months ago by thomie

  • Milestone 8.0.1 deleted

comment:31 Changed 4 months ago by thomie

  • Keywords Generics added

comment:32 Changed 4 months ago by Ben Gamari <ben@…>

In 16cf460/ghc:

testsuite: Un-break T5642

This was largely fixed by the re-rework of the pattern match checker.
Resolves #5642.

comment:33 Changed 4 weeks ago by nh2

I just filed a duplicate of this (#11991) with another easy-to-run repro that demonstrates that it's precisely O(n²) for sum types: https://github.com/nh2/ghc-generics-deriving-is-slow/

comment:34 Changed 4 weeks ago by bgamari

The profile compiling Data400.hs (admittedly with a very old profiled compiler that I had laying around) from the repo given in comment:33 looks like this,

        Wed Apr 27 23:26 2016 Time and Allocation Profiling Report  (Final)

           ghc +RTS -hc -p -RTS -B/opt/exp/ghc/roots/profiled/lib/ghc-8.1.20160222 Data400.hs -fforce-recomp

        total time  =        5.12 secs   (5118 ticks @ 1000 us, 1 processor)
        total alloc = 7,708,824,224 bytes  (excludes profiling overheads)

COST CENTRE     MODULE       %time %alloc

canEvVar        TcCanonical   41.0   45.8
zonkTopDecls    TcRnDriver    17.4   25.3
CorePrep        HscMain        6.4    7.5
SimplTopBinds   SimplCore      5.9    2.6
CoreTidy        HscMain        5.5    5.6
Simplify        SimplCore      4.2    0.0
deSugar         HscMain        3.6    1.7
tc_rn_src_decls TcRnDriver     3.2    2.9
solve_loop      TcInteract     2.5    0.7
OccAnal         SimplCore      1.6    0.2
pprNativeCode   AsmCodeGen     1.1    1.3
Last edited 4 weeks ago by bgamari (previous) (diff)

comment:35 Changed 4 weeks ago by simonpj

Do read comment:8. I think there is something fundamentally difficult here.

comment:36 Changed 3 weeks ago by RyanGlScott

  • Cc RyanGlScott added
Note: See TracTickets for help on using tickets.