Opened 2 years ago

Last modified 13 months ago

#5642 new bug

Deriving Generic of a big type takes a long time and lots of space

Reported by: basvandijk Owned by: dimitris
Priority: high Milestone: 7.6.2
Component: Compiler Version: 7.3
Keywords: Cc: v.dijk.bas@…, dimitris@…, hackage.haskell.org@…, carter.schonwald@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Difficulty: Unknown
Test Case: T5642 Blocked By:
Blocking: Related Tickets:

Description

If I load the following module into ghci my system will run out of memory after about 15 minutes:

{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics

data BigSum = 
    C0   |  C1  | C2   | C3   | C4   | C5   | C6   | C7   | C8   | C9 
  | C10  | C11  | C12  | C13  | C14  | C15  | C16  | C17  | C18  | C19 
  | C20  | C21  | C22  | C23  | C24  | C25  | C26  | C27  | C28  | C29
  | C30  | C31  | C32  | C33  | C34  | C35  | C36  | C37  | C38  | C39
  | C40  | C41  | C42  | C43  | C44  | C45  | C46  | C47  | C48  | C49
  | C50  | C51  | C52  | C53  | C54  | C55  | C56  | C57  | C58  | C59
  | C60  | C61  | C62  | C63  | C64  | C65  | C66  | C67  | C68  | C69
  | C70  | C71  | C72  | C73  | C74  | C75  | C76  | C77  | C78  | C79
  | C80  | C81  | C82  | C83  | C84  | C85  | C86  | C87  | C88  | C89
  | C90  | C91  | C92  | C93  | C94  | C95  | C96  | C97  | C98  | C99
  | C100 | C101 | C102 | C103 | C104 | C105 | C106 | C107 | C108 | C109
  | C110 | C111 | C112 | C113 | C114 | C115 | C116 | C117 | C118 | C119
  | C120 | C121 | C122 | C123 | C124 | C125 | C126 | C127 | C128 | C129
  | C130 | C131 | C132 | C133 | C134 | C135 | C136 | C137 | C138 | C139
  | C140 | C141 | C142 | C143 | C144 | C145 | C146 | C147 | C148 | C149
  | C150 | C151 | C152 | C153 | C154 | C155 | C156 | C157 | C158 | C159
  | C160 | C161 | C162 | C163 | C164 | C165 | C166 | C167 | C168 | C169
  | C170 | C171 | C172 | C173 | C174 | C175 | C176 | C177 | C178 | C179
  | C180 | C181 | C182 | C183 | C184 | C185 | C186 | C187 | C188 | C189
  | C190 | C191 | C192 | C193 | C194 | C195 | C196 | C197 | C198 | C199
  | C200 | C201 | C202 | C203 | C204 | C205 | C206 | C207 | C208 | C209
  | C210 | C211 | C212 | C213 | C214 | C215 | C216 | C217 | C218 | C219
  | C220 | C221 | C222 | C223 | C224 | C225 | C226 | C227 | C228 | C229
  | C230 | C231 | C232 | C233 | C234 | C235 | C236 | C237 | C238 | C239
  | C240 | C241 | C242 | C243 | C244 | C245 | C246 | C247 | C248 | C249
  | C250 | C251 | C252 | C253 | C254 | C255 | C256 | C257 | C258 | C259
  | C260 | C261 | C262 | C263 | C264 | C265 | C266 | C267 | C268 | C269
  | C270 | C271 | C272 | C273 | C274 | C275 | C276 | C277 | C278 | C279
  | C280 | C281 | C282 | C283 | C284 | C285 | C286 | C287 | C288 | C289
  | C290 | C291 | C292 | C293 | C294 | C295 | C296 | C297 | C298 | C299
   deriving Generic

Big products have the same problem:

data BigProduct = C 
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    deriving Generic

Change History (17)

comment:1 Changed 2 years ago by basvandijk

  • Cc v.dijk.bas@… added

comment:2 follow-ups: Changed 2 years ago by clintm

Came to report this sort of thing too. I can't actually get this code to completely run as my machine runs out of memory (12g) almost immediately.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE UndecidableInstances #-}

module Main where

import GHC.Generics

data User = User { name :: String
                 , password :: String }
            deriving Generic

instance (Show User) => Show (User)

main :: IO ()
main = do
  let a = User "gurn" "secret"
  putStrLn $ show a
  return ()

comment:3 in reply to: ↑ 2 Changed 2 years ago by clintm

Replying to clintm:

Came to report this sort of thing too. I can't actually get this code to completely run ...

Forgot to mention, this is ghc 7.2.2

comment:4 Changed 2 years ago by dreixel

  • Cc dreixel removed
  • Owner set to dreixel

Thanks for reporting, I will look into it.

comment:5 in reply to: ↑ 2 Changed 2 years ago by basvandijk

Replying to clintm:

Came to report this sort of thing too. I can't actually get this code to completely run as my machine runs out of memory (12g) almost immediately.
...

This doesn't have anything to do with generics. You just forgot to give a definition for either show or showsPrec. Since these methods have default implementations which are defined in terms of each other you get an infinite loop which apparently overflows the stack.

Also note that the Show type class does not use DefaultSignatures and so does not have default generic implementations.

comment:6 Changed 2 years ago by dreixel

  • Owner changed from dreixel to dimitris
  • Version changed from 7.2.1 to 7.3

I thought the problem was the generation of Generic instances, but it isn't. If you write the instance by hand, it still takes long. So the bottleneck is the performance of the constraint solver TcSimplify. Things have improved with the new coercion solver; see the following figures for compiling derive Generic on a datatype with 100/200/300 constructors:

ghc-7.2.1
100     0m30.079s
200     3m44.419s
300     12m52.306s

HEAD
100     0m8.302s
200     1m1.631s
300     3m32.507s

But they could probably be much better, since the coercions introduced are not that complicated. I've notified Dimitrios, he'll look further into it.

comment:7 Changed 2 years ago by igloo

  • Milestone set to 7.4.1
  • Priority changed from normal to high

comment:8 follow-up: Changed 2 years ago by simonpj

  • Difficulty set to Unknown

I think this is going to be a difficult one to solve. The underlying problem is that the types grow non-linearly with the program size. Why? Look at Section 2.3 of Scrap your type applications.

This is a fundamental problem with System F, so it's not easy for GHC to get around it. It shows up especially with deeply-nested sums and products, which is exactly what is generated by the generic stuff. I'm not sure what to do here.

Simon

comment:9 Changed 2 years ago by dreixel

I had the impression Dimitrios had already made some progress on this, and we were just waiting to turn it into a perf test for the testsuite before closing this ticket. I think the resulting performance is not great, but it's good enough for most cases, including ASTs with 100 or so constructors.

comment:10 in reply to: ↑ 8 ; follow-up: Changed 2 years ago by basvandijk

Replying to simonpj:

I think this is going to be a difficult one to solve. The underlying problem is that the types grow non-linearly with the program size. Why? Look at Section 2.3 of Scrap your type applications.

Got it: N-ary data constructors like (C e1 e2 e3 e4 e5) are translated to nested pairs (Pair : ∀a,b. a → b → (a, b)). This causes a quadratic blow-up in size:

Pair        σ1 (σ2,(σ3,(σ4,σ5))) e1
  (Pair     σ2     (σ3,(σ4,σ5))  e2
    (Pair   σ3         (σ4,σ5)   e3 
      (Pair σ4             σ5    e4 
                                 e5)))

This is a fundamental problem with System F, so it's not easy for GHC to get around it. It shows up especially with deeply-nested sums and products, which is exactly what is generated by the generic stuff. I'm not sure what to do here.

So if I understand the paper correctly System IF would solve this by removing the redundant type applications using:

Pair ψ τ s t → Pair ψ s τ t by (ξ2)
             → Pair   s τ t by (ξ1)
             → Pair   s   t by (ξ1)

Of course the question is: is it worth implementing System IF for only solving this problem? (I understand System IF did not significantly improve compile times for the base library).

comment:11 in reply to: ↑ 10 Changed 2 years ago by dreixel

Replying to basvandijk:

Got it: N-ary data constructors like (C e1 e2 e3 e4 e5) are translated to nested pairs (Pair : ∀a,b. a → b → (a, b)). This causes a quadratic blow-up in size:

Pair        σ1 (σ2,(σ3,(σ4,σ5))) e1
  (Pair     σ2     (σ3,(σ4,σ5))  e2
    (Pair   σ3         (σ4,σ5)   e3 
      (Pair σ4             σ5    e4 
                                 e5)))

No; we balance the sums and the products, so it grows with logarithmic complexity. See the second column of the 4th page of the original paper.

comment:12 Changed 2 years ago by igloo

  • Test Case set to T5642

comment:13 Changed 2 years ago by igloo

  • Milestone changed from 7.4.1 to 7.4.2

comment:14 Changed 2 years ago by simonpj

  • Cc dimitris@… added
  • Milestone changed from 7.4.2 to 7.6.1

There is still something odd here. Here's the output for HEAD, doing ghc T5642.hs -dshow-passes +RTS -s, for various sizes of input data type. If you do it yourself you'll see that 99% of the time is spent in the typechecker. The rest of compilation is fast.

So not only is it non-linear but the constant factor is terrible. It just can't take that long to typecheck this stuff!

I'm punting for 7.4, but we need to look again for 7.6.

Simon


100 constructors

simonpj@cam-05-unx:~/tmp$ ~/5builds/HEAD/inplace/bin/ghc-stage2 -c T5642.hs -dshow-passes +RTS -s
Glasgow Haskell Compiler, Version 7.5, stage 2 booted by GHC version 7.4.1
*** Checking old interface for main:T5642:
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 3,498, types: 160,413, coercions: 1,305}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 3,298, types: 110,113, coercions: 51,906}
Result size of Simplifier
  = {terms: 3,298, types: 110,113, coercions: 51,805}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 3,298, types: 110,113, coercions: 51,805}
*** CorePrep:
Result size of CorePrep
  = {terms: 4,642, types: 159,413, coercions: 51,805}
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc46808_0/ghc46808_0.c
*** Deleting temp dirs:
   3,189,810,752 bytes allocated in the heap
   1,176,054,736 bytes copied during GC
      70,837,720 bytes maximum residency (19 sample(s))
       1,225,552 bytes maximum slop
             158 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6039 colls,     0 par    1.99s    1.99s     0.0003s    0.0038s
  Gen  1        19 colls,     0 par    1.45s    1.45s     0.0761s    0.2469s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    5.86s  (  5.95s elapsed)
  GC      time    3.44s  (  3.43s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    9.29s  (  9.38s elapsed)

  Alloc rate    544,723,338 bytes per MUT second

  Productivity  63.0% of total user, 62.4% of total elapsed

150 constructors

simonpj@cam-05-unx:~/tmp$ ~/5builds/HEAD/inplace/bin/ghc-stage2 -c T5642.hs -dshow-passes +RTS -s
Glasgow Haskell Compiler, Version 7.5, stage 2 booted by GHC version 7.4.1
*** Checking old interface for main:T5642:
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 5,690, types: 401,601, coercions: 2,85}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 5,370, types: 273,121, coercions: 131,46}
Result size of Simplifier
  = {terms: 5,370, types: 273,121, coercions: 130,885}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 5,370, types: 273,121, coercions: 130,885}
*** CorePrep:
Result size of CorePrep
  = {terms: 7,738, types: 400,1, coercions: 130,885}
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc47478_0/ghc47478_0.c
*** Deleting temp dirs:
   8,336,602,152 bytes allocated in the heap
   4,066,619,088 bytes copied during GC
     218,323,536 bytes maximum residency (33 sample(s))
       3,689,368 bytes maximum slop
             479 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     15893 colls,     0 par    6.45s    6.45s     0.0004s    0.0033s
  Gen  1        33 colls,     0 par    4.78s    4.78s     0.1448s    0.7436s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   24.45s  ( 24.60s elapsed)
  GC      time   11.23s  ( 11.23s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   35.68s  ( 35.82s elapsed)

  Alloc rate    340,990,013 bytes per MUT second

  Productivity  68.5% of total user, 68.3% of total elapsed

200 constructors

*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 7,170, types: 622,561, coercions: 2,605}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 6,770, types: 421,961, coercions: 203,806}
Result size of Simplifier
  = {terms: 6,770, types: 421,961, coercions: 203,605}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 6,770, types: 421,961, coercions: 203,605}
*** CorePrep:
Result size of CorePrep
  = {terms: 9,858, types: 620,561, coercions: 203,605}
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc47501_0/ghc47501_0.c
*** Deleting temp dirs:
  13,262,925,720 bytes allocated in the heap
   7,286,910,712 bytes copied during GC
     298,657,992 bytes maximum residency (45 sample(s))
       5,548,736 bytes maximum slop
             640 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     25322 colls,     0 par   11.98s   11.97s     0.0005s    0.0034s
  Gen  1        45 colls,     0 par    8.58s    8.59s     0.1908s    0.9460s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   50.07s  ( 50.22s elapsed)
  GC      time   20.56s  ( 20.55s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   70.63s  ( 70.78s elapsed)

  Alloc rate    264,903,443 bytes per MUT second

  Productivity  70.9% of total user, 70.7% of total elapsed

comment:15 Changed 19 months ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:16 Changed 13 months ago by liyang

  • Cc hackage.haskell.org@… added

comment:17 Changed 13 months ago by carter

  • Cc carter.schonwald@… added
Note: See TracTickets for help on using tickets.