Opened 5 years ago

Last modified 4 weeks ago

#5642 merge bug

Deriving Generic of a big type takes a long time and lots of space

Reported by: basvandijk Owned by: bgamari
Priority: normal Milestone:
Component: Compiler Version: 7.3
Keywords: deriving-perf, Generics Cc: v.dijk.bas@…, dimitris@…, hackage.haskell.org@…, carter.schonwald@…, gregmainland@…, vhaisman@…, gidyn, RyanGlScott, akst, bgamari
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug Test Case: T5642
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D2304
Wiki Page:

Description

If I load the following module into ghci my system will run out of memory after about 15 minutes:

{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics

data BigSum = 
    C0   |  C1  | C2   | C3   | C4   | C5   | C6   | C7   | C8   | C9 
  | C10  | C11  | C12  | C13  | C14  | C15  | C16  | C17  | C18  | C19 
  | C20  | C21  | C22  | C23  | C24  | C25  | C26  | C27  | C28  | C29
  | C30  | C31  | C32  | C33  | C34  | C35  | C36  | C37  | C38  | C39
  | C40  | C41  | C42  | C43  | C44  | C45  | C46  | C47  | C48  | C49
  | C50  | C51  | C52  | C53  | C54  | C55  | C56  | C57  | C58  | C59
  | C60  | C61  | C62  | C63  | C64  | C65  | C66  | C67  | C68  | C69
  | C70  | C71  | C72  | C73  | C74  | C75  | C76  | C77  | C78  | C79
  | C80  | C81  | C82  | C83  | C84  | C85  | C86  | C87  | C88  | C89
  | C90  | C91  | C92  | C93  | C94  | C95  | C96  | C97  | C98  | C99
  | C100 | C101 | C102 | C103 | C104 | C105 | C106 | C107 | C108 | C109
  | C110 | C111 | C112 | C113 | C114 | C115 | C116 | C117 | C118 | C119
  | C120 | C121 | C122 | C123 | C124 | C125 | C126 | C127 | C128 | C129
  | C130 | C131 | C132 | C133 | C134 | C135 | C136 | C137 | C138 | C139
  | C140 | C141 | C142 | C143 | C144 | C145 | C146 | C147 | C148 | C149
  | C150 | C151 | C152 | C153 | C154 | C155 | C156 | C157 | C158 | C159
  | C160 | C161 | C162 | C163 | C164 | C165 | C166 | C167 | C168 | C169
  | C170 | C171 | C172 | C173 | C174 | C175 | C176 | C177 | C178 | C179
  | C180 | C181 | C182 | C183 | C184 | C185 | C186 | C187 | C188 | C189
  | C190 | C191 | C192 | C193 | C194 | C195 | C196 | C197 | C198 | C199
  | C200 | C201 | C202 | C203 | C204 | C205 | C206 | C207 | C208 | C209
  | C210 | C211 | C212 | C213 | C214 | C215 | C216 | C217 | C218 | C219
  | C220 | C221 | C222 | C223 | C224 | C225 | C226 | C227 | C228 | C229
  | C230 | C231 | C232 | C233 | C234 | C235 | C236 | C237 | C238 | C239
  | C240 | C241 | C242 | C243 | C244 | C245 | C246 | C247 | C248 | C249
  | C250 | C251 | C252 | C253 | C254 | C255 | C256 | C257 | C258 | C259
  | C260 | C261 | C262 | C263 | C264 | C265 | C266 | C267 | C268 | C269
  | C270 | C271 | C272 | C273 | C274 | C275 | C276 | C277 | C278 | C279
  | C280 | C281 | C282 | C283 | C284 | C285 | C286 | C287 | C288 | C289
  | C290 | C291 | C292 | C293 | C294 | C295 | C296 | C297 | C298 | C299
   deriving Generic

Big products have the same problem:

data BigProduct = C 
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    () () () () () () () () () ()
    deriving Generic

Change History (46)

comment:1 Changed 5 years ago by basvandijk

  • Cc v.dijk.bas@… added

comment:2 follow-ups: Changed 5 years ago by clintm

Came to report this sort of thing too. I can't actually get this code to completely run as my machine runs out of memory (12g) almost immediately.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE UndecidableInstances #-}

module Main where

import GHC.Generics

data User = User { name :: String
                 , password :: String }
            deriving Generic

instance (Show User) => Show (User)

main :: IO ()
main = do
  let a = User "gurn" "secret"
  putStrLn $ show a
  return ()

comment:3 in reply to: ↑ 2 Changed 5 years ago by clintm

Replying to clintm:

Came to report this sort of thing too. I can't actually get this code to completely run ...

Forgot to mention, this is ghc 7.2.2

comment:4 Changed 5 years ago by dreixel

  • Cc dreixel removed
  • Owner set to dreixel

Thanks for reporting, I will look into it.

comment:5 in reply to: ↑ 2 Changed 5 years ago by basvandijk

Replying to clintm:

Came to report this sort of thing too. I can't actually get this code to completely run as my machine runs out of memory (12g) almost immediately. ...

This doesn't have anything to do with generics. You just forgot to give a definition for either show or showsPrec. Since these methods have default implementations which are defined in terms of each other you get an infinite loop which apparently overflows the stack.

Also note that the Show type class does not use DefaultSignatures and so does not have default generic implementations.

comment:6 Changed 5 years ago by dreixel

  • Owner changed from dreixel to dimitris
  • Version changed from 7.2.1 to 7.3

I thought the problem was the generation of Generic instances, but it isn't. If you write the instance by hand, it still takes long. So the bottleneck is the performance of the constraint solver TcSimplify. Things have improved with the new coercion solver; see the following figures for compiling derive Generic on a datatype with 100/200/300 constructors:

ghc-7.2.1
100     0m30.079s
200     3m44.419s
300     12m52.306s

HEAD
100     0m8.302s
200     1m1.631s
300     3m32.507s

But they could probably be much better, since the coercions introduced are not that complicated. I've notified Dimitrios, he'll look further into it.

comment:7 Changed 5 years ago by igloo

  • Milestone set to 7.4.1
  • Priority changed from normal to high

comment:8 follow-up: Changed 5 years ago by simonpj

  • difficulty set to Unknown

I think this is going to be a difficult one to solve. The underlying problem is that the types grow non-linearly with the program size. Why? Look at Section 2.3 of Scrap your type applications.

This is a fundamental problem with System F, so it's not easy for GHC to get around it. It shows up especially with deeply-nested sums and products, which is exactly what is generated by the generic stuff. I'm not sure what to do here.

Simon

comment:9 Changed 5 years ago by dreixel

I had the impression Dimitrios had already made some progress on this, and we were just waiting to turn it into a perf test for the testsuite before closing this ticket. I think the resulting performance is not great, but it's good enough for most cases, including ASTs with 100 or so constructors.

comment:10 in reply to: ↑ 8 ; follow-up: Changed 5 years ago by basvandijk

Replying to simonpj:

I think this is going to be a difficult one to solve. The underlying problem is that the types grow non-linearly with the program size. Why? Look at Section 2.3 of Scrap your type applications.

Got it: N-ary data constructors like (C e1 e2 e3 e4 e5) are translated to nested pairs (Pair : ∀a,b. a → b → (a, b)). This causes a quadratic blow-up in size:

Pair        σ1 (σ2,(σ3,(σ4,σ5))) e1
  (Pair     σ2     (σ3,(σ4,σ5))  e2
    (Pair   σ3         (σ4,σ5)   e3 
      (Pair σ4             σ5    e4 
                                 e5)))

This is a fundamental problem with System F, so it's not easy for GHC to get around it. It shows up especially with deeply-nested sums and products, which is exactly what is generated by the generic stuff. I'm not sure what to do here.

So if I understand the paper correctly System IF would solve this by removing the redundant type applications using:

Pair ψ τ s t → Pair ψ s τ t by (ξ2)
             → Pair   s τ t by (ξ1)
             → Pair   s   t by (ξ1)

Of course the question is: is it worth implementing System IF for only solving this problem? (I understand System IF did not significantly improve compile times for the base library).

comment:11 in reply to: ↑ 10 Changed 5 years ago by dreixel

Replying to basvandijk:

Got it: N-ary data constructors like (C e1 e2 e3 e4 e5) are translated to nested pairs (Pair : ∀a,b. a → b → (a, b)). This causes a quadratic blow-up in size:

Pair        σ1 (σ2,(σ3,(σ4,σ5))) e1
  (Pair     σ2     (σ3,(σ4,σ5))  e2
    (Pair   σ3         (σ4,σ5)   e3 
      (Pair σ4             σ5    e4 
                                 e5)))

No; we balance the sums and the products, so it grows with logarithmic complexity. See the second column of the 4th page of the original paper.

comment:12 Changed 4 years ago by igloo

  • Test Case set to T5642

comment:13 Changed 4 years ago by igloo

  • Milestone changed from 7.4.1 to 7.4.2

comment:14 Changed 4 years ago by simonpj

  • Cc dimitris@… added
  • Milestone changed from 7.4.2 to 7.6.1

There is still something odd here. Here's the output for HEAD, doing ghc T5642.hs -dshow-passes +RTS -s, for various sizes of input data type. If you do it yourself you'll see that 99% of the time is spent in the typechecker. The rest of compilation is fast.

So not only is it non-linear but the constant factor is terrible. It just can't take that long to typecheck this stuff!

I'm punting for 7.4, but we need to look again for 7.6.

Simon


100 constructors

simonpj@cam-05-unx:~/tmp$ ~/5builds/HEAD/inplace/bin/ghc-stage2 -c T5642.hs -dshow-passes +RTS -s
Glasgow Haskell Compiler, Version 7.5, stage 2 booted by GHC version 7.4.1
*** Checking old interface for main:T5642:
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 3,498, types: 160,413, coercions: 1,305}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 3,298, types: 110,113, coercions: 51,906}
Result size of Simplifier
  = {terms: 3,298, types: 110,113, coercions: 51,805}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 3,298, types: 110,113, coercions: 51,805}
*** CorePrep:
Result size of CorePrep
  = {terms: 4,642, types: 159,413, coercions: 51,805}
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc46808_0/ghc46808_0.c
*** Deleting temp dirs:
   3,189,810,752 bytes allocated in the heap
   1,176,054,736 bytes copied during GC
      70,837,720 bytes maximum residency (19 sample(s))
       1,225,552 bytes maximum slop
             158 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6039 colls,     0 par    1.99s    1.99s     0.0003s    0.0038s
  Gen  1        19 colls,     0 par    1.45s    1.45s     0.0761s    0.2469s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    5.86s  (  5.95s elapsed)
  GC      time    3.44s  (  3.43s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    9.29s  (  9.38s elapsed)

  Alloc rate    544,723,338 bytes per MUT second

  Productivity  63.0% of total user, 62.4% of total elapsed

150 constructors

simonpj@cam-05-unx:~/tmp$ ~/5builds/HEAD/inplace/bin/ghc-stage2 -c T5642.hs -dshow-passes +RTS -s
Glasgow Haskell Compiler, Version 7.5, stage 2 booted by GHC version 7.4.1
*** Checking old interface for main:T5642:
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 5,690, types: 401,601, coercions: 2,85}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 5,370, types: 273,121, coercions: 131,46}
Result size of Simplifier
  = {terms: 5,370, types: 273,121, coercions: 130,885}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 5,370, types: 273,121, coercions: 130,885}
*** CorePrep:
Result size of CorePrep
  = {terms: 7,738, types: 400,1, coercions: 130,885}
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc47478_0/ghc47478_0.c
*** Deleting temp dirs:
   8,336,602,152 bytes allocated in the heap
   4,066,619,088 bytes copied during GC
     218,323,536 bytes maximum residency (33 sample(s))
       3,689,368 bytes maximum slop
             479 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     15893 colls,     0 par    6.45s    6.45s     0.0004s    0.0033s
  Gen  1        33 colls,     0 par    4.78s    4.78s     0.1448s    0.7436s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   24.45s  ( 24.60s elapsed)
  GC      time   11.23s  ( 11.23s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   35.68s  ( 35.82s elapsed)

  Alloc rate    340,990,013 bytes per MUT second

  Productivity  68.5% of total user, 68.3% of total elapsed

200 constructors

*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 7,170, types: 622,561, coercions: 2,605}
*** Simplifier:
Result size of Simplifier iteration=1
  = {terms: 6,770, types: 421,961, coercions: 203,806}
Result size of Simplifier
  = {terms: 6,770, types: 421,961, coercions: 203,605}
*** Tidy Core:
Result size of Tidy Core
  = {terms: 6,770, types: 421,961, coercions: 203,605}
*** CorePrep:
Result size of CorePrep
  = {terms: 9,858, types: 620,561, coercions: 203,605}
*** Stg2Stg:
*** CodeGen:
*** CodeOutput:
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc47501_0/ghc47501_0.c
*** Deleting temp dirs:
  13,262,925,720 bytes allocated in the heap
   7,286,910,712 bytes copied during GC
     298,657,992 bytes maximum residency (45 sample(s))
       5,548,736 bytes maximum slop
             640 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     25322 colls,     0 par   11.98s   11.97s     0.0005s    0.0034s
  Gen  1        45 colls,     0 par    8.58s    8.59s     0.1908s    0.9460s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   50.07s  ( 50.22s elapsed)
  GC      time   20.56s  ( 20.55s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   70.63s  ( 70.78s elapsed)

  Alloc rate    264,903,443 bytes per MUT second

  Productivity  70.9% of total user, 70.7% of total elapsed

comment:15 Changed 4 years ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:16 Changed 3 years ago by liyang

  • Cc hackage.haskell.org@… added

comment:17 Changed 3 years ago by carter

  • Cc carter.schonwald@… added

comment:18 Changed 2 years ago by thoughtpolice

  • Priority changed from high to normal

Lowering priority (these tickets are assigned to older versions, so they're getting bumped as they've been around for a while).

comment:19 Changed 2 years ago by thoughtpolice

  • Milestone changed from 7.6.2 to 7.10.1

Moving to 7.10.1.

comment:20 Changed 19 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:21 Changed 11 months ago by thoughtpolice

  • Milestone changed from 7.12.1 to 8.0.1

Milestone renamed

comment:22 Changed 10 months ago by basvandijk

Just a heads-ups that a user of aeson just ran into this.

comment:23 Changed 10 months ago by bgamari

  • Owner changed from dimitris to bgamari

I'll be looking into this.

comment:24 Changed 10 months ago by bgamari

For the record this sounds quite similar to #8095 in that the compiler seems to be creating long strings of coercions (although here the simplifier doesn't even seem capable of eliminating them).

comment:25 Changed 9 months ago by basvandijk

Another user of aeson also ran into this (or something which looks like this). However, instead of a big sum type he has a big product type.

The other interesting thing is that the issue only seems to appear with aeson-0.10.0.0.

comment:26 Changed 9 months ago by ghorn

  • Cc gregmainland@… added

comment:27 Changed 7 months ago by ezyang

  • Keywords deriving-perf added

comment:28 Changed 6 months ago by wilx

  • Cc vhaisman@… added

comment:29 Changed 6 months ago by gidyn

  • Cc gidyn added

comment:30 Changed 6 months ago by thomie

  • Milestone 8.0.1 deleted

comment:31 Changed 6 months ago by thomie

  • Keywords Generics added

comment:32 Changed 6 months ago by Ben Gamari <ben@…>

In 16cf460/ghc:

testsuite: Un-break T5642

This was largely fixed by the re-rework of the pattern match checker.
Resolves #5642.

comment:33 Changed 3 months ago by nh2

I just filed a duplicate of this (#11991) with another easy-to-run repro that demonstrates that it's precisely O(n²) for sum types: https://github.com/nh2/ghc-generics-deriving-is-slow/

comment:34 Changed 3 months ago by bgamari

The profile compiling Data400.hs (admittedly with a very old profiled compiler that I had laying around) from the repo given in comment:33 looks like this,

        Wed Apr 27 23:26 2016 Time and Allocation Profiling Report  (Final)

           ghc +RTS -hc -p -RTS -B/opt/exp/ghc/roots/profiled/lib/ghc-8.1.20160222 Data400.hs -fforce-recomp

        total time  =        5.12 secs   (5118 ticks @ 1000 us, 1 processor)
        total alloc = 7,708,824,224 bytes  (excludes profiling overheads)

COST CENTRE     MODULE       %time %alloc

canEvVar        TcCanonical   41.0   45.8
zonkTopDecls    TcRnDriver    17.4   25.3
CorePrep        HscMain        6.4    7.5
SimplTopBinds   SimplCore      5.9    2.6
CoreTidy        HscMain        5.5    5.6
Simplify        SimplCore      4.2    0.0
deSugar         HscMain        3.6    1.7
tc_rn_src_decls TcRnDriver     3.2    2.9
solve_loop      TcInteract     2.5    0.7
OccAnal         SimplCore      1.6    0.2
pprNativeCode   AsmCodeGen     1.1    1.3
Last edited 3 months ago by bgamari (previous) (diff)

comment:35 Changed 3 months ago by simonpj

Do read comment:8. I think there is something fundamentally difficult here.

comment:36 Changed 3 months ago by RyanGlScott

  • Cc RyanGlScott added

comment:37 Changed 7 weeks ago by akst

  • Cc akst added

comment:38 Changed 7 weeks ago by RyanGlScott

  • Differential Rev(s) set to Phab:D2304

I've found a little trick to help reduce the constant factors in deriving Generic. (See Phab:D2304 for more details.) One can factor out the topmost M1 from the derived definitions of to and from to reduce the number of times GHC has to typecheck an application of M1. This in turn leads to a reduction in the number of coercions, which lowers allocation.

I tested this trick against basvandijk's 300-constructor datatype, and it saves almost 3 seconds of compilation time and uses ~6.4 times fewer coercions.

comment:39 Changed 6 weeks ago by RyanGlScott

Per SPJ's request, I've reposted some sleuthing I did in the comments of Phab:D2304.

I decided to my changes on the 300-constructor sum type mentioned at the top of this ticket. I manually implemented a Generic instance for this datatype three times, and put each one in its own file:

  • Gen_v1.hs, which contains a Generic instance as GHC derives it currently (without the changes in Phab:D2304)
  • Gen_v2.hs, which is like Gen_v1.hs except that it factors out the topmost M1 in from/to (i.e., with the changes in Phab:D2304)
  • Gen_v3.hs, which is like Gen_v1.hs except that it both (1) factors out the topmost M1 in from/to and (2) factors out common occurrences of L1/R1 in to

I compiled each file with ghc -O2 -v3 +RTS -s and dumped the results to logs. Here are the highlights:

  • Gen_v1.txt
    • {terms: 16,282, types: 2,563,921, coercions: 639,012}
    • 7,781,716,432 bytes allocated in the heap
    • Total time 8.708s ( 8.719s elapsed)
  • Gen_v2.txt
    • {terms: 16,288, types: 2,924,492, coercions: 9,950}
    • 4,479,400,544 bytes allocated in the heap
    • Total time 5.580s ( 5.590s elapsed)
  • Gen_v3.txt
    • {terms: 16,288, types: 2,924,492, coercions: 9,950}
    • 4,016,934,848 bytes allocated in the heap
    • Total time 4.990s ( 5.006s elapsed)

There is a huge difference between v1 and v2, as suspected. There is a difference between v2 and v3 in that it allocated fewer bytes on the heap, but interestingly, v3 has the exact same number of types and coercions, so I'm not sure where the improvement is coming from.

comment:40 Changed 6 weeks ago by bgamari

  • Cc bgamari added

comment:41 Changed 5 weeks ago by Ben Gamari <ben@…>

In 9649fc0a/ghc:

Refactor derived Generic instances to reduce allocations

Previously, derived implementations of `to`/`from` in `Generic`
instances were wastefully putting extra `M1`s in every case, which led
to an O(n) increase in the number of coercions, resulting in a slowdown
during the typechecker phase.

This factors out the common `M1` in every case of a `to`/`from`
definition so that the typechecker has far fewer coercions to deal with.
For a datatype with 300 constructors, this change has been observed to
save almost 3 seconds of compilation time.

This is one step towards coming up with a solution for #5642.

Test Plan: ./validate

Reviewers: hvr, austin, simonpj, bgamari

Reviewed By: bgamari

Subscribers: basvandijk, carter, thomie, osa1

Differential Revision: https://phabricator.haskell.org/D2304

GHC Trac Issues: #5642

comment:42 Changed 5 weeks ago by bgamari

  • Status changed from new to merge

We might consider merging this for 8.0.2.

comment:43 follow-up: Changed 5 weeks ago by gidyn

Gen_v3 not going in?

comment:44 in reply to: ↑ 43 Changed 5 weeks ago by RyanGlScott

Replying to gidyn:

Gen_v3 not going in?

At least, not for now. In Phab:D2304, SPJ also asked to repeat the same experiment in comment:39, but with a four-constructor datatype:

data Bigsum = C0 | C1 | C2 | C3 -- Perhaps not the best name in hindsight...

As above, here are Gen_v1.hs, Gen_v2.hs, and Gen_v3.hs, and here are their logs:

  • Gen_v1.txt
    • {terms: 143, types: 562, coercions: 239}
    • 94,739,056 bytes allocated in the heap
    • Total time 0.201s ( 0.191s elapsed)
  • Gen_v2.txt
    • {terms: 148, types: 638, coercions: 174}
    • 93,979,880 bytes allocated in the heap
    • Total time 0.196s ( 0.187s elapsed)
  • Gen_v3.txt
    • {terms: 148, types: 638, coercions: 174}
    • 94,281,768 bytes allocated in the heap
    • Total time 0.203s ( 0.192s elapsed)

Curiously, in this example, v3 fares worse than v2. Again, I'm not sure what's going on here, and with a sample size of two, it's difficult to conclude if the strategy in v3 is always better than v2. I suspect it's asymptotically better, since we go from an O(n) number of L1s/R1s to an O(log n) number, but I'd need more evidence to support that claim.

comment:45 Changed 4 weeks ago by rrnewton

Small addendum. If you split this up and compile the "from" and "to" methods separately, it is clear that most of the work is in the "from". On the small-four constructor examples:

Version 1:

  • 239 coercions with both to/from.
  • 200 coercions with just "from"
  • 39 coercions with just "to"

Version 2:

  • 174 coercions with both to/from.
  • 132 with just from
  • 42 with just "to"

And on the full-sized, 300-way sums, here are the realtime measurements on my personal machine:

Version 1:

  • both: 639K coercions, 7.0s
  • from-only: 636.9K coercions, 5.7s
  • to-only: 2K coercions, 1.8s

Version 2:

  • both: 9.9K coercions, 4.4s
  • from-only: 7.5K coercions, 3.6s
  • to-only: 2.4K coercions, 1.0s

At least in the bigger case, it looks like there is a drop in realtime for both of the from/to methods, with the former being constructor-heavy, and the latter being deep-pattern-match-heavy.

Slicing it another way, if we cut down the cases in the 300-way match for "from-only"/v1 -- leaving incomplete pattern matches -- the coercions/time drops as follows:

  • 300 cases -- 636.9K coercions, 5.7s
  • 150 cases -- 316K coercions, 4.0s
  • 100 cases -- 212K coercions, 2.5s
  • 50 cases -- 106K coercions, 1.3s
  • 25 cases -- 53K coercions, 0.8s

And "from-only"/v2:

  • 300 cases -- 7.5K coercions, 3.7s
  • 150 cases -- 5.8K coercions, 2.8s
  • 50 cases -- 4.8K coercions, 1.1s
  • 25 cases -- 4.5K coercions, 0.7s

Ok, so nothing that unusual in this dimension. As should be expected, the "factored" v2 gets a larger benefit over v1 the more cases the factored bit is amortized over.

comment:46 Changed 4 weeks ago by rrnewton

One more small experiment. Before diving into the compiler internals, I thought it might be possible to do standard test case minimization. I.e. can we create some stub definitions for the relevant bits of GHC.Generics, while preserving the bad behavior?

The file in this gist attempts to do so, using the following stub definitions:

class Generic a where
     from :: a -> M1 a

data M1 a where
  M1 :: b -> M1 a

data U1 p = U1
data (:+:) (f :: * -> *) (g :: * -> *) p = L1 (f p) | R1 (g p)

This is a lower bound -- it fails to produce the really slow behavior. It is simplified enough that the time for compiling just the "from" case drops from 5.7s all the way down to 0.85s. Getting back to the bad behavior will require putting more of the complexity of "M1" back into play...

Note: See TracTickets for help on using tickets.