I have this program that depends on the library "red-black-record" version 2.0.2.2 on Hackage:

    {-# LANGUAGE DataKinds, TypeApplications #-}
    module Main where

    import Data.RBR (FromList, Delete, Variant, I, injectI, winnowI, match)
    import GHC.TypeLits

    type Phase01 = FromList '[
        '("ctor1", Int), '("ctor2", Bool), '("ctor4", Char), '("ctor3", Char),
        '("ctor6", Char), '("ctor5", Char), '("ctor10", Char), '("ctor11", Char),
        '("ctor13", Char), '("ctor14", Char), '("ctor39", Char), '("ctor46", Char),
        '("ctor47", Char), '("ctor44", Char), '("ctor43", Char), '("ctor7", Char),
        '("ctor9", Char), '("ctor20", Char), '("ctor45", Char), '("ctor21", Char),
        '("ctor48", Char), '("ctor49", Char), '("ctor50", Char), '("ctor41", Char),
        '("ctor33", Char), '("ctor32", Char), '("ctor42", Char), '("ctor22", Char),
        '("ctor23", Char), '("ctor8", Char), '("ctor40", Char), '("ctor29", Char),
        '("ctor24", Char), '("ctor38", Char), '("ctor25", Char), '("ctor26", Char),
        '("ctor27", Char), '("ctor28", Char), '("ctor36", Char), '("ctor52", Char),
        '("ctor51", Char), '("ctor53", Char), '("ctor12", Char), '("ctor54", Char),
        '("ctor15", Char), '("ctor31", Char), '("ctor30", Char), '("ctor34", Char),
        '("ctor35", Char), '("ctor17", Char), '("ctor16", Char), '("ctor18", Char),
        '("ctor19", Char), '("ctor37", Char)
        ]

    type Phase02 = Delete "ctor1" Int Phase01

    main :: IO ()
    main = print (match @"ctor17" (fromPhase1ToPhase2 (injectI @"ctor1" 2)))
      where
        fromPhase1ToPhase2 :: Variant I Phase01 -> Variant I Phase02
        fromPhase1ToPhase2 v = case winnowI @"ctor1" @Int v of
            Right z -> injectI @"ctor2" False
            Left l -> l

"red-black-record" provides extensible variants; the code is basically removing a branch from a variant with 50-plus branches, and then trying to match another branch. It is type family-heavy code.
As written, the code takes **~9 seconds** to compile on my machine. But when I move the `fromPhase1ToPhase2` function to the top level (including its signature), compilation time balloons to **~29 seconds**. Is there a reason this should be so?
As another data point, moving the function to the top level but replacing the complex type-level map parameters (`Phase01`, `Phase02`) with wildcards via partial type signatures (which also requires an extra type application) compiles in **~9 seconds** again:

    {-# LANGUAGE PartialTypeSignatures #-}
    {-# OPTIONS_GHC -Wno-partial-type-signatures #-}
    -- ...
    type Phase02 = Delete "ctor1" Int Phase01

    fromPhase1ToPhase2 :: Variant I _ -> Variant I _
    fromPhase1ToPhase2 v = case winnowI @"ctor1" @Int @Phase01 v of
        Right z -> injectI @"ctor2" False
        Left l -> l

    main :: IO ()
    main = print (match @"ctor17" (fromPhase1ToPhase2 (injectI @"ctor1" 2)))

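For readers who have not used type-level maps, here is a rough sketch of what "removing a branch" means at the type level. It is not the library's implementation: red-black-record stores the branches in a type-level red-black tree, whereas this sketch uses a plain type-level list, and the names `DeleteKey`, `Before`, `After` and `check` are made up for illustration. The point is only that GHC has to reduce a type family step by step to decide that the two types are equal.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators #-}
module Main where

import Data.Kind (Type)
import Data.Type.Equality ((:~:) (Refl))
import GHC.TypeLits (Symbol)

-- Hypothetical list-based stand-in for the library's tree-based Delete.
-- Removes the first entry whose key is k; leaves the list alone otherwise.
type family DeleteKey (k :: Symbol) (xs :: [(Symbol, Type)]) :: [(Symbol, Type)] where
  DeleteKey k '[]             = '[]
  DeleteKey k ('(k, v) ': xs) = xs
  DeleteKey k (x ': xs)       = x ': DeleteKey k xs

-- A three-branch analogue of Phase01 / Phase02 (names are illustrative).
type Before = '[ '("ctor1", Int), '("ctor2", Bool), '("ctor3", Char)]
type After  = DeleteKey "ctor1" Before

-- This only typechecks if GHC can reduce After to the expected list.
check :: After :~: '[ '("ctor2", Bool), '("ctor3", Char)]
check = Refl

main :: IO ()
main = case check of
  Refl -> putStrLn "DeleteKey removed the ctor1 branch"
```

With three branches this is instant. The reproducer does the same kind of work over 54 branches and a balanced tree, so the number of reduction steps is far larger.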
Trac metadata

| Trac field | Value |
|---|---|
| Version | 8.4.2 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |

A self-contained example with no dependencies, that includes the relevant portions of the library. The code for reproducing the bug is at the end of the file.
I've been experimenting with type family performance changes using this test case. @trac-danidiaz if you haven't already, you might like to try compiling with -fno-opt-coercion...
I conjecture that at least in type-family heavy programs, the effort of coercion optimization may actually make things worse! Has there been any work on assessing how worthwhile coercion optimization is in general?
(EDIT: in case anyone tries to reproduce, these results are with fromPhase1ToPhase2 lifted to the top level. The effect is still noticeable when it is a local declaration, but the numbers are lower.)
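If changing the build command is inconvenient, the flag can also be set for a single module with an `OPTIONS_GHC` pragma. A minimal sketch; the module body is a placeholder, not the reproducer:

```haskell
-- Disable the coercion optimiser for this module only.
{-# OPTIONS_GHC -fno-opt-coercion #-}
module Main where

-- Placeholder body; in practice this would be the type-family-heavy module.
main :: IO ()
main = putStrLn "compiled with -fno-opt-coercion"
```

This keeps the experiment local to the module under test, so the rest of a project still builds with the default settings.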
Build without -fno-opt-coercion:

    $ cabal exec -w ghc-9.0.1 -- ghc-9.0.1 T16382.hs -fforce-recomp -dshow-passes +RTS -s
    ...
    [1 of 1] Compiling Main ( T16382.hs, T16382.o )
    *** Parser [Main]:
    !!! Parser [Main]: finished in 1.12 milliseconds, allocated 1.940 megabytes
    *** Renamer/typechecker [Main]:
    !!! Renamer/typechecker [Main]: finished in 767.45 milliseconds, allocated 747.194 megabytes
    *** Desugar [Main]:
    Result size of Desugar (before optimization) = {terms: 370, types: 36,895, coercions: 2,680,810, joins: 0/1}
    Result size of Desugar (after optimization) = {terms: 274, types: 30,344, coercions: 2,614,350, joins: 0/0}
    !!! Desugar [Main]: finished in 902.76 milliseconds, allocated 1398.125 megabytes
    *** Simplifier [Main]:
    Result size of Simplifier iteration=1 = {terms: 290, types: 32,749, coercions: 2,616,977, joins: 0/0}
    Result size of Simplifier = {terms: 290, types: 32,749, coercions: 2,616,362, joins: 0/0}
    !!! Simplifier [Main]: finished in 1566.70 milliseconds, allocated 1974.953 megabytes
    *** CoreTidy [Main]:
    Result size of Tidy Core = {terms: 290, types: 32,749, coercions: 2,616,362, joins: 0/0}
    !!! CoreTidy [Main]: finished in 568.62 milliseconds, allocated 327.197 megabytes
    Created temporary directory: /tmp/ghc6573_0
    *** CorePrep [Main]:
    Result size of CorePrep = {terms: 307, types: 35,742, coercions: 2,616,362, joins: 0/6}
    !!! CorePrep [Main]: finished in 75.50 milliseconds, allocated 1.359 megabytes
    *** Stg2Stg:
    *** CodeGen [Main]:
    !!! CodeGen [Main]: finished in 16.89 milliseconds, allocated 24.906 megabytes
    *** systool:as:
    *** Assembler:
    !!! systool:as: finished in 0.48 milliseconds, allocated 0.097 megabytes
    Upsweep completely successful.
    *** Deleting temp files:
    Warning: deleting non-existent /tmp/ghc6573_0/ghc_1.s
    Warning: deleting non-existent /tmp/ghc6573_0/ghc_3.c
    Linking T16382 ...
    *** systool:cc:
    *** C Compiler:
    !!! systool:cc: finished in 0.56 milliseconds, allocated 0.130 megabytes
    *** systool:cc:
    *** C Compiler:
    !!! systool:cc: finished in 0.76 milliseconds, allocated 0.121 megabytes
    *** systool:linker:
    *** Linker:
    !!! systool:linker: finished in 2.91 milliseconds, allocated 2.691 megabytes
    *** Deleting temp files:
    *** Deleting temp dirs:
       4,759,323,640 bytes allocated in the heap
       1,059,209,912 bytes copied during GC
         170,661,696 bytes maximum residency (12 sample(s))
             571,584 bytes maximum slop
                 387 MiB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       457 colls,     0 par    0.706s   0.706s     0.0015s    0.0539s
      Gen  1        12 colls,     0 par    0.669s   0.669s     0.0558s    0.1827s

      TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.001s  (  0.001s elapsed)
      MUT     time    2.653s  (  3.068s elapsed)
      GC      time    1.375s  (  1.375s elapsed)
      EXIT    time    0.001s  (  0.007s elapsed)
      Total   time    4.029s  (  4.450s elapsed)

      Alloc rate    1,793,923,599 bytes per MUT second

      Productivity  65.8% of total user, 68.9% of total elapsed

Build with -fno-opt-coercion:

    $ cabal exec -w ghc-9.0.1 -- ghc-9.0.1 T16382.hs -fforce-recomp -dshow-passes -fno-opt-coercion +RTS -s
    ...
    [1 of 1] Compiling Main ( T16382.hs, T16382.o )
    *** Parser [Main]:
    !!! Parser [Main]: finished in 0.98 milliseconds, allocated 1.940 megabytes
    *** Renamer/typechecker [Main]:
    !!! Renamer/typechecker [Main]: finished in 790.88 milliseconds, allocated 747.194 megabytes
    *** Desugar [Main]:
    Result size of Desugar (before optimization) = {terms: 370, types: 36,895, coercions: 2,680,810, joins: 0/1}
    Result size of Desugar (after optimization) = {terms: 274, types: 30,344, coercions: 2,680,810, joins: 0/0}
    !!! Desugar [Main]: finished in 222.14 milliseconds, allocated 31.450 megabytes
    *** Simplifier [Main]:
    Result size of Simplifier iteration=1 = {terms: 290, types: 32,749, coercions: 2,725,962, joins: 0/0}
    Result size of Simplifier = {terms: 290, types: 32,749, coercions: 2,725,962, joins: 0/0}
    !!! Simplifier [Main]: finished in 384.82 milliseconds, allocated 3.234 megabytes
    *** CoreTidy [Main]:
    Result size of Tidy Core = {terms: 290, types: 32,749, coercions: 2,725,962, joins: 0/0}
    !!! CoreTidy [Main]: finished in 664.60 milliseconds, allocated 337.935 megabytes
    Created temporary directory: /tmp/ghc6328_0
    *** CorePrep [Main]:
    Result size of CorePrep = {terms: 307, types: 35,742, coercions: 2,725,962, joins: 0/6}
    !!! CorePrep [Main]: finished in 85.75 milliseconds, allocated 1.359 megabytes
    *** Stg2Stg:
    *** CodeGen [Main]:
    !!! CodeGen [Main]: finished in 15.49 milliseconds, allocated 24.906 megabytes
    *** systool:as:
    *** Assembler:
    !!! systool:as: finished in 0.34 milliseconds, allocated 0.097 megabytes
    Upsweep completely successful.
    *** Deleting temp files:
    Warning: deleting non-existent /tmp/ghc6328_0/ghc_1.s
    Warning: deleting non-existent /tmp/ghc6328_0/ghc_3.c
    Linking T16382 ...
    *** systool:cc:
    *** C Compiler:
    !!! systool:cc: finished in 0.52 milliseconds, allocated 0.130 megabytes
    *** systool:cc:
    *** C Compiler:
    !!! systool:cc: finished in 0.52 milliseconds, allocated 0.121 megabytes
    *** systool:linker:
    *** Linker:
    !!! systool:linker: finished in 2.69 milliseconds, allocated 2.691 megabytes
    *** Deleting temp files:
    *** Deleting temp dirs:
       1,270,048,032 bytes allocated in the heap
         968,854,640 bytes copied during GC
         192,694,488 bytes maximum residency (12 sample(s))
             612,136 bytes maximum slop
                 376 MiB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       185 colls,     0 par    0.515s   0.515s     0.0028s    0.0323s
      Gen  1        12 colls,     0 par    0.619s   0.619s     0.0516s    0.1921s

      TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.001s  (  0.001s elapsed)
      MUT     time    1.126s  (  1.560s elapsed)
      GC      time    1.134s  (  1.135s elapsed)
      EXIT    time    0.001s  (  0.005s elapsed)
      Total   time    2.262s  (  2.700s elapsed)

      Alloc rate    1,128,237,559 bytes per MUT second

      Productivity  49.8% of total user, 57.8% of total elapsed
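
To make the two runs easier to compare, the small program below divides each figure from the default build by the corresponding figure from the `-fno-opt-coercion` build. The numbers are copied from the two logs; the names `ratio` and `measurements` are just for this sketch. On these figures the default build spends roughly four times as long in both Desugar and Simplifier, yet ends up with only about 4% fewer coercions in Tidy Core.

```haskell
module Main where

import Text.Printf (printf)

-- Ratio of a figure from the default build to the same figure
-- from the -fno-opt-coercion build.
ratio :: Double -> Double -> Double
ratio withOpt withoutOpt = withOpt / withoutOpt

-- (label, default build, -fno-opt-coercion build), copied from the logs.
measurements :: [(String, Double, Double)]
measurements =
  [ ("Desugar time (ms)",      902.76,     222.14)
  , ("Simplifier time (ms)",   1566.70,    384.82)
  , ("Heap allocated (bytes)", 4759323640, 1270048032)
  , ("Total elapsed (s)",      4.450,      2.700)
  , ("Tidy Core coercions",    2616362,    2725962)
  ]

main :: IO ()
main = mapM_ report measurements
  where
    -- Print each label with its default / no-opt ratio.
    report (label, a, b) = printf "%-24s %.2fx\n" label (ratio a b)
```

This is a single run on one machine, so the ratios are indicative rather than a benchmark.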