#14364 closed task (fixed)

Reduce repetition in derived Read instances

Reported by: bgamari Owned by:
Priority: normal Milestone: 8.4.1
Component: Compiler Version: 8.2.1
Keywords: deriving Cc: tdammers
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #10980 #7258 Differential Rev(s):
Wiki Page:

Description

While looking at #7258 with tdammers, I noticed that 5000 of the 7000 terms that the W2 module simplifies to are attributed to $creadPrec. In fact, much of this is repetition of the form,

                  (GHC.Read.expectP
                     (Text.Read.Lex.Ident (GHC.CString.unpackCString# "field1"#)))
                  (>>
                     @ Text.ParserCombinators.ReadPrec.ReadPrec
                     Text.ParserCombinators.ReadPrec.$fMonadReadPrec
                     @ ()
                     @ DT
                     (GHC.Read.expectP
                        (Text.Read.Lex.Punc (GHC.CString.unpackCString# "="#)))
                     (>>=
                        @ Text.ParserCombinators.ReadPrec.ReadPrec
                        Text.ParserCombinators.ReadPrec.$fMonadReadPrec
                        @ Int
                        @ DT
                        (Text.ParserCombinators.ReadPrec.reset
                           @ Int (GHC.Read.readPrec @ Int GHC.Read.$fReadInt))
                        (\ (a1_a1oe :: Int) ->
                           >>
                             @ Text.ParserCombinators.ReadPrec.ReadPrec
                             Text.ParserCombinators.ReadPrec.$fMonadReadPrec
                             @ ()
                             @ DT
                             (GHC.Read.expectP
                                (Text.Read.Lex.Punc (GHC.CString.unpackCString# ","#)))

Let's factor this pattern out into a readField helper in GHC.Read,

readField :: String -> ReadPrec a -> ReadPrec a
readField fieldName readVal = do
    expectP  (Ident fieldName)
    expectP (Punc "=")
    readVal
{-# NOINLINE readField #-}

This will at least knock off a constant factor from the size of what should not be performance-critical code.

We could also try folding the terminal "," into this, although then we would need to somehow handle the last field specially. This might not be worth it. Perhaps instead just factor out the comma case as well,

readComma :: ReadPrec ()
readComma = expectP (Punc ",")
{-# NOINLINE readField #-}

It's unclear whether this is worth it, however.

Change History (9)

comment:1 Changed 16 months ago by RyanGlScott

Keywords: deriving added

comment:2 Changed 16 months ago by simonpj

Sounds very plausible to me. And you could have two variants of readField, one with a comma and one not.

comment:3 Changed 16 months ago by tdammers

The problem is not just repetition, it's also very deep (>800 levels) nesting, which might explain why the register allocator gets swamped. This makes me suspect that factoring out readField would not really solve the issue, because we'd still be stuck with the deep nesting.

comment:4 Changed 16 months ago by tdammers

Some manual experimentation using hand-written Read instances suggests that factoring out readField would indeed cut down compile times significantly.

For a 10-field record type, the generated Read instance and a hand-written one with all the field parsers written inline both take 0.14 seconds to compile, and the register allocator shows up high in the profile. Manually rewriting the Read instance to use readField instead cuts it to 0.11 seconds, so I am now implementing this in TcGenDeriv to see how much of a difference it makes on larger record types.

comment:5 in reply to:  description Changed 16 months ago by tdammers

Replying to bgamari:

Let's factor this pattern out into a readField helper in GHC.Read,

readField :: String -> ReadPrec a -> ReadPrec a
readField fieldName readVal = do
    expectP  (Ident fieldName)
    expectP (Punc "=")
    readVal
{-# NOINLINE readField #-}

This will at least knock off a constant factor from the size of what should not be performance-critical code.

The readField function actually has to be a bit more complex than that to cater for symbol-named fields (e.g. (#)). We can however make the decision at compile time, because we already know the label.

comment:6 Changed 16 months ago by tdammers

http://git.haskell.org/ghc.git/commitdiff/a6dd03e751d17467be10eea3ff1b1773d8d35893 factors out the field reader into readField and readSymField (the latter being used for symbol-named fields).

Performance improvement is significant; before and after profiling output for a 500-field record example:

Before:

	Wed Oct 18 09:02 2017 Time and Allocation Profiling Report  (Final)

	   ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-typed/devel/ghc/inplace/lib -B/home/tobias/well-typed/devel/ghc/inplace/lib -fforce-recomp -c /home/tobias/Downloads/W3.hs

	total time  =       25.50 secs   (25505 ticks @ 1000 us, 1 processor)
	total alloc = 24,693,071,936 bytes  (excludes profiling overheads)

COST CENTRE        MODULE      SRC                                                 %time %alloc

RegAlloc-linear    AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)   29.6   25.6
pprNativeCode      AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)   16.9   22.2
regLiveness        AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)    8.8    6.5
StgCmm             HscMain     compiler/main/HscMain.hs:(1426,13)-(1427,62)          8.7    8.3
NativeCodeGen      CodeOutput  compiler/main/CodeOutput.hs:171:18-78                 7.3    7.9
genMachCode        AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)    6.0    6.3
SimplTopBinds      SimplCore   compiler/simplCore/SimplCore.hs:761:39-74             2.8    3.9
CoreTidy           HscMain     compiler/main/HscMain.hs:1253:27-67                   2.6    3.2
layoutStack        CmmPipeline compiler/cmm/CmmPipeline.hs:(97,13)-(99,40)           2.1    2.6
deSugar            HscMain     compiler/main/HscMain.hs:511:7-44                     2.0    2.7
CorePrep           HscMain     compiler/main/HscMain.hs:(1313,24)-(1314,57)          1.4    2.0
generateJumpTables AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:689:17-50            1.3    0.7
fixStgRegisters    AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:566:17-42            1.1    0.7
cmmToCmm           AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:571:17-50            1.1    1.1
OccAnal            SimplCore   compiler/simplCore/SimplCore.hs:(739,22)-(740,67)     1.1    1.1

After:

	Wed Oct 18 15:41 2017 Time and Allocation Profiling Report  (Final)

	   ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-typed/devel/ghc/inplace/lib -B/home/tobias/well-typed/devel/ghc/inplace/lib -fforce-recomp -c /home/tobias/Downloads/W3.hs

	total time  =       14.78 secs   (14784 ticks @ 1000 us, 1 processor)
	total alloc = 14,528,601,400 bytes  (excludes profiling overheads)

COST CENTRE        MODULE      SRC                                                 %time %alloc

RegAlloc-linear    AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)   26.0   22.0
pprNativeCode      AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)   14.7   19.2
StgCmm             HscMain     compiler/main/HscMain.hs:(1426,13)-(1427,62)          8.8    7.8
regLiveness        AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)    7.8    5.7
NativeCodeGen      CodeOutput  compiler/main/CodeOutput.hs:171:18-78                 6.3    6.8
genMachCode        AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)    5.3    5.5
SimplTopBinds      SimplCore   compiler/simplCore/SimplCore.hs:761:39-74             4.5    6.5
CoreTidy           HscMain     compiler/main/HscMain.hs:1253:27-67                   4.4    5.5
deSugar            HscMain     compiler/main/HscMain.hs:511:7-44                     3.4    4.5
CorePrep           HscMain     compiler/main/HscMain.hs:(1313,24)-(1314,57)          2.5    3.2
OccAnal            SimplCore   compiler/simplCore/SimplCore.hs:(739,22)-(740,67)     2.1    1.9
layoutStack        CmmPipeline compiler/cmm/CmmPipeline.hs:(97,13)-(99,40)           1.9    2.3
Stg2Stg            HscMain     compiler/main/HscMain.hs:1489:12-44                   1.4    1.0
generateJumpTables AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:689:17-50            1.2    0.6
fixStgRegisters    AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:566:17-42            1.1    0.6
tc_rn_src_decls    TcRnDriver  compiler/typecheck/TcRnDriver.hs:(493,4)-(555,7)      1.1    0.8
cmmToCmm           AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:571:17-50            1.0    1.0
occAnalBind.assoc  OccurAnal   compiler/simplCore/OccurAnal.hs:853:13-60             0.9    1.0

comment:7 Changed 16 months ago by bgamari

Quite a difference indeed. It's not every day you see a 40% reduction in compile time. Can you put the patch up for review?

comment:8 Changed 16 months ago by Ben Gamari <ben@…>

In dbd81f7/ghc:

Factor out readField (#14364)

Improves compiler performance of deriving Read instances, as suggested
in the issue.

Additionally, we introduce `readSymField`, a companion to `readField`
that parses symbol-type fields (where the field name is a symbol, e.g.
`(#)`, rather than an alphanumeric identifier. The decision between
these two functions is made a compile time, because we already know
which one we need based on the field name.

Reviewers: austin, hvr, bgamari, RyanGlScott

Reviewed By: bgamari

Subscribers: RyanGlScott, rwbarton, thomie

Differential Revision: https://phabricator.haskell.org/D4108

comment:9 Changed 16 months ago by bgamari

Resolution: fixed
Status: newclosed

Comment:8 nails this.

One additional change that we may investigate in the future is to emit Applicative parsers instead of using monadic bind. tdammers did some experiments and found that this helped compilation time quite dramatically in the unoptimized case although hurt when optimization was enabled.

Note: See TracTickets for help on using tickets.