Opened 9 years ago

Closed 9 years ago

#2670 closed bug (fixed)

Record selectors behaving badly wrt optimisation

Reported by: simonmar Owned by: igloo
Priority: normal Milestone: 6.10.2
Component: Compiler Version: 6.8.3
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

The attached program performs much worse with HEAD as of today (and presumably also 6.10.1) than 6.8.3.

> ghc-6.8.3 --make -O2 bug.hs
[1 of 1] Compiling Main             ( bug.hs, bug.o )
Linking bug ...
> time ./bug
1.17s real   1.17s user   0.01s system   99% ./bug
> ghc-6.11 --make -O2 bug.hs 
[1 of 1] Compiling Main             ( bug.hs, bug.o )
Linking bug ...
> time ./bug                    
1.58s real   1.58s user   0.01s system   99% ./bug

The problem appears to be in the code for delete, GHC has duplicated two calls to next.

Also note this is a version of the code we have in eyeball/IOList.lhs which illustrates some other problems in the current optimiser (although the other problems are not regressions, which is why I'm highlighting this one).

Attachments (1)

bug.hs (2.6 KB) - added by simonmar 9 years ago.

Download all attachments as: .zip

Change History (5)

Changed 9 years ago by simonmar

Attachment: bug.hs added

comment:1 Changed 9 years ago by simonpj

Diagnosis first:

  • Record selector applications are always considered cheap (see exprIsCheap) hence duplication of next.
  • It's not clear why CSE doesn't eliminate the nested case.
    case (next x) of r { .... (case (next x) or r' { ... } .... }
    
  • Why isn't next inlined? (In the UNPACK case it actually constructs a pair, so inlining it would eliminate that allocation. Reason: there's an error branch (error "Cant select next from Nil"), which has a big static code size. This is silly: the error branch should be floated out, and is in the optimised code for the selector. But record selectors have a built-in unfolding so we don't get the benefit.
  • Even if we do inline it, all those duplicated error applications are a waste of space.
  • Perhaps the result-discount thing in CoreUnfold should not be discombobulated by the error branch?
  • We tried writing the record selectors as ordinary functions, and that fixed the problem. More precisely, 6.10 then allocated less heap (by about 10% or so), but strangely was a bit slower in real time. We don't know why.

Conclusion: record selectors should probably become ordinary functions. But watch out for consequences for dictionary record selectors.

comment:2 Changed 9 years ago by simonpj

Summary: Performance regressionRecord selectors behaving badly wrt optimisation

Just adding a couple more notes:

  • When field packing/unpacking is involved, if the selector is not inlined bodily (perhaps it is big), then we'd like strictness and CPR info; and indeed a worker/wrapper split. Again that argues for making them ordinary functions.
  • Injecting the selectors into the CoreBinds early (ie at the start of the optimisation passes) doesn't help much. Since they are GlobalIds, any uses of the selectors still don't "see" the bindings, instead using the unfolding put inside the selector by MkId.

Simon

comment:3 Changed 9 years ago by simonpj

Owner: changed from simonpj to igloo

I've finally fixed this

Fri Jan  2 14:28:51 GMT 2009  simonpj@microsoft.com
  * Make record selectors into ordinary functions
  
  This biggish patch addresses Trac #2670.  The main effect is to make
  record selectors into ordinary functions, whose unfoldings appear in
  interface files, in contrast to their previous existence as magic
  "implicit Ids".  This means that the usual machinery of optimisation,
  analysis, and inlining applies to them, which was failing before when
  the selector was somewhat complicated.  (Which it can be when
  strictness annotations, unboxing annotations, and GADTs are involved.)
  
  The change involves the following points
  
  * Changes in Var.lhs to the representation of Var.  Now a LocalId can
    have an IdDetails as well as a GlobalId.  In particular, the
    information that an Id is a record selector is kept in the
    IdDetails.  While compiling the current module, the record selector
    *must* be a LocalId, so that it participates properly in compilation
    (free variables etc).
  
    This led me to change the (hidden) representation of Var, so that there
    is now only one constructor for Id, not two.
  
  * The IdDetails is persisted into interface files, so that an
    importing module can see which Ids are records selectors.
  
  * In TcTyClDecls, we generate the record-selector bindings in renamed,
    but not typechecked form.  In this way, we can get the typechecker
    to add all the types and so on, which is jolly helpful especially
    when GADTs or type families are involved.  Just like derived
    instance declarations.
  
    This is the big new chunk of 180 lines of code (much of which is
    commentary).  A call to the same function, mkAuxBinds, is needed in
    TcInstDcls for associated types.
  
  * The typechecker therefore has to pin the correct IdDetails on to 
    the record selector, when it typechecks it.  There was a neat way
    to do this, by adding a new sort of signature to HsBinds.Sig, namely
    IdSig.  This contains an Id (with the correct Name, Type, and IdDetails);
    the type checker uses it as the binder for the final binding.  This
    worked out rather easily.
  
  * Record selectors are no longer "implicit ids", which entails changes to
       IfaceSyn.ifaceDeclSubBndrs
       HscTypes.implicitTyThings
       TidyPgm.getImplicitBinds
    (These three functions must agree.)
  
  * MkId.mkRecordSelectorId is deleted entirely, some 300+ lines (incl
    comments) of very error prone code.  Happy days.
  
  * A TyCon no longer contains the list of record selectors: 
    algTcSelIds is gone
  
  The renamer is unaffected, including the way that import and export of
  record selectors is handled.
  
  Other small things
  
  * IfaceSyn.ifaceDeclSubBndrs had a fragile test for whether a data
    constructor had a wrapper.  I've replaced that with an explicit flag
    in the interface file. More robust I hope.
  
  * I renamed isIdVar to isId, which touched a few otherwise-unrelated files.
  
  

    M ./compiler/basicTypes/DataCon.lhs -4 +5
    M ./compiler/basicTypes/Id.lhs -72 +45
    M ./compiler/basicTypes/IdInfo.lhs -32 +28
    M ./compiler/basicTypes/IdInfo.lhs-boot -4 +3
    M ./compiler/basicTypes/MkId.lhs -401 +85
    M ./compiler/basicTypes/Var.lhs -113 +93
    M ./compiler/coreSyn/CoreSyn.lhs -6 +6
    M ./compiler/coreSyn/CoreUnfold.lhs -2 +2
    M ./compiler/coreSyn/CoreUtils.lhs -5 +5
    M ./compiler/coreSyn/MkExternalCore.lhs -1 +1
    M ./compiler/coreSyn/PprCore.lhs -5 +7
    M ./compiler/ghci/Debugger.hs -3 +2
    M ./compiler/hsSyn/HsBinds.lhs -4 +20
    M ./compiler/iface/BinIface.hs -17 +28
    M ./compiler/iface/BuildTyCl.lhs -21 +20
    M ./compiler/iface/IfaceSyn.lhs -25 +38
    M ./compiler/iface/LoadIface.lhs -22 +1
    M ./compiler/iface/MkIface.lhs -3 +12
    M ./compiler/iface/TcIface.lhs -42 +98
    M ./compiler/main/HscTypes.lhs -7 +6
    M ./compiler/main/InteractiveEval.hs -5 +3
    M ./compiler/main/TidyPgm.lhs -28 +20
    M ./compiler/prelude/TysWiredIn.lhs -1
    M ./compiler/rename/RnBinds.lhs +2
    M ./compiler/simplCore/CSE.lhs -1 +1
    M ./compiler/simplCore/FloatIn.lhs -1 +1
    M ./compiler/simplCore/SetLevels.lhs -9 +9
    M ./compiler/simplCore/SimplCore.lhs -1 +1
    M ./compiler/stgSyn/CoreToStg.lhs -1 +1
    M ./compiler/stranal/DmdAnal.lhs -2 +2
    M ./compiler/stranal/WorkWrap.lhs -1 +1
    M ./compiler/stranal/WwLib.lhs -4 +4
    M ./compiler/typecheck/TcBinds.lhs -6 +9
    M ./compiler/typecheck/TcEnv.lhs -5 +2
    M ./compiler/typecheck/TcInstDcls.lhs -9 +8
    M ./compiler/typecheck/TcRnDriver.lhs -28 +35
    M ./compiler/typecheck/TcSplice.lhs -1 +1
    M ./compiler/typecheck/TcTyClsDecls.lhs -19 +190
    M ./compiler/types/TyCon.lhs -12 +2
    M ./compiler/vectorise/VectUtils.hs -2 +2

I think this is too far-reaching to merge to the branch.

I'm re-assinging to Ian, in the hope that you can make a test that'll spot the performance change. But if that's too hard, just close it.

Simon

comment:4 Changed 9 years ago by igloo

Resolution: fixed
Status: newclosed

We decided that a test is too hard, so closing.

Note: See TracTickets for help on using tickets.