Version 42 (modified by jstolarek, 2 years ago)

Jan Stolarek's internship notes

Wise people say…

Geoffrey:

  • In the past, LLVM could not recognize all loops output by the LLVM back end as loops. Perhaps that has changed.
  • Answering the question "What does loopification do that isn't already being done?" would still be useful
  • So figuring out how to make LLVM recognize more loops would be good.
  • If you write a simple, tight loop in Haskell of the sort a C compiler would vectorize, will LLVM vectorize it? If not, why?
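
As a concrete starting point for the last question, a tight loop of the kind meant here might look like the sketch below. The names sumSquares and go are made up for illustration and do not come from GHC or its test suite. Whether LLVM vectorizes the code GHC emits for it is exactly the open question, so nothing is claimed about the outcome.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical example, not taken from GHC or its test suite.
-- Sums the squares of 0 .. n-1 with a strict accumulator. The C analogue is
--   for (i = 0; i < n; i++) acc += i * i;
sumSquares :: Int -> Int
sumSquares n = go 0 0
  where
    -- 'go' calls itself only in tail position, so it is a loop in spirit.
    go :: Int -> Int -> Int
    go !acc !i
      | i >= n    = acc
      | otherwise = go (acc + i * i) (i + 1)
```

Compiling a module like this with -O2 -fllvm -ddump-llvm dumps the LLVM IR that GHC hands to LLVM, which is one place to start looking for an answer.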

Austin Seipp:

  • i took the time to implement a half-assed almost-working loopification pass a few months ago. the sinking pass by Simon is what really does a huge amount of the optimizations Krzysztof's thesis attacked differently. but i think doing loopification could maybe lead to identifying things like loop invariant expressions. it can't bootstrap the compiler with it (JS: it = Austin's patch). i think whenever i tie the knot in the new graph, i don't abandon parts of the old CmmNode, which then causes dead labels to hang around
  • oh, yeah, and as i noted in the commit message, you have to be careful when ordering those optimizations around. this is obviously only a valid transform pre-CPS. also you have to run a block elimination pass, otherwise things can happen where work can get duplicated into empty blocks
  • i think another problem is that the new codegen doesn't always discard empty basic blocks which pisses off the native code generator (see #7574 ) so we need a little refactoring to handle that correctly, too, by being able to do SCC passes from any particular node
  • i think that the fix for #7574 is probably pretty easy actually, it just requires shuffling things around. oh, and to be clear it's not *empty* basic blocks, it's *unreachable* basic blocks that make the codegen mad

Simon Marlow:

  • CmmSink removes dead assignments (though not in loops), which is why it's commented out. A single removeDeadAssignments pass costs about 5% of compilation time, and in the vast majority of code does nothing over what CmmSink already does.
  • PLEASE make sure that you're carefully measuring compilation time when making changes to the code generator. Expensive optimisations need to go in -O2 (at least).
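
To make "removing dead assignments" concrete, here is a toy model of the idea on a single straight-line block. The types Reg, Expr and Assign and the function removeDead are invented for this sketch and are much simpler than GHC's Cmm representation; this is not how CmmSink or removeDeadAssignments is implemented.

```haskell
-- Toy model only: these types are invented for illustration and are far
-- simpler than GHC's Cmm representation.
type Reg = String

data Expr
  = Lit Int
  | Var Reg
  | Add Expr Expr
  deriving (Eq, Show)

data Assign = Assign Reg Expr
  deriving (Eq, Show)

-- Registers read by an expression.
uses :: Expr -> [Reg]
uses (Lit _)   = []
uses (Var r)   = [r]
uses (Add a b) = uses a ++ uses b

-- Walk a straight-line block backwards, tracking which registers are live.
-- An assignment whose target is not live afterwards is dropped.
removeDead :: [Reg] -> [Assign] -> [Assign]
removeDead liveOut block = fst (foldr step ([], liveOut) block)
  where
    step a@(Assign r e) (kept, live)
      | r `elem` live = (a : kept, uses e ++ filter (/= r) live)
      | otherwise     = (kept, live)
```

A single backward walk suffices only because the block has no back edges. With loops, liveness has to be iterated to a fixed point over the whole graph, which is presumably part of what makes a dedicated pass more expensive.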

Back-end notes

Various notes to self

  • Does it make sense to create a separate flag for every Cmm optimisation I add? After all they are designed to work together
  • I need to remember to carefully choose at which optimization levels my Cmm passes are enabled
  • Here's an interesting bit from CoreToStg.lhs: "a dead variable's stack slot (if it has one): should be stubbed to avoid space leaks"

Loopification

  • tests that fail with panic on f56ed65 (branch js-loopification-v5, run with make EXTRA_HC_OPTS='-fcmm-loopify -fcmm-copy-propagation' WAY=normal):
   ../../libraries/base/tests             data-fixed-show-read [bad exit code] (normal)
   ../../libraries/base/tests             enum01 [bad exit code] (normal)
   ../../libraries/base/tests             enumRatio [bad exit code] (normal)
   ../../libraries/base/tests             memo001 [bad exit code] (normal)
   ../../libraries/base/tests             memo002 [bad exit code] (normal)
   ../../libraries/base/tests/Numeric     num007 [bad exit code] (normal)
   ../../libraries/hpc/tests/function     tough [bad stdout] (normal)
   ../../libraries/hpc/tests/function2    tough2 [bad stdout] (normal)
   ../../libraries/hpc/tests/simple/tixs  hpc_markup_001 [bad stdout] (normal)
   ../../libraries/random/tests           rangeTest [bad exit code] (normal)
   array/should_run                       arr012 [bad exit code] (normal)
   array/should_run                       arr013 [bad exit code] (normal)
   array/should_run                       arr018 [bad exit code] (normal)
   codeGen/should_run                     cgrun013 [bad stdout] (normal)
   codeGen/should_run                     cgrun016 [bad stderr] (normal)
   codeGen/should_run                     cgrun028 [bad exit code] (normal)
   codeGen/should_run                     cgrun034 [bad exit code] (normal)
   codeGen/should_run                     cgrun045 [bad stderr] (normal)
   codeGen/should_run                     cgrun047 [bad exit code] (normal)
   codeGen/should_run                     cgrun051 [bad stderr] (normal)
   codeGen/should_run                     cgrun059 [bad stderr] (normal)
   concurrent/should_run                  T4030 [bad exit code] (normal)
   concurrent/should_run                  conc021 [bad stderr] (normal)
   deSugar/should_run                     dsrun001 [bad exit code] (normal)
   deSugar/should_run                     dsrun016 [bad exit code] (normal)
   deSugar/should_run                     dsrun017 [bad exit code] (normal)
   deSugar/should_run                     dsrun018 [bad exit code] (normal)
   deSugar/should_run                     dsrun019 [bad exit code] (normal)
   deSugar/should_run                     dsrun020 [bad exit code] (normal)
   deSugar/should_run                     dsrun021 [bad exit code] (normal)
   deSugar/should_run                     dsrun022 [bad exit code] (normal)
   deSugar/should_run                     dsrun023 [bad exit code] (normal)
   deSugar/should_run                     mc01 [bad exit code] (normal)
   deSugar/should_run                     mc02 [bad exit code] (normal)
   deSugar/should_run                     mc03 [bad exit code] (normal)
   deSugar/should_run                     mc04 [bad exit code] (normal)
   deSugar/should_run                     mc05 [bad exit code] (normal)
   deSugar/should_run                     mc06 [bad exit code] (normal)
   deSugar/should_run                     mc07 [bad exit code] (normal)
   deSugar/should_run                     mc08 [bad exit code] (normal)
   deriving/should_run                    T2529 [bad exit code] (normal)
   deriving/should_run                    T5628 [bad stderr] (normal)
   deriving/should_run                    drvrun011 [bad exit code] (normal)
   ffi/should_run                         ffi008 [bad stderr] (normal)
   gadt                                   tc [bad exit code] (normal)
   ghc-api                                CmmCopyPropagationTest [bad stdout] (normal)
   ghc-api/T7478                          T7478 [bad exit code] (normal)
   ghci/linking                           ghcilink002 [bad exit code] (normal)
   ghci/linking                           ghcilink005 [bad exit code] (normal)
   ghci/scripts                           ghci024 [bad stdout] (normal)
   mdo/should_run                         mdorun002 [bad exit code] (normal)
   numeric/should_compile                 T7116 [bad stdout] (normal)
   numeric/should_run                     arith001 [bad exit code] (normal)
   numeric/should_run                     arith002 [bad exit code] (normal)
   numeric/should_run                     arith005 [bad exit code] (normal)
   numeric/should_run                     numrun012 [bad exit code] (normal)
   parser/should_run                      operator2 [bad exit code] (normal)
   perf/compiler                          T1969 [stat not good enough] (normal)
   perf/compiler                          T3064 [stat not good enough] (normal)
   perf/compiler                          T3294 [stat not good enough] (normal)
   perf/compiler                          T4801 [stat not good enough] (normal)
   perf/compiler                          T5030 [stat not good enough] (normal)
   perf/compiler                          T5321FD [stat not good enough] (normal)
   perf/compiler                          T5321Fun [stat not good enough] (normal)
   perf/compiler                          T5631 [stat not good enough] (normal)
   perf/compiler                          T5642 [stat not good enough] (normal)
   perf/compiler                          T5837 [stat not good enough] (normal)
   perf/compiler                          T783 [stat not good enough] (normal)
   perf/compiler                          parsing001 [stat not good enough] (normal)
   perf/should_run                        T2902 [bad stderr] (normal)
   perf/should_run                        T5237 [bad stdout] (normal)
   perf/should_run                        T5549 [stat too good] (normal)
   perf/should_run                        T7797 [stat too good] (normal)
   perf/should_run                        T7850 [stat too good] (normal)
   perf/should_run                        T876 [bad stdout] (normal)
   perf/should_run                        lazy-bs-alloc [stat too good] (normal)
   programs/andre_monad                   andre_monad [bad exit code] (normal)
   programs/cholewo-eval                  cholewo-eval [bad exit code] (normal)
   programs/cvh_unboxing                  cvh_unboxing [bad exit code] (normal)
   programs/joao-circular                 joao-circular [bad exit code] (normal)
   programs/jtod_circint                  jtod_circint [bad exit code] (normal)
   programs/north_array                   north_array [bad exit code] (normal)
   programs/sanders_array                 sanders_array [bad stdout] (normal)
   rebindable                             rebindable2 [bad exit code] (normal)
   rebindable                             rebindable3 [bad exit code] (normal)
   rebindable                             rebindable4 [bad exit code] (normal)
   rts                                    T7919 [exit code non-0] (normal)
   rts                                    exec_signals [bad exit code] (normal)
   rts                                    outofmem2 [bad stderr] (normal)
   rts                                    stack003 [bad exit code] (normal)
   safeHaskell/safeLanguage               SafeLang04 [bad exit code] (normal)
   safeHaskell/safeLanguage               SafeLang05 [bad exit code] (normal)
   safeHaskell/safeLanguage               SafeLang09 [bad stderr] (normal)
   simplCore/should_run                   T5587 [bad stderr] (normal)
   th                                     T3600 [exit code non-0] (normal)
   th                                     TH_repE2 [bad exit code] (normal)
   typecheck/should_compile               T4524 [exit code non-0] (normal)
   typecheck/should_run                   T1735 [bad exit code] (normal)
   typecheck/should_run                   tcrun003 [bad exit code] (normal)
   typecheck/should_run                   tcrun010 [bad exit code] (normal)

None of these seems to be directly related to loopification, except perhaps the performance tests.

Let-no-escape notes

  • Code generation for let-no-escape: cgLneBinds in codeGen/StgCmmExpr.hs
  • Heap checking in let-no-escape: see Note [Heap checks] in codeGen/StgCmmHeap.hs
  • From codeGen/StgCmmMonad.hs:
    data CgLoc
      = CmmLoc CmmExpr        -- A stable CmmExpr; that is, one not mentioning
                            -- Hp, so that it remains valid across calls
    
      | LneLoc BlockId [LocalReg]             -- A join point
            -- A join point (= let-no-escape) should only
            -- be tail-called, and in a saturated way.
            -- To tail-call it, assign to these locals,
            -- and branch to the block id
    
  • Simon Marlow says: "[let-no-escape] catches more cases than just join points.  Any function or variable binding that does not escape is turned into let-no-escape."
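
For orientation, here is a source-level sketch of the distinction. Both functions are hypothetical examples written for this note. Whether GHC actually compiles j as a let-no-escape depends on what the optimiser does first (it may simply inline it), so treat this as an illustration of the condition, not a guarantee.

```haskell
-- Hypothetical example. 'j' is bound locally, is always called with all of
-- its arguments, and every call is in tail position, so it never escapes.
classify :: Int -> Int -> String
classify x y =
  let j n = if n > 100 then "big" else "small"   -- candidate join point
  in case compare x y of
       LT -> j (y - x)
       GT -> j (x - y)
       EQ -> "equal"

-- Contrast: 'f' is passed to 'map' as an argument, so it escapes and needs
-- a real closure. It cannot be a let-no-escape.
addAll :: Int -> [Int] -> [Int]
addAll x ys =
  let f n = n + x
  in map f ys
```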

Some interesting tickets

  • #605 - Optimisation: strict enumerations
  • #1498 - Optimisation: eliminate unnecessary heap check in recursive function.
  • #1600 - Optimisation: CPR the results of IO
  • #2289 - Needless reboxing of values when returning from a tight loop
  • #2387 - Optimizer misses unboxing opportunity
  • #4470 - Loop optimization: identical counters
  • #4937 - Remove indirections caused by sum types, such as Maybe
  • #5567 - LLVM: Improve alias analysis / performance. The BackEndNotes page has some discussion of this.
  • #7198 - New codegen more than doubles compile time of T3294
  • #7574 - Register allocator chokes on certain branches with literals (bug can be triggered with ./inplace/bin/ghc-stage2 -c -no-hs-main -fasm -O2 ./testsuite/tests/llvm/should_compile/T7571.cmm)
  • #8048 - Register spilling produces inefficient/highly contending code

Notes on the wiki

Various clean-up tasks

Cmm clean-up

  • remove unused CmmRewriteAssignments
  • cmm/CmmLive.hs:106. This function is not used:
removeDeadAssignments :: DynFlags -> CmmGraph
                      -> UniqSM (CmmGraph, BlockEnv CmmLocalLive)

It is, however, referenced in some of the comments. I might be able to use it for my dead assignment removal. Simon PJ notes: "we want to eliminate dead assignments to stack locations too, so the liveness info needs to be augmented with stack areas."

  • Cmm dumping could be improved. Right now it dumps all optimisation passes for one fragment of Cmm code, then for the next fragment, and so on. It would be more convenient to dump the whole Cmm code after each pass. I'm not sure if that's possible with the current pipeline design. It seems that the Stg->Cmm pass is intentionally designed to produce Cmm code incrementally (via Stream), and I suspect that this is why the code is processed incrementally.
  • Simon M. says: The CmmSink pass before stack layout is disabled because I never got around to measuring it to determine whether it is a good idea or not. By all means do that!

Cleaning up the STG -> Cmm pass

When generating Cmm from STG, some SRT information is generated. It is not used and has to be rebuilt anyway after converting to CPS Cmm. Below are some random notes and pieces of code that might be related to this:

  • Cmm conversions in the compiler pipeline: main/HscMain.hs has tryNewCodeGen (l. 1300), which first calls StgCmm.codegen and then passes the generated Cmm to cmmPipeline function from cmm/CmmPipeline.hs. According to Austin Seipp cpsTop in cmm/CmmPipeline.hs takes care of converting to CPS: "yeah, CmmPipeline does take care of it. it's partially cpsTop that does it, and doSRTs elaborates the top-level info tables and stuff beyond that but mostly cpsTop. i think your general turning point is after the stack layout and stack pointer manifestation".

This code in cmm/Cmm.hs might be relevant (or not):

-- (line 141 and onwards)
-- | Info table as a haskell data type
data CmmInfoTable
  = CmmInfoTable {
      cit_lbl  :: CLabel, -- Info table label
      cit_rep  :: SMRep,
      cit_prof :: ProfilingInfo,
      cit_srt  :: C_SRT
    }

data ProfilingInfo
  = NoProfilingInfo
  | ProfilingInfo [Word8] [Word8] -- closure_type, closure_desc

-- C_SRT is what StgSyn.SRT gets translated to...
-- we add a label for the table, and expect only the 'offset/length' form

data C_SRT = NoC_SRT
           | C_SRT !CLabel !WordOff !StgHalfWord {-bitmap or escape-}
           deriving (Eq)

needsSRT :: C_SRT -> Bool
needsSRT NoC_SRT       = False
needsSRT (C_SRT _ _ _) = True

Random code

  • main/HscMain.lhs:1300 currently reads:
| otherwise
  = {-# SCC "cmmPipeline" #-}
    let initTopSRT = initUs_ us emptySRT in

    let run_pipeline topSRT cmmgroup = do
          (topSRT, cmmgroup) <- cmmPipeline hsc_env topSRT cmmgroup
          return (topSRT,cmmgroup)

    in do topSRT <- Stream.mapAccumL run_pipeline initTopSRT ppr_stream1
          Stream.yield (srtToData topSRT)

The <- / return sequence in the definition of run_pipeline can be eliminated, which makes the do notation unnecessary, which in turn permits eta-reduction, which (finally) lets us remove the run_pipeline binding and use (cmmPipeline hsc_env) instead:

| otherwise
  = {-# SCC "cmmPipeline" #-}
    let initTopSRT = initUs_ us emptySRT
    in do topSRT <- Stream.mapAccumL (cmmPipeline hsc_env) initTopSRT ppr_stream1
          Stream.yield (srtToData topSRT)
  • cmm/CmmUtils.hs, function toBlockListEntryFirst - perhaps it would be safer to return a tuple in this case? This would probably make the invariant more explicit.
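
The run_pipeline simplification above rests on the monad law m >>= return == m followed by eta-reduction. The sketch below replays the same steps on a stand-in. The names step, runVerbose and runShort are hypothetical, and Maybe replaces the real monad only so that the results can be compared with (==).

```haskell
-- Stand-in for (cmmPipeline hsc_env); hypothetical, for illustration only.
step :: Int -> Int -> Maybe (Int, Int)
step acc x = Just (acc + x, acc * x)

-- Original shape: bind the result, then return it unchanged.
runVerbose :: Int -> Int -> Maybe (Int, Int)
runVerbose acc x = do
  (acc', y) <- step acc x
  return (acc', y)

-- After applying  m >>= return == m  and eta-reducing both arguments.
runShort :: Int -> Int -> Maybe (Int, Int)
runShort = step
```

One subtlety: the pattern (acc', y) forces the pair constructor, so the two versions can differ if the action yields an undefined pair. For a pipeline that always returns a proper tuple, that difference should not be observable.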

Wiki

  • NewCodeGenPipeline has some outdated sections in the Cmm pipeline description: Add spill/reload, Rewrite assignments. So far I only marked them as OUTDATED
  • NewCodeGenModules - mostly outdated. Mentioned data types and modules no longer exist.

Various stuff

Tickets that I could potentially look into:

  • #3070 - floor(0/0) should not be defined
  • #3676 - realToFrac doesn't sanely convert between floating types
  • #3744 - Comparisons against minBound/maxBound not optimised
  • #4101 - Primitive constant unfolding
  • #5615 - ghc produces poor code for div with constant powers of 2.
  • #7116 - Missing optimisation: strength reduction of floating-point multiplication
  • #7858 - Fix definitions of abs/signum for Floats/Doubles.
  • #8072 - Optimizations change result of div for Word
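
For #5615, the arithmetic fact a code generator could exploit is sketched below. divPow2 is a hypothetical helper written for this note. It only demonstrates the identity on Int and says nothing about what GHC emits or how the ticket should be fixed.

```haskell
import Data.Bits (shiftR)

-- Hypothetical helper: divide by 2^k with an arithmetic right shift.
-- On Int, shiftR rounds towards negative infinity, which matches 'div'
-- (but not 'quot', which rounds towards zero).
divPow2 :: Int -> Int -> Int
divPow2 x k = x `shiftR` k

-- Check the identity  x `div` 8 == x `shiftR` 3  on a small range,
-- including negative numbers.
agreesWithDiv :: Bool
agreesWithDiv = all (\x -> divPow2 x 3 == x `div` 8) [-100 .. 100]
```

The negative cases are the interesting ones: (-9) `div` 8 and the shift both give -2, whereas (-9) `quot` 8 gives -1, so a shift-based rewrite of quot would need an extra correction.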

Other things to do:

  • investigate opportunities for improving heap checks. An idea: if a worker knows its heap requirements it could pass them to the caller, thus avoiding the heap check. A question: how much time do we really spend on heap checks?

Some LLVM notes that may be useful:

Github repos

Unboxed Booleans (#6135) work is in all 8 repos on branch bool-primops-vX, where X is a number. X is increased after rebasing on top of new HEAD (I'm doing this to avoid upstream rebasing).

Loopification work is in main GHC repo on branch js-loopification-vX.