GHC includes two types of profiling: cost-centre profiling and ticky-ticky profiling. Additionally, HPC code coverage is not "technically" profiling, but it uses a lot of the same mechanisms as cost-centre profiling (you can read more about it at Commentary/Hpc).
Cost-centre profiling operates at something close to the source level, and ticky-ticky profiling operates at something much closer to the machine level. This means that the two types of profiling are useful for different tasks. Ticky-ticky profiling is mainly meant for compiler implementors, and cost-centre profiling for mortals. However, because cost-centre profiling operates at a high level, it can be difficult (if not impossible) to use it to profile optimized code. Personally, I (Kirsten) have had a lot of success using cost-centre profiling to find problems that were due to my own bad algorithms, but less success once I was fairly sure that I wasn't doing anything obviously stupid and was trying to figure out why my code didn't get optimized as well as it could have been.
You can't use cost-centre profiling and ticky-ticky profiling at the same time. In the past, this was because ticky-ticky profiling relied on a different closure layout, but that is no longer the case. As things stand, you still probably can't combine them unless you modify the build system to allow building the RTS with way=p and way=t at the same time. I haven't thought about whether it would make sense to use both at the same time.
Cost-centre profiling in GHC (e.g. of SCCs) consists of the following components:
- Data-structures for representing cost-centres in compiler/profiling/CostCentre.lhs.
- Front-end support in compiler/deSugar/DsExpr.lhs, for converting SCC pragmas into the Tick constructor in Core.
- Modifications to optimization behavior in compiler/coreSyn/CoreUtils.lhs and compiler/coreSyn/CorePrep.lhs to prevent optimizations which would result in misleading profile information. Most of this is to handle the fact that SCCs also count entries (tickishCounts, also applies to Commentary/Hpc); otherwise the only relevant optimization is avoiding floating expressions out of SCCs. Note that the simplifier also has "ticks" (so it can decide when to stop optimizing); these are not the same thing at all.
- The StgSCC constructor in STG, and code generation for it in compiler/codeGen/StgCmmProf.hs.
- A pass over STG in compiler/profiling/SCCfinal.lhs that collects cost centres so that they can be statically declared by compiler/profiling/ProfInit.hs, and that adds extra SCCs in the case of -fprof-auto; see also compiler/profiling/NOTES.
- Code generation for setting labels, found in compiler/codeGen/StgCmmProf.hs, in particular saving and restoring CC labels as well as counting ticks; note that cost-centres even get their own constructor in C-- as CC_Labels (cost-centre labels).
- Runtime support for initializing and manipulating the actual runtime CostCentre structs which store information, in rts/Profiling.c; headers are located in includes/rts/prof/CCS.h.
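To make the user-facing end of this pipeline concrete, here is a minimal sketch of a program that annotates two expressions with SCC pragmas. The function `mean` and the cost-centre names `mean_sum` and `mean_length` are made up for illustration; only the pragma syntax and the flags in the comments are GHC's own. When the module is compiled without -prof, the pragmas are simply ignored.

```haskell
-- Sketch: manual cost centres via SCC pragmas.
-- Build with profiling:     ghc -prof -fprof-auto -rtsopts Main.hs
-- Run and write Main.prof:  ./Main +RTS -p
module Main (main, mean) where

import Data.List (foldl')

-- | Arithmetic mean of a list (illustrative example function).
-- Each SCC pragma attaches a named cost centre to the expression
-- that follows it, so the two traversals are reported separately.
mean :: [Double] -> Double
mean xs = total / count
  where
    total = {-# SCC "mean_sum" #-} foldl' (+) 0 xs
    count = {-# SCC "mean_length" #-} fromIntegral (length xs)

main :: IO ()
main = print (mean [1 .. 1000000])
```

With -fprof-auto, the compiler adds further cost centres on top of the two written by hand, so the report should contain entries for both kinds.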
Ticky-ticky profiling is very simple (conceptually): instrument the C code generated by GHC with a lot of extra code that updates counters when various (supposedly) interesting things happen, and generate a report giving the values of the counters when your program terminates. GHC does this instrumentation for you when you compile your program with a special flag. Then, you use another flag to tell the RTS to generate the profiling report.
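The idea of "update counters at interesting events, print them at exit" can be modelled in a few lines of ordinary Haskell. This is only a toy analogy under stated assumptions: the `Counters` record, `tick`, `report`, `instrumentedSum`, and the counter names are all hypothetical, and GHC's real ticky counters are emitted by the code generator and maintained by the RTS rather than written by hand like this.

```haskell
-- Toy model of ticky-ticky style instrumentation.
-- All names here are hypothetical; this is not how GHC implements it.
module Main (main, Counters, newCounters, tick, report, instrumentedSum) where

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | One mutable counter per kind of "interesting" event.
data Counters = Counters
  { sumEntries :: IORef Int  -- times instrumentedSum was entered
  , sumSteps   :: IORef Int  -- list elements it consumed
  }

newCounters :: IO Counters
newCounters = Counters <$> newIORef 0 <*> newIORef 0

-- | Bump a single counter by one.
tick :: IORef Int -> IO ()
tick ref = modifyIORef' ref (+ 1)

-- | Read all counters as (name, value) pairs.
report :: Counters -> IO [(String, Int)]
report c = do
  e <- readIORef (sumEntries c)
  s <- readIORef (sumSteps c)
  pure [("sum_entry", e), ("sum_step", s)]

-- | Sum a list, ticking once on entry and once per element.
instrumentedSum :: Counters -> [Int] -> IO Int
instrumentedSum c = \xs -> tick (sumEntries c) >> go 0 xs
  where
    go acc []       = pure acc
    go acc (y : ys) = do
      tick (sumSteps c)
      let acc' = acc + y
      acc' `seq` go acc' ys

main :: IO ()
main = do
  counters <- newCounters
  total <- instrumentedSum counters [1 .. 10]
  print total
  -- As with a ticky report, the counters are dumped only at the end.
  report counters >>= mapM_ print
```

The point of the analogy is that the instrumentation is woven into the code being measured, which is why the ticky-ticky numbers reflect what the optimized program actually does.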
You might want to use ticky-ticky profiling for one of the following two reasons:
- You are an implementor trying to understand the effect of an optimization in GHC more precisely.
- You are a user trying to observe the behavior of your programs with optimization turned on. GHC doesn't do certain transformations in the presence of cost centres, so cost-centre profiling can be less than accurate if you're trying to understand what really happens when you're compiling with -O.
I won't necessarily try to argue that ticky-ticky is useful at all for the second group of people, but it's better than nothing, and perhaps the ticky-ticky data could be used to build a better profiler.
For more info, including HOWTO details, see Debugging/TickyTicky.