When working on a stage1 compiler, the slightest change in any of the files leads to rebuilding the dependency matrix, which takes 20-30 seconds. That makes for a very disruptive edit-compile cycle.
Alp helped me on #ghc and found --skip=_build/stage0/compiler/.dependencies.mk to be the right flag for skipping dependency rebuilding. I wonder if we could hide that behind a nicer flag? I think it should do something similar to --freeze1, except that we 'freeze' stage 0 and dependency building.
The analogy is that we need a hadrian equivalent of make -C ghc 1 as we have --freeze1 for make -C ghc 2.
Trac metadata
Version: 8.6.3
Type: Task
TypeOfFailure: OtherFailure
Priority: normal
Resolution: Unresolved
Component: Build System (Hadrian)
Test case:
Differential revisions:
BlockedBy:
Related:
Blocking:
CC: alpmestan, snowleopard
Operating system:
Architecture:
When working on a stage1 compiler, the slightest change in any of the files leads to rebuilding the dependency matrix, which takes 20-30 seconds. That makes for a very disruptive edit-compile cycle.
This is mysterious to me. I thought that part of the wonderfulness of Shake and early cut-off was that all this repeated work is not done.
Simon: this is expected behaviour in this case. Dependency analysis of Haskell sources is performed on a per-package (not per-file) basis, in one go. Whenever a single source file is changed, we invoke ghc -M on the whole package, which can take a while for a large package. After this step, the early cutoff kicks in and we stop.
Per-package dependency analysis is how Make works too, but Make often disables the tracking mechanism, which may lead to incorrect build results but is fast.
Perhaps we could/should switch to per-file dependency analysis (as we do with C sources), which would directly address this particular ticket without introducing yet another way to disable tracking.
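A per-file rule could look roughly like the following sketch in Shake. Everything here is hedged: the file layout is invented, and the -immediate-deps flag does not exist in GHC today; it stands in for the single-file analysis mode discussed later in this thread.

```haskell
import Development.Shake
import Development.Shake.FilePath

-- Hypothetical sketch: one ".d" dependency file per Haskell source,
-- so editing one module only re-analyses that module instead of the
-- whole package. Paths and the GHC flag are invented for illustration.
perFileDepsRules :: Rules ()
perFileDepsRules =
  "_build//*.hs.d" %> \out -> do
    let src = dropExtension (dropDirectory1 out)  -- recover Foo.hs (invented layout)
    need [src]
    cmd_ "ghc" ["-immediate-deps", src, "-o", out]
```

The point is only that the rule's granularity matches a single source file, so early cutoff can take effect after one small analysis step rather than a whole-package ghc -M run.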
In general, ghc -M Foo does the following. For each module M in the set Foo plus all its imports (transitively), it adds to the Makefile [...]
That is, GHC always does **transitive** dependency analysis, which means invoking it separately on each file would be rather inefficient (each time it will likely traverse almost the whole dependency graph). This is why Make and Hadrian choose to perform the analysis just once but for the whole package.
Perhaps it's not too difficult to add a more fine-grained dependency analysis to GHC, i.e. to produce only the list of immediate dependencies of a specified module.
Perhaps it's not too difficult to add a more fine-grained dependency analysis to GHC, i.e. to produce only the list of immediate dependencies of a specified module.
I'm sure it would not be hard to have another flag so that ghc -new-flag M.hs would produce just the immediate dependencies of M. Would that solve the problem?
Yes, switching to per-file dependencies in Hadrian would be easy if we had such a flag.
However, per-package vs per-file is a bit of a trade-off. Per-package analysis will likely be faster for the full build (you do analysis only once instead of for each file separately), whereas per-file analysis will be faster for incremental builds. (It is likely that the reduction of performance for the full build when switching to the per-file approach will be negligible, but we'll need to check this.)
We could have a batching rule for dependency analysis: if multiple files need to be analysed, their analysis could be combined into a single GHC invocation. That would be quite cool.
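Shake in fact has a combinator aimed at this pattern, batch, which merges several pending rule executions into one action. A hedged sketch of how it might apply here (file layout and the GHC flag are invented, as above):

```haskell
import Development.Shake
import Development.Shake.FilePath

-- Hypothetical sketch using Shake's 'batch': when several ".hs.d"
-- files need rebuilding at once, analyse up to 50 sources with a
-- single GHC invocation instead of one process per file.
batchedDepsRules :: Rules ()
batchedDepsRules =
  batch 50 ("_build//*.hs.d" %>)
    (\out -> do
        let src = dropExtension (dropDirectory1 out)  -- invented layout
        need [src]
        return src)
    (\srcs -> cmd_ "ghc" ("-immediate-deps" : srcs))  -- flag is hypothetical
```

This keeps incremental builds per-file while letting a full build amortise process-startup costs across many files.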
Andrey, is the reason this is hideously expensive because ghc -M is hideously expensive? If not, then using oracles to cache the various parts of dependencies.mk would be the right solution.
I believe the way -M works is it builds a complete dependency tree, which is pretty expensive, and requires running all C preprocessors etc. Doing it in individual steps is likely to be hideously expensive.
The solution I've always used in the past is something like https://shakebuild.com/includes#generated-transitive-imports. Pros: it's super fast, super granular, and it allows you to import files that are themselves generated on demand. Con: you have to write your own "spot an import" code. My experience is that that's really hard in general but quite easy for any specific project with sane conventions.
Can you describe more explicitly "the solution you have used in the past" in our context?
I think you are saying
Implement ghc -some-new-flag M.hs, which runs CPP on M.hs (if necessary), parses the result in some simple-minded way, and spits out all of M's direct imports.
This seems to be what your usedHeaders thing does.
If we could do need (usedHeaders "M.hs") maybe we would never need to use ghc -M at all?
In the past I've written a function that reads the file, and using fairly simplistic string matching guesses what it depends on, in the build system itself. It can avoid shelling out to GHC (hugely expensive on Windows, especially with corporate antivirus systems) and avoid running CPP. Generally most CPP doesn't impact which files are used, and even if it does, having a superset isn't a problem.
The kind of function I've used previously is on the order of:
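The snippet itself did not survive the issue migration. A minimal stand-alone sketch in the same spirit (the function name is invented, and it is deliberately simplistic string matching, as described above):

```haskell
import Control.Applicative ((<|>))
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- Guess which modules a Haskell source file imports using plain
-- string matching: no GHC, no CPP. This over-approximates under
-- CPP, but a superset of imports is harmless for dependency tracking.
guessImports :: String -> [String]
guessImports = mapMaybe moduleOf . lines
  where
    moduleOf l =
      takeWhile (`notElem` " (") <$>
        (stripPrefix "import qualified " l <|> stripPrefix "import " l)
```

A build rule would then need the files corresponding to the returned module names before compiling the source.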
Andrey, is the reason this is hideously expensive because ghc -M is hideously expensive?
There are ~500 Haskell files in the compiler directory, so global (i.e. per-package) dependency analysis can't be very fast, however efficiently it is implemented.
If not, then using oracles to cache the various parts of dependencies.mk would be the right solution.
This is what we do, but this doesn't solve the problem: right now, if you edit a single Haskell file in compiler, we will rerun ghc -M on the whole set of ~500 package files. Yes, oracles will helpfully stop the changes from propagating further, but this single ghc -M invocation will be slow.
I think the only solution is to have a way (e.g. a new GHC flag) to run dependency analysis on a single file, without transitive exploration of all its dependencies.
I think the only solution is to have a way (e.g. a new GHC flag) to run dependency analysis on a single file, without transitive exploration of all its dependencies.
That sounds reasonable. It would certainly be more robust than the strategy I describe. It won't help if you have deeply nested CPP includes (you'd still rescan them each time), but I suspect that's negligible for GHC.
As you say, if that flag can take multiple files at once, you could batch it, which would be a good performance improvement.
Self-note: the relevant code lives in https://gitlab.haskell.org/ghc/ghc/blob/master/compiler/main/DriverMkDepend.hs. The simplest approach is probably to refine that GhcMode to be either transitive or not. When transitive, we'd take the current code path; when not, we'd take a new one that just looks at and reports the immediate dependencies. The former would still be exposed under -M and the new one under some other flag (any suggestion is welcome).
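To make the transitive/immediate distinction concrete, here is a toy, self-contained model of the two modes (all names invented; the real DriverMkDepend code is of course quite different):

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

data DepMode = Transitive | Immediate deriving (Eq, Show)

-- Toy dependency graph: module name -> modules it imports directly.
type Graph = M.Map String [String]

-- Immediate mode reports just the direct imports; Transitive mode
-- (what ghc -M effectively computes) chases the whole closure, which
-- is why invoking it separately per file re-traverses the graph.
depsOf :: DepMode -> Graph -> String -> [String]
depsOf Immediate g m = M.findWithDefault [] m g
depsOf Transitive g m = S.toAscList (go S.empty (M.findWithDefault [] m g))
  where
    go seen [] = seen
    go seen (x:xs)
      | x `S.member` seen = go seen xs
      | otherwise         = go (S.insert x seen) (M.findWithDefault [] x g ++ xs)
```

With per-file Immediate output, the build system itself can assemble the transitive picture incrementally, re-analysing only the files that changed.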