When working on a stage1 compiler, the slightest change in any of the files leads to rebuilding the dependency matrix, which takes 20-30 seconds. That makes for a very disruptive edit-compile cycle.
Alp helped me on #ghc and found --skip=_build/stage0/compiler/.dependencies.mk to be the right flag for skipping dependency rebuilding. I wonder if we could hide that behind a nicer flag? I think it should do something similar to --freeze1, except that we 'freeze' stage 0 and dependency building.
The analogy is that we need a hadrian equivalent of make -C ghc 1 as we have --freeze1 for make -C ghc 2.
Trac metadata
Version: 8.6.3
Type: Task
TypeOfFailure: OtherFailure
Priority: normal
Resolution: Unresolved
Component: Build System (Hadrian)
Test case:
Differential revisions:
BlockedBy:
Related:
Blocking:
CC: alpmestan, snowleopard
Operating system:
Architecture:
When working on a stage1 compiler, the slightest change in any of the files leads to rebuilding the dependency matrix, which takes 20-30 seconds. That makes for a very disruptive edit-compile cycle.
This is mysterious to me. I thought that part of the wonderfulness of Shake and early cut-off was that all this repeated work is not done.
Simon: this is expected behaviour in this case. Dependency analysis of Haskell sources is performed on a per-package (not per-file) basis, in one go. Whenever a single source file is changed, we invoke ghc -M on the whole package, which can take a while for a large package. After this step, the early cutoff kicks in and we stop.
Per-package dependency analysis is how Make works too, but Make often disables the tracking mechanism, which may lead to incorrect build results but is fast.
Perhaps we could/should switch to per-file dependency analysis (as we do with C sources), which would directly address this particular ticket without introducing yet another way to disable tracking.
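A per-file rule could look roughly like the following sketch in Shake. Everything here is hedged: the file layout is invented, and the -immediate-deps flag does not exist in GHC today; it stands in for the single-file analysis mode discussed later in this thread.

```haskell
import Development.Shake
import Development.Shake.FilePath

-- Hypothetical sketch: one ".d" dependency file per Haskell source,
-- so editing one module only re-analyses that module instead of the
-- whole package. Paths and the GHC flag are invented for illustration.
perFileDepsRules :: Rules ()
perFileDepsRules =
  "_build//*.hs.d" %> \out -> do
    let src = dropExtension (dropDirectory1 out)  -- recover Foo.hs (invented layout)
    need [src]
    cmd_ "ghc" ["-immediate-deps", src, "-o", out]
```

The point is only that the rule's granularity matches a single source file, so early cutoff can take effect after one small analysis step rather than a whole-package ghc -M run.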
In general, ghc -M Foo does the following. For each module M in the set Foo plus all its imports (transitively), it adds to the Makefile [...]
That is, GHC always does **transitive** dependency analysis, which means invoking it separately on each file would be rather inefficient (each time it will likely traverse almost the whole dependency graph). This is why Make and Hadrian choose to perform the analysis just once but for the whole package.
Perhaps it's not too difficult to add a more fine-grained dependency analysis to GHC, i.e. to produce only the list of immediate dependencies of a specified module.
Perhaps it's not too difficult to add a more fine-grained dependency analysis to GHC, i.e. to produce only the list of immediate dependencies of a specified module.
I'm sure it would not be hard to have another flag so that ghc -new-flag M.hs would produce just the immediate dependencies of M. Would that solve the problem?
Yes, switching to per-file dependencies in Hadrian would be easy if we had such a flag.
However, per-package vs per-file is a bit of a trade-off. Per-package analysis will likely be faster for the full build (you do analysis only once instead of for each file separately), whereas per-file analysis will be faster for incremental builds. (It is likely that the reduction of performance for the full build when switching to the per-file approach will be negligible, but we'll need to check this.)
We could have a batching rule for dependency analysis: if multiple files need to be analysed, their analysis could be combined into a single GHC invocation. That would be quite cool.
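Shake in fact has a combinator aimed at this pattern, batch, which merges several pending rule executions into one action. A hedged sketch of how it might apply here (file layout and the GHC flag are invented, as above):

```haskell
import Development.Shake
import Development.Shake.FilePath

-- Hypothetical sketch using Shake's 'batch': when several ".hs.d"
-- files need rebuilding at once, analyse up to 50 sources with a
-- single GHC invocation instead of one process per file.
batchedDepsRules :: Rules ()
batchedDepsRules =
  batch 50 ("_build//*.hs.d" %>)
    (\out -> do
        let src = dropExtension (dropDirectory1 out)  -- invented layout
        need [src]
        return src)
    (\srcs -> cmd_ "ghc" ("-immediate-deps" : srcs))  -- flag is hypothetical
```

This keeps incremental builds per-file while letting a full build amortise process-startup costs across many files.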
Andrey, is the reason this is hideously expensive because ghc -M is hideously expensive? If not, then using oracles to cache the various parts of dependencies.mk would be the right solution.
I believe the way -M works is it builds a complete dependency tree, which is pretty expensive, and requires running all C preprocessors etc. Doing it in individual steps is likely to be hideously expensive.
The solution I've always used in the past is something like https://shakebuild.com/includes#generated-transitive-imports. Pros: it's super fast, super granular, and it allows you to import files that are themselves generated on demand. Con: you have to write your own "spot an import" code. My experience is that that's really hard in general but quite easy for any specific project with sane conventions.
Can you describe more explicitly "the solution you have used in the past" in our context?
I think you are saying
Implement ghc -some-new-flag M.hs, which runs CPP on M.hs (if necessary), parses the result in some simple-minded way, and spits out all of M's direct imports.
This seems to be what your usedHeaders thing does.
If we could do need (usedHeaders "M.hs") maybe we would never need to use ghc -M at all?
In the past I've written a function that reads the file, and using fairly simplistic string matching guesses what it depends on, in the build system itself. It can avoid shelling out to GHC (hugely expensive on Windows, especially with corporate antivirus systems) and avoid running CPP. Generally most CPP doesn't impact which files are used, and even if it does, having a superset isn't a problem.
The kind of function I've used previously is on the order of:
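The snippet itself did not survive the issue migration. A minimal stand-alone sketch in the same spirit (the function name is invented, and it is deliberately simplistic string matching, as described above):

```haskell
import Control.Applicative ((<|>))
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- Guess which modules a Haskell source file imports using plain
-- string matching: no GHC, no CPP. This over-approximates under
-- CPP, but a superset of imports is harmless for dependency tracking.
guessImports :: String -> [String]
guessImports = mapMaybe moduleOf . lines
  where
    moduleOf l =
      takeWhile (`notElem` " (") <$>
        (stripPrefix "import qualified " l <|> stripPrefix "import " l)
```

A build rule would then need the files corresponding to the returned module names before compiling the source.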
Andrey, is the reason this is hideously expensive because ghc -M is hideously expensive?
There are ~500 Haskell files in the compiler directory, so global (i.e. per-package) dependency analysis can't be very fast, however efficiently it is implemented.
If not, then using oracles to cache the various parts of dependencies.mk would be the right solution.
This is what we do, but this doesn't solve the problem: right now, if you edit a single Haskell file in compiler, we will rerun ghc -M on the whole set of ~500 package files. Yes, oracles will helpfully stop the changes from propagating further, but this single ghc -M invocation will be slow.
I think the only solution is to have a way (e.g. a new GHC flag) to run dependency analysis on a single file, without transitive exploration of all its dependencies.
I think the only solution is to have a way (e.g. a new GHC flag) to run dependency analysis on a single file, without transitive exploration of all its dependencies.
That sounds reasonable. It would certainly be more robust than the strategy I describe. It won't help if you have deeply nested CPP includes (you'd still rescan them each time), but I suspect that's negligible for GHC.
As you say, if that flag can take multiple files at once, you could batch it, which would be a good performance improvement.
Self-note: the relevant code lives in https://gitlab.haskell.org/ghc/ghc/blob/master/compiler/main/DriverMkDepend.hs. The simplest approach is probably to refine that GhcMode to be either transitive or not. When transitive, we'd take the current code path; when not, we'd take a new one that just looks at and reports the immediate dependencies. The former would still be exposed under -M and the new one under some other flag (any suggestion is welcome).
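To make the transitive/immediate distinction concrete, here is a toy, self-contained model of the two modes (all names invented; the real DriverMkDepend code is of course quite different):

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

data DepMode = Transitive | Immediate deriving (Eq, Show)

-- Toy dependency graph: module name -> modules it imports directly.
type Graph = M.Map String [String]

-- Immediate mode reports just the direct imports; Transitive mode
-- (what ghc -M effectively computes) chases the whole closure, which
-- is why invoking it separately per file re-traverses the graph.
depsOf :: DepMode -> Graph -> String -> [String]
depsOf Immediate g m = M.findWithDefault [] m g
depsOf Transitive g m = S.toAscList (go S.empty (M.findWithDefault [] m g))
  where
    go seen [] = seen
    go seen (x:xs)
      | x `S.member` seen = go seen xs
      | otherwise         = go (S.insert x seen) (M.findWithDefault [] x g ++ xs)
```

With per-file Immediate output, the build system itself can assemble the transitive picture incrementally, re-analysing only the files that changed.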