During this week's GHC call it came up that it would be nice to have a linting mechanism for verifying that we only enter single-entry thunks once, as currently this invariant appears to be completely unchecked.
Currently we do not record whether a thunk is intended to be single-entry when we emit its code. Moreover, I don't believe there is any room for this left in the info table to record this fact. There are two ways I can think of accomodating this,
stealing a bit from the thunk type field
add a flags field to StgDebugInfo such that the lint requires the debug way
As far as implementing the check itself, I think it should be rather straightforward. When we enter a thunk we simply want to check whether it is single-entry. If it is then we replace it with a special type of BLACKHOLE-like thunk which crashes the program (or merely emits a warning) on entry.
Trac metadata
Trac field
Value
Version
7.10.1
Type
FeatureRequest
TypeOfFailure
OtherFailure
Priority
normal
Resolution
Unresolved
Component
Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
simonmar, simonpj
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
Simonmar, could you confirm that there are no blatant inaccuracies in this description? If not, do you have an opinion on which approach is preferable?
For what it's worth, while investigating #10414 (closed), I ran the whole test suite with -feager-blackholing, which (at the time) caused all single-entry thunks to be overwritten with a black hole on entry. There were no test failures, which gives me a reasonably high degree of confidence that the "single-entry thunks are entered only once" invariant is maintained, outside of the situations involving multiple threads entering a closure simultaneously discussed in #10414 (closed).
This is me thinking out aloud about this feature. Ignore me if you want, but if you are curious, feel free to jump in.
Widening the scope of this ticket a bit (or shifting it?): Wouldn’t it be nice if we not only link the single-entry thunks, but also find out about missed opportunities for single-entry thunks? Such a counting would subsume the linting above.
I’m considering to build on [Debugging/TickyTicky ticky-ticky debugging], as it already provides a lot of the necessary infrastructure. In particular, it counts entries to thunks, when -ticky-dyn-thunk is enabled. The number is not quite what we want, though: It seems that if a certain static thunk is created n times, then it will report n entries; whereas in the scope of this ticket we want to know that each allocated thunk was entered once.
Also, currently a dynamic (i.e. an instance of a static) thunk that is not marked as single-entry will always be reported to be entered 0 or 1 times, as the result is shared. How do we get better numbers here? One way might be a special heap object, let’s call it a “counting indirection”. It behaves mostly like a regular indirection, i.e. it’s entry code would count a tick and then jump to the entry code of the indirectee, but the crucial difference would be that the garbage collector will 'not' erase it when it evacuates memory. It will, however, evacuate the indirectee as usual. This way, the sharing semantics are preserved, but every entry via the thunk will be counted, not just the first.
These counting indirections would be created by the code that creates a thunk. This way, this counting can be done independent of -ticky-dyn-thunk. These heap objects can also carry additional data used in counting, e.g. the name of the thunk (in CorePrep names) and whether GHC created this thunk as single-entry or not (which would not affect the behaviour of the counting indirection, but be useful for the reporting).
I’m not sure yet what’s the best data structure to do the counting, if we indeed want to count uses of each dynamic instance of the thunk (or rather, of the counting indirection) separately. Maybe statically allocate 'm' (maybe 1024) counters per thunk, and an index, and the first 'm' instances of the thunk use the these counter (each, upon first entry, incrementing the index to allocate its slot), and if there are more than 1024 entries, then simply stop counting.
I kind-of like this design, but I’m sure I’m missing some rather ugly details.
It seems that if a certain static thunk is created n times, then it will report n entries; whereas in the scope of this ticket we want to know that each allocated thunk was entered once.
Yes exactly. Let's call the code that allocates the thunk the thunk allocation site. For some of these sites, cardinality analysis may say "this is a single-entry thunk". Call those the single-entry thunk allocation sites' and the others the multi-entry thunk allocation sites.
For each single-entry thunk allocation site, we'd like to know if any of the thunks it allocates are entered more than once. (This would mean that the analysis was unsound.)
For each multi-entry allocation site, we'd like to know if all the thunks it allocates are entered at most once. This is a possible missed opportunity to make it a single-entry site. (Only "possible" because in a different run it might be entered more than once.)
For all allocation sites we'd like to know if every thunk allocated there was never entered; that's a possible missed opportunity for absence.
A counting indirection sounds like the right kind of thing.
I would also like to count lambdas! So for the STG binding
f = \xy. e
How many times do I call f (ie enter the closure)? That is, is it a one-shot lambda.
Do I ever partially apply it, or is it always saturated?
A counting indirection sounds like the right kind of thing.
Looking a bit into this IND_COUNTING now; so I’ll go into “dumping notes into trac” mode, for the future benefit of me or whoever else continues.
There is already a closure type IND_PERM which is not removed by the GC; our counting indirections requires the same behavior. So this is a good start, I guess. The following comments are observations from looking at every mention of IND_PERM in the code.
Cmm.h has macros to ENTER a closure, which shortcuts indirections without calling their entry code. This is not what we want for an IND_COUNTING; so we should use the default code there, to actually enter the counting closure.
Should the interpreter also count entries? I don’t see any reference to ticky in Interpreter.c, so probably not.
Should rts/Stable.c preserve IND_COUNTING? It does not preserve IND_PERM, so probably not.
Fun fact: tricky and profiling do currently not mix. As I plan to do this counting as part of ticky, this means I can ignore problems with profiling in the RTS (the closures still have to work, of course).
scavenge_mark_stack does nothing for IND_PERM with a comment I don’t fully understand. Will revisit.
Weird. I only find lots of code that handlesIND_PERM, but none that actually creates closures of that type. Is all that dead code, likely since 2010’s reimplementation of BLACHOLES by Simon Marlow? (changeset:5d52d9b6/ghc)
Why not emit some update code into a single entry thunk that updates the thunk with something that will explode if it is entered again? That seems a lot easier than allocating indirctions.
Why not emit some update code into a single entry thunk that updates the thunk with something that will explode if it is entered again? That seems a lot easier than allocating indirctions.
Right, that would be sufficient to detect thunks that are wrongly marked single-entry. But we also want to learn about thunks that are not marked single-entry, but should be.
Right, that would be sufficient to detect thunks that are wrongly marked single-entry. But we also want to learn about thunks that are not marked single-entry, but should be.
Do you expect to learn much by doing that? We know the analysis is inaccurate, and I expect you'll just find a lot of thunks that are used a long way from the definition site.
Yes we expect to learn a lot. From a scientific point of view (and writing the journal paper) I'm very interested to know how many of the single-entry thunks are found by the analysis. Let's say there are N thunk allocation sites in the code. We find that 15% are single-entry. If we found that only 20% were entered once (that 15% plus another 5%) that would mean there was at most another 5% that a more sophisticated analysis might find. But the number might be much larger.
If the number is large, we might start to characterise the missed opportunities further. Perhaps some are being missed because of a bug in the analysis. Others might be caught by a simple improvement in the analysis.
I have made initial, still hackish, progress. After a few iterations I can now run all of nofib with my counting indirections and it does not segfault any more (phew).
Here is a report from integrate, note the new columns #Alloc, Single and Multpile.
Upon entry, this increments the private counter, and if it does so the first time, increases Single, and on the second time decreases Single and increases Multiple. Otherwise, it passes R1 onto the indirectee, preserving the tag.
Nooo, where is the longish text I wrote here yesterday? :-( I hate it when that happens. Darn, let’s see what I can remember. grnrl.
I thought I only enabled it for thunks, but it also seems to be present for two functions, and only there count multiple entries... Weird.
Still weird. If I add my counting indirection to everything, I get segfaults. So I guard it with idRepArity id == 0 for now, thinking I would only add it to thunks. But the above report shows some that have “Non-void arguments”, but still idRepArity id == 0; this is one example:
let { sat_s7uy [Occ=Once] :: GHC.Types.Double -> GHC.Types.Double [LclId, Str=DmdType] = \r srt:SRT:[] [x_s7ux] GHC.Float.timesDouble x_s7ux y_s7uu;} in case Main.$wintegrate1D 0.0## ww3_s7uw sat_s7uy of ww4_s7uz { __DEFAULT -> GHC.Types.D# [ww4_s7uz]; };
Why does this have a zero idRepArity? Why does it not crash? Is it because it is used as an argument to a function, and hence no fast calls occur?
Also, these 500000-times allocated thunks are not really unused, are they?.
No, they are not. These are constructors, where entries are not counted. I extended the ticky report to be a bit more explicit about the kind of things these names are, i.e. distinguish cons from thunks, and for thunks, indicate which are one-shot and which are static.
Two total entries for a thunk (sat_s7vH) that has been allocated once and called once? Maybe there was a heap check inbetween?
No, this is simply a bug in the ticky code, where there are two calls to tickyEnterThunk
(likely introduced by 99d4e5b4, possibly
due to a merge mistake).
The latter two improvements are at D2014, mostly to get phab to test the commit (which does not seem to happen... why?).
I now need to wrap my head around how to do the counting for functions. My counting indirection does not work as it is, because functions are entered differently. So I either have to create a counting indirection that works for functions (but I don’t know yet if a single one would work for all types of functions), or add the counting code to the function itself. In this case, the function closure needs an additional field for the counter.
If you want to count how many times a function is entered, why not just put a ticky counter at the start of the function's code?
The problem is that I need to count each dynamic instance separately. So I need a separate counter for each allocated closure, so the only place it can go is inside the closer as another non-pointer argument.
(For static functions, things are easier of course.)
Oh ok, well, same deal. When you enter a (dynamically allocated) function there's a register pointing to the function closure. So the ticky code at the start of the function's code can bump the count in ClosurePtr[4] (or whatever the appropriate offset is. No?
Yes! That’s the idea, and conceptually straight-forward. I’m just a bit worried that the change will be too invasive to the compiler’s code (info tables, offset calculation etc.)
But I guess I should just do it, and see how bad it is. In the worst case we keep it on a separate branch just for the investigation for the paper.
Ok, I implemented it, and nofib still runs (yay). The code is yet unrepresentable, though :-)
Preliminary observations (gotta run now):
In all of nofib, there is one single entry thunk that is entered multiple times (something in gamteb, allocated 2048 times, each time with two entries). Did not investigate yet, though.
On first glance, plenty of thunks which appear to be single-entry, but are not marked as such.