Add saturated constructor applications to Core

What is special about a data constructor that allows this, that a function (like, say, flip) does not have?

Replying to [ticket:12618#comment:125082 nomeata]:

What is special about a data constructor that allows this, that a function (like, say, flip) does not have?

(Moved to #12626)

I'd like to chime in and agree that it would be a good design to reflect saturated function application and related info in a corresponding multi-argument saturated application form. I have some work-in-progress type theory developments that critically depend on a built-in notion of simultaneous arguments and results (à la Types are calling conventions or Sequent Core), where the logical strength of what can be described requires that notion to be built in.

I presume / assume this multi arg app form is essentially a function application against an unboxed tuple? Right?

One thing I'd like to point out is that a knock on effect of this change you may want to consider is having Unboxed tuples in arg and return positions act like pi and sigma telescopes respectively.

If what I'm sketching out needs more clarity, I'm happy to write a clearer exposition on the wiki or the like.

If the ideas discussed here are consistent with, or facilitated by, the details I'm hopefully articulating clearly, this would be a set of changes I would strongly support. In fact, I could probably get support for putting work time into helping out on this change if need be, partly because certain ideas I would like to experimentally add to GHC would then be much easier to add :) (The simultaneous-arguments approach makes embedding and supporting linear-logic features in a clean way much nicer than previous efforts.)

changed the description

There are two different things that are orthogonal:

Enforcing saturated function calls (types are calling conventions etc.), which this ticket is not about.
Avoiding the storage of redundant type arguments, which is what we want here.

As this can be implemented with pattern synonyms, this should be a semantically transparent change. The former is also interesting (and there is some simplicity to be gained by having both, as the arity of a function application would be determined by the function’s type), but let’s keep them separate for now.

(Moved to #12626)

Let's not go overboard here! This ticket is only about treating saturated applications of data constructors specially.

There are two more ambitious possible generalisations:

Try to suppress more type arguments, for situations other than saturated data constructors. If you are interested in that, please read Scrap your type applications. By all means come up with a better scheme, but that paper describes the best one I know.
Introduce uncurried application as a Core primitive (and eliminate App). For that we'd want uncurried lambda as well (the intro form). Please read Types are calling conventions.

I talked with Stephanie and Joachim about this at ICFP, and I think Joachim is going to follow it up. It too involves complications (notably abstraction must be over a telescope), and we had an alternative idea with "computation types". More on that anon, doubtless.

By all means start new tickets to discuss these generalisations. But this ticket is just about the intro-form that is dual to case, namely saturated constructor application. If we discuss the (much more ambitious) generalisations here, the payload of this ticket will get buried.

(One reason for NOT adopting ConApp is that it might ultimately be subsumed by the more general cases. But I'm not holding my breath.)

I want exactly those unboxed tuple telescopes :) Fair enough, I'll see if I can put together an exposition that cleans up Types are calling conventions and articulates some changes which make it nicer.

mentioned in issue #12626

I guess I did get a bit overly excited on the way back from ICFP. I moved all my comments about generalizing this to arbitrary applications in a transparent way to #12626. But I still wonder what’s so special about data constructors, and why whatever works for data constructors does not work in general. I skimmed the paper, and will read it more carefully again now.

We went back and forth with something like this in Sequent Core, where having the dual of Case was nice. One downside hasn't been mentioned, though: We'll need to use smart constructors more consistently, or otherwise not be able to count on all saturated constructor applications to use ConApp. Currently there's mkCoreApp and mkCoreApps, but those are only necessary when the let/app invariant must be enforced; IIRC, lots of places where let/app is known to hold just use fold App over the arguments.
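The smart-constructor concern can be sketched with a toy expression type. This is an illustration only: `Expr`, `Con`, `naiveApps` and `mkApps` are hypothetical names invented here and are not GHC's `CoreExpr` or its `mkCoreApps`.

```haskell
-- Toy model only: these types are NOT GHC's real Core.
data Expr
  = Var String
  | Con String Int        -- constructor name and its arity (hypothetical)
  | App Expr Expr         -- ordinary curried application
  | ConApp String [Expr]  -- saturated constructor application
  deriving (Eq, Show)

-- Naive construction by folding App: never produces ConApp, so later
-- passes could not rely on saturated constructors being in ConApp form.
naiveApps :: Expr -> [Expr] -> Expr
naiveApps = foldl App

-- Smart constructor: emit ConApp when the head is a constructor applied
-- to exactly as many arguments as its arity; otherwise fall back to App.
mkApps :: Expr -> [Expr] -> Expr
mkApps (Con name arity) args
  | length args == arity = ConApp name args
mkApps f args = foldl App f args
```

Every call site that builds applications with the naive fold would have to be switched to the smart constructor for the invariant to hold, which is the cost described above.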

simonpj says:

A simple once-and-for-all analysis on the DataCon will establish how to do the matching, which type args to retain, etc.

So a DataCon will have this info stored in it? It might be non-trivial! For example:

data T a where
  MkT :: F a -> T a

If F is not injective, we would need to store the choice for a. Even if it is injective, it may be more convenient to store the choice for a.

And then there are examples like

data T2 a where
  MkT2 :: Maybe (Either Bool a -> a) -> T2 a

where the relationship between the constructor field's type and the choice for a is non-trivial. However, perhaps a use of tcMatchTy or one of its friends when constructing the DataCon is enough to sort this out.
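To make the two examples concrete, here is a minimal sketch of one-way matching on a toy type representation. It is not `tcMatchTy`; the names `Ty`, `TyFam`, `match` and `recover` are assumptions made for this illustration, and the sketch simply treats a type-family application as opaque.

```haskell
import Control.Monad (foldM)

-- Toy types: NOT GHC's Type.
data Ty
  = TyVar String
  | TyCon String [Ty]  -- injective constructor, e.g. Maybe, Either, (->)
  | TyFam String [Ty]  -- type family application, possibly non-injective
  deriving (Eq, Show)

type Subst = [(String, Ty)]

-- One-way match: bind variables of the template (first argument) so that
-- it equals the actual type (second argument).
match :: Ty -> Ty -> Subst -> Maybe Subst
match (TyVar v) t s = case lookup v s of
  Nothing -> Just ((v, t) : s)
  Just t'
    | t' == t   -> Just s
    | otherwise -> Nothing
match (TyCon c ts) (TyCon d us) s
  | c == d, length ts == length us =
      foldM (\acc (t, u) -> match t u acc) s (zip ts us)
match (TyFam _ _) _ s = Just s  -- cannot see through a family: learn nothing
match _ _ _ = Nothing

-- Try to recover the instantiation of one type variable from the field
-- templates and the actual types of the term arguments.
recover :: String -> [Ty] -> [Ty] -> Maybe Ty
recover v templates actuals = do
  s <- foldM (\acc (t, a) -> match t a acc) [] (zip templates actuals)
  lookup v s
```

In this toy model, the `MkT2` field `Maybe (Either Bool a -> a)` determines `a`, whereas the `MkT` field `F a` yields nothing, so the choice for `a` would have to be stored.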

If we successfully do this for data constructors, it should not be hard to do the same for poly-kinded type constructors. I'm specifically thinking about the redundant RuntimeRep arguments to unboxed-tuple type constructors.

mentioned in issue #12635

assigned to @nomeata

JFTR, I’m working on implementing this. Not sure if one week is enough; there seems to be an endless supply of code paths that have a catch-all pattern match on CoreExpr and thus are not found by the compiler.

My work is in wip/T12618. Stage 1 compiles, and seems to work mostly, but if I build ghc-stage2 with it, ghc-stage2 crashes with an internal error: evacuate(static): strange closure type 0. If someone enjoys debugging this kind of problem, let me know...

It seems I get internal error: evacuate(static): strange closure type 0 only with a dynamically built GHC, not with a statically built (as it is the case on Travis). If that rings a bell with someone that could save me further debugging work, I’d be grateful.

While I make progress with getting the tree to compile properly again, here is one question that will need to be answered.

Consider this rule:

"foldr/id" foldr (:) [] = \x  -> x

Because we desugar constructors in source to the wrapper (especially if they are unsaturated), but the wrapper is a function that will be marked as inlineable, the compiler now gives this warning:

libraries/base/GHC/Base.hs:855:1: warning: [-Winline-rule-shadowing]
    Rule "foldr/id" may never fire
      because ‘GHC.Types.$W:’ might inline first
Probable fix: add an INLINE[n] or NOINLINE[n] pragma for ‘GHC.Types.$W:’

So at first I thought: Ok, no problem, I just force the inlining of data con wrappers after the desugaring of rule left-hand sides, and this might work for [], but (:) is really used unsaturated here.

What is the best way forward here?

One way would be to disable this warning specifically for datacon workers, and then make the rule matcher smart enough to match both variants.

Or alternatively, make the warning aware that an unsaturated use of a function with an unfolding will not inline, and it is thus ok to have something INLINE on the LHS of a rule, as long as it is unsaturated.

Actually this is already a problem today. It's just rendered more prominent now that even (:) has a wrapper. Consider

data T = MkT {-# UNPACK #-} !Int

{-# RULES

"fT" f MkT = True
"gT" forall x. g (MkT x) = x
  #-}

f :: (Int -> T) -> Bool
{-# NOINLINE f #-}
f x = True

g :: T -> Int
{-# NOINLINE g #-}
g (MkT x) = x+1

yields

Foo.hs:9:1: warning: [-Winline-rule-shadowing]
    Rule "fT" may never fire because 'Foo.$WMkT' might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma for 'Foo.$WMkT'

Foo.hs:10:1: warning: [-Winline-rule-shadowing]
    Rule "gT" may never fire because 'Foo.$WMkT' might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma for 'Foo.$WMkT'

What to do? If we are to match these rules, we really must delay inlining the wrapper for MkT (after inlining we get a mess of unboxing etc). So either we must allow you to add a NOINLINE pragma to MkT; or we must add one automatically (e.g. NOINLINE [1]).

Delaying all constructor-wrapper inlining to phase 1 is potentially quite drastic, because case-of-known-constructor wouldn't happen until the wrappers are inlined. Maybe that's ok; I'm not sure. Worth trying I think.

Well, the whole point of this ticket is to have ConApp as soon as possible, and nesting tuples with $W(,) will have again the quadratic cost until we get rid of them, so I am not convinced. Also it feels wrong to fight against the inliner here…

Would it be wrong for GHC to look at that rule, notice that something marked as INLINE occurs on the LHS, but then notice that it is used unsaturated, hence conclude that it will not have been inlined in the program where the rule needs to be matched, and omit the warning?

mentioned in issue #12689

I have opened #12689 for the issue of rules vs. data con wrappers, as it is a separate one.

Plan for #12689: Inline simple wrappers (like the ones that we add here) in the LHS of rules, so that we are no worse off than before, and then figure out if we are actually getting the desired improvements with regard to nested tuples (re #5642).

Small update: I have a branch implementing non-compressing ConApp that validates, with no significant effect on program runtimes. And already now, two perf test cases (#9961 (closed) and #9233 (closed)) improved! So there might be something in here for us.

Cool! I hope to dig into what's been afoot here at Hac Phi!

@nomeata, how much did the benchmarks improve, exactly?

Replying to [ticket:12618#comment:126416 osa1]:

@nomeata, how much did the benchmarks improve, exactly?

See https://perf.haskell.org/ghc/#revision/1c4c64385bbc315deaff203fbebc423ce79f3f93:

9961 improves by 13%, 9233 by 5.5%.

13% is great. Can Gipeda show residency too? IIRC some of the perf tests were causing a lot of trouble not because of allocations but because of residency.

Can Gipeda show residency too?

It could, but does not now. Isn’t residency far too flaky and dependent on flags? I’m very wary of introducing noise from not very helpful tests.

All right, it is done, and I can report back.

Introducing ConApp (without any compression) was quite tedious, and took me two weeks rather than one. By now, it validates (almost; the GHCi debugger shows some difference in behaviour that I did not investigate) and shows the same runtime performance.

The most tricky points are related to rules, which really do not like it if eta-expansion changes the Core too much. It took me a while to get equivalent program output after this refactoring (and I took a shortcut for now, duplicating some list fusion rules that match build (:) to have a second variant matching build (\x y -> x : y)). With a bit more careful work, this could be fixed, should we want this code to be merged.
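The rule-duplication shortcut can be illustrated with a user-defined function. This is a hedged sketch, not the actual rules from GHC.Base: `myFoldr` and both rule names are hypothetical, and whether GHC treats the two left-hand sides as distinct depends on how it simplifies rule left-hand sides in a given version.

```haskell
-- A local stand-in for foldr, kept NOINLINE so that rules on it can match.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr k z = go
  where
    go []     = z
    go (x:xs) = k x (go xs)
{-# NOINLINE myFoldr #-}

-- Two variants of the same rewrite: one matching the bare constructor,
-- one matching its eta-expanded form. The two argument expressions are
-- semantically equal but may differ syntactically in Core.
{-# RULES
"myFoldr/id"     myFoldr (:) [] = \xs -> xs
"myFoldr/id/eta" myFoldr (\x y -> x : y) [] = \xs -> xs
  #-}
```

The result of `myFoldr (:) [] xs` is `xs` whether or not either rule fires; the rules only change how the program gets there, which is why a missed match shows up as a performance difference rather than a wrong answer.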

This change on its own affected some compiler perf benchmarks in the test suite: T9961 improves by 13%, T9233 by 2.5%, and T9020 by 2.5%. T4801 regresses by 3.87% (bytes allocated).

I then, in a separate patch, implemented omitting redundant type arguments from constructors such as Just, (:) and tuples, which was the main motivation here. At every use of ConApp, I tried to understand whether the code actually cares about the type arguments (which means that they need to be recovered) or not (in which case the compressed argument list can be traversed, which is of course more efficient).

In these places I had to uncompress the type arguments:

freeVars (which also calculates the types)
The linter
cpeRhsE
exprIsConApp_maybe
exprType
collectStaticPtrSatArgs, sptModuleInitCode (all about static pointer tables)
decomposeRuleLhs in the desugarer
toIfaceExpr when serializing tuples (low-hanging fruit here)
match_magicDict
occAnal
simplExprF1
isValue in SpecConstr

I found that it is crucial to analyze the type of the data constructor only once, and store the “compression scheme” (i.e. which type arguments to recover from which term arguments) once in the data constructor, as this analysis is not completely cheap.

But even with this optimization in place, the effect of this is – negligible.
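A minimal sketch of such a stored scheme, on toy types: the analysis runs when the constructor is built, and each use only follows precomputed paths. All names here (`DataCon`, `dcScheme`, `mkDataCon`, `recoverTyArgs`) are hypothetical and do not reflect the code in the patch.

```haskell
import Data.Maybe (listToMaybe)

-- Toy types: NOT GHC's Type or DataCon.
data Ty = TyVar String | TyCon String [Ty]
  deriving (Eq, Show)

-- Where a type variable can be read off: index of a term argument, then
-- a path of child indices into that argument's type.
type Path = (Int, [Int])

data DataCon = DataCon
  { dcName   :: String
  , dcTyVars :: [String]
  , dcFields :: [Ty]
  , dcScheme :: [Maybe Path]  -- one entry per type variable; Nothing = must be stored
  }

-- Build the constructor; the scheme is a field, so it is computed at most
-- once per DataCon value and then shared by every use.
mkDataCon :: String -> [String] -> [Ty] -> DataCon
mkDataCon name tvs fields = DataCon name tvs fields (map findVar tvs)
  where
    findVar v = listToMaybe [ (i, p) | (i, f) <- zip [0 ..] fields, p <- paths v f ]

-- All paths at which variable v occurs inside a type.
paths :: String -> Ty -> [[Int]]
paths v (TyVar w)    = [ [] | v == w ]
paths v (TyCon _ ts) = [ j : p | (j, t) <- zip [0 ..] ts, p <- paths v t ]

-- Recover type arguments from the types of the term arguments by
-- following the stored paths; Nothing marks an argument that was not omitted.
recoverTyArgs :: DataCon -> [Ty] -> [Maybe Ty]
recoverTyArgs dc argTys = map (>>= follow) (dcScheme dc)
  where
    follow (i, p) = case drop i argTys of
      (t : _) -> walk p t
      []      -> Nothing
    walk [] t = Just t
    walk (j : js) (TyCon _ ts) = case drop j ts of
      (t : _) -> walk js t
      []      -> Nothing
    walk _ _ = Nothing
```

For a constructor like Just the single type argument is recoverable from the first field, while for a nullary constructor like Nothing it is not, so it would still have to be kept.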

My gut feeling:

ConApp is not a good idea in this form. Constructor applications are still just applications, and treating them that separately is not going to be healthy. It might be a better idea to make all applications saturated (as in strict core, or less invasive, “spotty types”).
The compression scheme is nice in principle, but there are still too many code paths that want the types. Some might be taken off the list after careful analysis of the code and mild refactoring. In others, making sure that the type in a Type data constructor is used as lazily as possible might help avoiding actually running exprType (this is a blind guess).
Furthermore, the large types that occur with nested tuples are already in the type checker! So avoiding them in Core is only half the story.
If compression would make a difference, then I think we want it at all applications (or at least applications headed by an Id, where we could store the compression scheme). Another point in favor of making all applications saturated.

As for the problem of nested tuples: Maybe it would have been better to first carefully analyze the compiler (using -v/profiling/ticky-ticky) to be sure where we pay the unwanted cost (type checking, Core, interfaces, somewhere else) to know what we have to fix, before having a shot at one assumed cause.

The code is at D2564.

I did a full implementation of System IF some years ago, and concluded (like you) that the pain is not worth the gain. Your point that the type checker is building these very big types in the nested-tuple case is an excellent one.

I don't have any brilliant ideas, I'm afraid. But this is an excellent data point.

Is Phab a good place to keep the patch long-term, or would it be best pushed into the GHC repo?

Good work, even if disappointing!

Simon

mentioned in issue #17223

mentioned in issue #19704
