Opened 19 months ago

Last modified 8 weeks ago

#14003 new feature request

Allow more worker arguments in SpecConstr

Reported by: choenerzs Owned by: choenerzs
Priority: normal Milestone: 8.10.1
Component: Compiler Version: 8.2.1-rc3
Keywords: JoinPoints, Fusion Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #11565 Differential Rev(s):
Wiki Page:

Description

Starting with GHC 8.2 (rc1 -- head) I noticed that the SpecConstr pass does not always optimize completely with SpecConstr-heavy code. Setting -fmax-worker-args=100 leads to complete specialization again.

However, given that code annotated with SPEC should be optimized until no more SPEC arguments are alive, shouldn't callToNewPats in compiler/specialise/SpecConstr.hs specialize *irrespective* of the size of the worker argument list?

Code that actually fails to specialize is fairly large, hence no test case -- though I have some files with core output showing insufficient specialization.

(I'd be willing to write a patch for this)

Attachments (2)

v1-slow.dump-simpl (15.7 KB) - added by choenerzs 19 months ago.
core output with default W/W 10
v1-fast.dump-simpl (13.4 KB) - added by choenerzs 19 months ago.
core output with W/W 100

Download all attachments as: .zip

Change History (15)

comment:1 Changed 19 months ago by bgamari

In general your code will perform pretty poorly with or without worker/wrapper if you have >100 arguments as this is well beyond what we can accommodate in physical registers. Of course, I'm happy to be proven wrong: do you have a concrete example of a program with this many arguments which you think should optimize well?

comment:2 Changed 19 months ago by choenerzs

I'm using a lot of intermediate data constructors for control flow which are all eliminated if SpecConstr runs to completion.

The 100 for W/W was only to provide a ceiling I'm not likely to bump against. However, this is part of the problem, in that the user of the library now needs to think about which value to give W/W.

The function wstream_Strng2_V_1 becomes more than x10 faster if specconstr runs to completion and allocates *a lot* less (in my allocation benchmark it is 48 bytes versus ~ 600 Kilo-bytes . It has arity 18 after full SpecConstr.

In GHC 8.0.2 and earlier, I did not have to change the maximal number of W/W arguments for SpecConstr.

The core output for v1-slow (W/W = 10) and v1-fast (W/W = 100) is given as attachments.

Changed 19 months ago by choenerzs

Attachment: v1-slow.dump-simpl added

core output with default W/W 10

Changed 19 months ago by choenerzs

Attachment: v1-fast.dump-simpl added

core output with W/W 100

comment:3 Changed 19 months ago by simonpj

I'm confused.

  • I think -fmax-worker-args applies to the demand analyser, not SpecConstr. So I'm not sure how it is affecting things.
  • Are you getting the same code with 8.0.2 and 8.2 after SpecConstr?

Overall it's going to be hard to help without a test case.

comment:4 Changed 19 months ago by choenerzs

SpecConstr checks via isWorkerSmallEnough in callsToNewPats if the pattern to be specialised is small enough. This is controlled by -fmax-worker-args. Line 1913 in SpecConstr.hs.

And no, 8.0.2 fully specialises irrespective of the number of worker arguments. Specialisation in 8.2 depends on the number of arguments the pattern to be specialised has.

I'll try to construct a test case.

comment:5 Changed 19 months ago by bgamari

The function wstream_Strng2_V_1 becomes more than x10 faster if specconstr runs to completion and allocates *a lot* less (in my allocation benchmark it is 48 bytes versus ~ 600 Kilo-bytes . It has arity 18 after full SpecConstr.

Where is this wstream_Strng2_V_1 function that you refer to? It would be helpful to see a concrete example.

And no, 8.0.2 fully specialises irrespective of the number of worker arguments. Specialisation in 8.2 depends on the number of arguments the pattern to be specialised has.

This change is due to two causes:

Perhaps you want the SpecConstr limit to be independent from the worker/wrapper limit?

comment:6 Changed 19 months ago by choenerzs

The core output of the wstream_Strng2_V_1 function is in the two attached files. I still need to prepare a test case that doesn't depend on a lot of dependencies.

What I want is SpecConstr to continue specialising functions that have a SPEC argument irrespective of the number of arguments.

That is, those where sc_force is set, I think?

comment:7 Changed 19 months ago by simonpj

That seems plausible. We could not call isWorkerSmallEnough if sc_force is true. Would you like to try that?

If so, we should update this comnent

ForceSpecConstr arguments are spotted in scExpr' and scTopBinds which then set
sc_force to True when calling specLoop. This flag does four things:
  * Ignore specConstrThreshold, to specialise functions of arbitrary size
        (see scTopBind)
  * Ignore specConstrCount, to make arbitrary numbers of specialisations
        (see specialise)
  * Specialise even for arguments that are not scrutinised in the loop
        (see argToPat; Trac #4488)
  * Only specialise on recursive types a finite number of times
        (see is_too_recursive; Trac #5550; Note [Limit recursive specialisation])

Now there'd be five things!

comment:8 Changed 19 months ago by choenerzs

Ok! I've HEAD prepared, and will start with simplified example code.

comment:9 Changed 17 months ago by bgamari

Milestone: 8.2.28.4.1
Owner: set to choenerzs

Any progress on this, choenerzs?

Bumping off to 8.4.1 as there is a workaround for the regression (e.g. increase -fmax-worker-args).

comment:10 Changed 17 months ago by choenerzs

Hi Ben,

sorry for the delay. Work caught up with me. I think, this month should work out.

Best, Christian

comment:11 Changed 13 months ago by bgamari

Milestone: 8.4.18.6.1

This ticket won't be resolved in 8.4; remilestoning for 8.6. Do holler if you are affected by this or would otherwise like to work on it.

comment:12 Changed 8 months ago by bgamari

Milestone: 8.6.18.8.1

These won't be addressed by GHC 8.6.

comment:13 Changed 8 weeks ago by osa1

Milestone: 8.8.18.10.1

Bumping milestones of low-priority tickets.

Note: See TracTickets for help on using tickets.