Opened 19 months ago
Last modified 8 weeks ago
#14003 new feature request
Allow more worker arguments in SpecConstr
Reported by: | choenerzs | Owned by: | choenerzs |
---|---|---|---|
Priority: | normal | Milestone: | 8.10.1 |
Component: | Compiler | Version: | 8.2.1-rc3 |
Keywords: | JoinPoints, Fusion | Cc: | |
Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
Type of failure: | Runtime performance bug | Test Case: | |
Blocked By: | Blocking: | ||
Related Tickets: | #11565 | Differential Rev(s): | |
Wiki Page: |
Description
Starting with GHC 8.2 (rc1 -- head) I noticed that the SpecConstr pass does not always optimize completely with SpecConstr-heavy code.
Setting -fmax-worker-args=100
leads to complete specialization again.
However, given that code annotated with SPEC
should be optimized until no more
SPEC
arguments are alive, shouldn't
callToNewPats
in
compiler/specialise/SpecConstr.hs
specialize *irrespective* of the size of the worker argument list?
Code that actually fails to specialize is fairly large, hence no test case -- though I have some files with core output showing insufficient specialization.
(I'd be willing to write a patch for this)
Attachments (2)
Change History (15)
comment:1 Changed 19 months ago by
comment:2 Changed 19 months ago by
I'm using a lot of intermediate data constructors for control flow which are all eliminated if SpecConstr runs to completion.
The 100
for W/W was only to provide a ceiling I'm not likely to bump against. However, this is part of the problem, in that the user of the library now needs to think about which value to give W/W.
The function wstream_Strng2_V_1 becomes more than x10 faster if specconstr runs to completion and allocates *a lot* less (in my allocation benchmark it is 48 bytes versus ~ 600 Kilo-bytes . It has arity 18 after full SpecConstr.
In GHC 8.0.2 and earlier, I did not have to change the maximal number of W/W arguments for SpecConstr.
The core output for v1-slow (W/W = 10) and v1-fast (W/W = 100) is given as attachments.
comment:3 Changed 19 months ago by
I'm confused.
- I think
-fmax-worker-args
applies to the demand analyser, notSpecConstr
. So I'm not sure how it is affecting things.
- Are you getting the same code with 8.0.2 and 8.2 after
SpecConstr
?
Overall it's going to be hard to help without a test case.
comment:4 Changed 19 months ago by
SpecConstr checks via isWorkerSmallEnough
in
callsToNewPats
if the pattern to be specialised is small enough. This is controlled by
-fmax-worker-args
. Line 1913 in SpecConstr.hs.
And no, 8.0.2 fully specialises irrespective of the number of worker arguments. Specialisation in 8.2 depends on the number of arguments the pattern to be specialised has.
I'll try to construct a test case.
comment:5 Changed 19 months ago by
Related Tickets: | → #11565 |
---|
The function
wstream_Strng2_V_1
becomes more than x10 faster if specconstr runs to completion and allocates *a lot* less (in my allocation benchmark it is 48 bytes versus ~ 600 Kilo-bytes . It has arity 18 after full SpecConstr.
Where is this wstream_Strng2_V_1
function that you refer to? It would be helpful to see a concrete example.
And no, 8.0.2 fully specialises irrespective of the number of worker arguments. Specialisation in 8.2 depends on the number of arguments the pattern to be specialised has.
This change is due to two causes:
- a48de37dcca98e7d477040b0ed298bcd1b3ab303, which fixes #11565 where
-fmax-worker-args
was simply ignored - f93c363fab1ac8ce6f0b474f5967b0b097995827, which extended the reach of
-fmax-worker-args
to include SpecConstr in order to improve compiler performance.
Perhaps you want the SpecConstr limit to be independent from the worker/wrapper limit?
comment:6 Changed 19 months ago by
The core output of the wstream_Strng2_V_1
function is in the two attached files. I still need to prepare a test case that doesn't depend on a lot of dependencies.
What I want is SpecConstr to continue specialising functions that have a SPEC
argument irrespective of the number of arguments.
That is, those where sc_force is set, I think?
comment:7 Changed 19 months ago by
That seems plausible. We could not call isWorkerSmallEnough
if sc_force
is true. Would you like to try that?
If so, we should update this comnent
ForceSpecConstr arguments are spotted in scExpr' and scTopBinds which then set sc_force to True when calling specLoop. This flag does four things: * Ignore specConstrThreshold, to specialise functions of arbitrary size (see scTopBind) * Ignore specConstrCount, to make arbitrary numbers of specialisations (see specialise) * Specialise even for arguments that are not scrutinised in the loop (see argToPat; Trac #4488) * Only specialise on recursive types a finite number of times (see is_too_recursive; Trac #5550; Note [Limit recursive specialisation])
Now there'd be five things!
comment:8 Changed 19 months ago by
Ok! I've HEAD prepared, and will start with simplified example code.
comment:9 Changed 17 months ago by
Milestone: | 8.2.2 → 8.4.1 |
---|---|
Owner: | set to choenerzs |
Any progress on this, choenerzs?
Bumping off to 8.4.1 as there is a workaround for the regression (e.g. increase -fmax-worker-args
).
comment:10 Changed 17 months ago by
Hi Ben,
sorry for the delay. Work caught up with me. I think, this month should work out.
Best, Christian
comment:11 Changed 13 months ago by
Milestone: | 8.4.1 → 8.6.1 |
---|
This ticket won't be resolved in 8.4; remilestoning for 8.6. Do holler if you are affected by this or would otherwise like to work on it.
comment:13 Changed 8 weeks ago by
Milestone: | 8.8.1 → 8.10.1 |
---|
Bumping milestones of low-priority tickets.
In general your code will perform pretty poorly with or without worker/wrapper if you have >100 arguments as this is well beyond what we can accommodate in physical registers. Of course, I'm happy to be proven wrong: do you have a concrete example of a program with this many arguments which you think should optimize well?