Tuple constraint synonym led to asymptotic performance lossage
I encountered a situation where using a tuple constraint synonym
produced an asymptotic runtime performance degradation, from linear to
quadratic time. The enclosed Bug.hs reproduces the potentially
relevant features of the problem. The function test
there is
quadratic, but should be linear. The file also contains a definition
of test2
, which differs only in eschewing the constraint synonym,
and is linear, as it should be.
In more detail: the program tries to count to 20,000 in a rather
contrived way. It maintains a Box, which holds the current count and
a function that can accept an increment and a count to produce a new
count (namely, (+)
). The function do_step
lifts incrementation to
Boxes, using their contained function to update their stored count.
The first tricky thing is that the incrementing function stored in the
Box is polymorphic in both arguments separately: it can accept
increments that are not the same type as the stored count (and
converts them). The second tricky thing is that I don't want to
expose the type variable for the increment type (as opposed to the
count type) to clients of the Box, so the Box's incrementing function
is stored polymorphic (with an explicit forall). Finally, I want to
impose the constraint (Fractional num, Real num)
on allowable
increments (even though this example only needs Real num
).
This constellation of desiderata conspires to produce quadratic
performance: doubling the parameter to test
(but not test2
)
multiplies its runtime by 4. Inspecting the heap profile indicates a
growing accumulation of closures when running test
(but not
test2
). Inspecting the generated stg (enclosed) locates a potential
culprit: apparently when do_step
(but not do_step2
) reconstructs
the Box, it changes the func
stored there to be a fresh closure that
destructures the Numerical
constraint tuple, constructs a fresh one,
and calls the function that was in that slot previously with it. I
hypothesize that this accumulates as a chain that performs a linear
number of such destructurings and restructurings at each increment,
leading to the observed quadratic runtime and linear memory growth.
Incidentally, I checked whether record wildcards were the issue (no:
test4
) and whether updating just the obj
field solves it (yes:
test3
). Sadly, the latter solution is not directly usable in my
actual application because of "Record update for insufficiently
polymorphic field".
For reference: I compiled with ghc -O2 Bug.hs -fprof-auto -rtsopts -ddump-to-file -ddump-simpl -ddump-st