The notion I think dfeuer was trying to get at there was that replicateM can do 'peasant multiplication' like (^) does to get away with O(log n) distinct calls to (<*>).
Don't give me so much credit; I was just thinking about eliminating the need for list fusion rules to fire to prevent GHC from building an intermediate list. The peasant multiplication idea is quite an interesting one. It would sometimes be good, and I don't think it could ever be terrible, but it certainly could be worse in some cases. I'd probably prefer separate combinators for fast exponentiation of Monoids and Applicatives.
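For concreteness, a separate fast-exponentiation combinator for Monoids along those lines might look like this (a minimal sketch; the name mtimes is made up here — base later grew the similar stimes in Data.Semigroup):

```haskell
-- A sketch of fast Monoid exponentiation via binary ("peasant")
-- multiplication. Hypothetical name; compare stimes in Data.Semigroup.
mtimes :: Monoid m => Int -> m -> m
mtimes n0 x0
  | n0 <= 0   = mempty
  | otherwise = go n0 x0 mempty
  where
    -- Invariant: answer = acc <> x^n; O(log n) uses of (<>) in total.
    go n x acc
      | n <= 0    = acc
      | odd n     = go (n `div` 2) (x <> x) (acc <> x)
      | otherwise = go (n `div` 2) (x <> x) acc
```

Since every factor combined is a power of the same element, the combinations commute with each other, so this is sound even for non-commutative Monoids.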
E.g., we carefully use the doubling scheme mentioned above to handle replicateM and the like in Data.Sequence, and it makes a night-and-day difference in the performance of many combinators for generating those structures.
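The doubling scheme, specialized to producing a list, might be sketched like this (the name replicateA is borrowed from Data.Sequence, but this list-based version is purely illustrative — plain lists, unlike Seq, cannot share the doubled structure):

```haskell
import Control.Applicative (liftA2)

-- A sketch of a replicateM variant making only O(log n) distinct calls
-- to (<*>), in the spirit of Data.Sequence's replicateA. Illustrative
-- only: lists cannot share the doubled subterms the way Seq can.
replicateA :: Applicative f => Int -> f a -> f [a]
replicateA n0 f
  | n0 <= 0   = pure []
  | otherwise = go n0 (fmap (: []) f) (pure [])
  where
    -- Binary exponentiation in the monoid (f [a], liftA2 (++), pure []).
    go n x acc
      | n <= 0    = acc
      | odd n     = go (n `div` 2) (double x) (liftA2 (++) acc x)
      | otherwise = go (n `div` 2) (double x) acc
    double x = liftA2 (++) x x
```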
My concern is that it will add that tiny factor to tight loops in IO and ST where every tiny little bit counts. Admittedly, the fact that I can't use an STRef s Int counter and get GHC to put it in a register rules out the use of replicateM_ for a significant class of such loops, but I suspect my point is still valid.
Data.Sequence is a bit weird. I doubt too many things offer a Wasserman special like that replicateA. Do you know how the doubling approach plays with branch prediction?
Consuming the result with Writer, Const, Reader, update monads, and the like all benefit rather disproportionately. The Const case in particular is relevant, as it basically shows how we'd foldMap, and it and the Writer case bring in every Monoid as something that can be folded over optimally. In general, arbitrarily inducing a bias here ensures that nothing smart can be done by the consumer: you enforce worst-case asymptotic behavior by losing sharing.
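As a concrete illustration of the Const point, using the actual Data.Sequence.replicateA:

```haskell
import Data.Functor.Const (Const (..))
import Data.Monoid (Sum (..))
import qualified Data.Sequence as Seq

-- Running Data.Sequence.replicateA in Const never builds a Seq at all:
-- the O(log n) calls to (<*>) become O(log n) calls to (<>), i.e. fast
-- Monoid exponentiation -- essentially what foldMap over the replicated
-- structure would compute, but without ever materializing it.
constReplicate :: Monoid m => Int -> m -> m
constReplicate n m = getConst (Seq.replicateA n (Const m))
```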
One compromise might be to look at whether RULES could try out the biased association and then rewrite back if it doesn't fuse away?
Or even simpler, add RULES to rewrite the IO / ST cases into the biased form. We provably get no benefit from the tree for that style of effect and can't observe the difference in associativity for these concrete cases.
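For reference, the biased target such a rule would rewrite to is just a plain counting loop (hypothetical name):

```haskell
-- The biased, right-nested form for IO/ST: a counting loop that builds
-- no intermediate structure and makes n sequential calls to (*>).
-- The primed name is made up to distinguish it from base's replicateM_.
replicateM_' :: Applicative f => Int -> f a -> f ()
replicateM_' n0 f = go n0
  where
    go n
      | n <= 0    = pure ()
      | otherwise = f *> go (n - 1)
```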
As you note basically any transformer is going to have enough overhead to swamp this concern, so it is really only those sorts of things that are problematic.
I was too tired when I wrote that remark, so I forgot about transformers. StateT u (ST s) actually is a perfectly beautiful target for replicateM_. With the current definition, that *can* be unboxed, allowing for some serious vector crunching. I think your version should be in the library; I just think it should have a new name.
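For example, a loop of this shape (a sketch using transformers' strict StateT; whether the Int state actually unboxes depends on -O2 and inlining):

```haskell
import Control.Monad (replicateM_)
import Control.Monad.ST (runST)
import Control.Monad.Trans.State.Strict (execStateT, modify')

-- The kind of tight loop described above: replicateM_ over
-- StateT Int (ST s). With the current right-nested replicateM_,
-- GHC can (under optimization) unbox the Int state.
countUp :: Int -> Int
countUp n = runST (execStateT (replicateM_ n (modify' (+ 1))) 0)
```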
I would be worried about not just IO but also some kind of structure that is intended to be produced and consumed in a streaming fashion, in O(1) space. Seems like that could blow up space usage arbitrarily with a repeated-squaring implementation.
Including separate versions that use repeated squaring would make more sense (if we should include them at all; the use cases seem obscure and anyone can (already) just provide them in a package outside base).