We'd like to try using the stream-fusion idea of Don Stewart, Duncan Coutts and Roman Leshchinskiy, and replace the (somewhat fragile) foldr/build stuff.
It's not done yet. There's a paper on it but we still have issues with complex list comprehensions.
The new streams stuff will be included in the new bytestring-1.0 package though (as you know, the base package is being split up and ByteString will be in its own package).
We should also rerun the benchmarks in the light of the changes that made it into HEAD since April, when the fusion library was last benchmarked. There's not that much more to do here to finish the job.
I'm interested in pursuing this. What obstacles are present? Perhaps most importantly, what changes need to be made to make this work? Is it just GHC.List, Data.List, etcetera that need to be changed?
AFAIK this ticket is no closer to being practical now than when dons added that comment 18 months ago. To implement it you need to change:
The list libraries, to use Stream based definitions
The desugaring for list comprehensions in GHC
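For reference, a minimal sketch of the Stream-based definitions involved, loosely following the Coutts/Leshchinskiy/Stewart paper (the names `mapS` and `map'` are mine; the real library is more elaborate):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A stream is a step function plus an existentially hidden state.
data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s

-- List-to-stream conversion is non-recursive, so it inlines.
stream :: [a] -> Stream a
stream = Stream next
  where
    next []     = Done
    next (x:xs) = Yield x xs
{-# INLINE stream #-}

-- Stream-to-list conversion is the only recursive loop.
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'
{-# INLINE unstream #-}

-- Combinators are non-recursive stream transformers ...
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'
{-# INLINE mapS #-}

-- ... and list functions round-trip through streams.
map' :: (a -> b) -> [a] -> [b]
map' f = unstream . mapS f . stream
{-# INLINE map' #-}

-- A single rule then removes the intermediate lists in pipelines.
{-# RULES "stream/unstream" forall s. stream (unstream s) = s #-}
```

After the rule fires on e.g. `map' f (map' g xs)`, all that remains is one `unstream` loop over fused step functions, which the simplifier can turn into a tight loop.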
However, it is still the case that no one knows how to implement concat / concatMap efficiently without improving GHC's Core optimiser. Since concatMap is crucial to the desugaring of list comprehensions, and also kind of important to users generally, stream fusion isn't really ready for prime time yet, IMHO.
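To make the roadblock concrete, here is roughly what a stream-based concatMap looks like (the `Stream`/`Step` types from the paper are repeated so the sketch stands alone). The outer state must embed a whole inner stream, built by applying `f` only at run time, which is what defeats the optimiser:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s

stream :: [a] -> Stream a
stream = Stream next
  where
    next []     = Done
    next (x:xs) = Yield x xs

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

-- The state pairs the outer stream state with an optional inner
-- stream. Because the inner `Stream b` is produced by calling `f`
-- at run time, its step function and state shape are opaque to the
-- simplifier, so the nested loop never fuses without extra help
-- (e.g. call-pattern specialisation).
concatMapS :: (a -> Stream b) -> Stream a -> Stream b
concatMapS f (Stream next0 s0) = Stream step (s0, Nothing)
  where
    step (s, Nothing) = case next0 s of
      Done       -> Done
      Skip s'    -> Skip (s', Nothing)
      Yield x s' -> Skip (s', Just (f x))   -- inner stream allocated here
    step (s, Just (Stream nexti si)) = case nexti si of
      Done        -> Skip (s, Nothing)
      Skip si'    -> Skip (s, Just (Stream nexti si'))
      Yield y si' -> Yield y (s, Just (Stream nexti si'))
```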
An alternative to coming up with a clever library modification would be to improve GHC's optimiser. I found that the suggested approach of running a SAT pass in a fixed-point loop with constructor specialisation was fairly reliable, but I haven't got around to either committing my improved SAT pass or making the fixed point I used less of a hack. Even if I did so, it is upsetting that stream fusion requires so much more work on the part of the optimiser to get right, compared to the fairly simple rewrite-rule story with foldr/build.
(Both approaches suffer from fragility in the face of a lack of INLINE pragmas).
For those playing along at home, more details on the inability to efficiently optimize concat/concatMap (more generally, fusion on nested lists) can be found in Section 7.2 of the "Stream Fusion" paper. T'would probably be nice if the GHC optimizer experiments were public somewhere.
> AFAIK this ticket is no closer to being practical now than when dons added that comment 18 months ago. [...]

> However, it is still the case that no one knows how to implement concat / concatMap efficiently without improving GHC's Core optimiser. [...]

> An alternative to coming up with a clever library modification would be to improve GHC's optimiser.

> [...I]t is upsetting that stream fusion requires so much more work on the part of the optimiser to get right, compared to the fairly simple rewrite-rule story with foldr/build.
I wanted to focus on this part of the comment, which is about whether you actually want to merge the code. If this is agreed upon, then one needs to find somebody willing to finish the job, but that's easier with an agreement.
The paper argues (or claims) that the improved optimization is generally useful and not specific to stream fusion. Do you disagree? Is the code exceedingly ugly? Would you be ready to publish it somewhere? Are there any benchmark results from applying the optimization alone, compared with the standard optimizer? The paper briefly mentions that it usually improves nofib performance slightly.
I also wonder: do programmers sometimes apply the SAT transformation by hand for performance?
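For illustration, this is what a hand-applied static argument transformation looks like on a recursive function (a generic example of mine, not GHC's actual pass):

```haskell
-- Before SAT: `f` is passed unchanged through every recursive call.
mapRec :: (a -> b) -> [a] -> [b]
mapRec _ []     = []
mapRec f (x:xs) = f x : mapRec f xs

-- After SAT: the static argument `f` is captured by a local loop.
-- `mapSAT` itself is now non-recursive, so GHC can inline it at a
-- call site and specialise `go` for the concrete `f` there.
mapSAT :: (a -> b) -> [a] -> [b]
mapSAT f = go
  where
    go []     = []
    go (x:xs) = f x : go xs
```

Worker/wrapper-style idioms like this do show up hand-written in performance-sensitive library code for exactly this reason.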
On my journey through fusion land, I fell in love with the idea of stream fusion and spent quite some time thinking about the concatMap problem.
Apart from the HERMIT approach based on rewriting an (albeit large) subset of concatMap calls, I think the solution lies in call-pattern specialisation of function arguments, as already recognised in section 6.2 of the original paper on call-pattern specialisation.
The paper goes on to argue that it's too brittle to have rules matching on lambda terms. But then again, I don't see why we even need a RULE matching on a specific lambda term (or even terms with exprArity > 0), as these things are pretty much unique every time they occur. E.g., having a RULE deduplicate specialisations would probably not help much anyway. So, if we employed some mechanism other than rewrite rules to specialise call sites (which is how I understand SpecConstr currently works), I think we would be fine, wouldn't we?
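For readers unfamiliar with call-pattern specialisation: SpecConstr specialises a recursive function on the constructor patterns at its call sites, as sketched by hand below; the open question above is doing the analogous thing for function-valued (lambda) arguments. The example is mine:

```haskell
-- Before: `loop` is only ever called with `Nothing` once and then
-- `Just acc` from inside itself, yet every iteration re-boxes the
-- accumulator in a Maybe and re-scrutinises it.
sumMaybe :: [Int] -> Int
sumMaybe = loop Nothing
  where
    loop Nothing    []     = 0
    loop (Just acc) []     = acc
    loop Nothing    (x:xs) = loop (Just x) xs
    loop (Just acc) (x:xs) = loop (Just (acc + x)) xs

-- After (what SpecConstr generates, written out by hand): one
-- specialised copy per call pattern, with the Maybe box gone.
sumMaybe' :: [Int] -> Int
sumMaybe' = loopN
  where
    loopN []     = 0
    loopN (x:xs) = loopJ x xs
    loopJ acc []     = acc
    loopJ acc (x:xs) = loopJ (acc + x) xs
```

The concatMap problem needs the same move for calls like `loop (\x -> ...) s`: a specialised copy per concrete lambda, which is exactly what rewrite rules are bad at expressing.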
At least this could get rid of the concatMap roadblock, are there any others I'm not aware of?
> if we employed some mechanism other than rewrite rules to specialise call sites
I don't know what you have in mind here. Would you care to elaborate?
Right, that's quite a bummer. I'll probably have to develop a deeper understanding of how SpecConstr and the rewrite-rule mechanism work before I can answer this.
If anybody is unaware and interested: the latest work on the topic is "Stream Fusion, to Completeness" (https://strymonas.github.io/), and in Sec. 8 it seems to claim advantages relative to the HERMIT work (though I've only skimmed this), so that might be relevant.
As further related work, Rust does stream fusion, and concatMap works fine there. The solution basically has three parts:
functions have unique types
the state is not existential and the step function is in a type class based on the state
to make this usable, `impl Trait` is syntactic sugar, similar to partial type signatures, so the state doesn't have to be explicitly named everywhere
This still gets really awkward if you have an if statement in your computation because both branches need to return the same type. You can wrap the branches in an Either but GHC did not optimize this very well last time I tried this approach.
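A rough Haskell rendering of that Rust scheme, with the step function in a class indexed by the state type, and one concrete type per combinator (all names here are made up for illustration):

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Analogue of Rust's Iterator trait: the state type determines the
-- element type and the step function via the instance.
class Iterator i where
  type Item i
  next :: i -> Maybe (Item i, i)

-- Each combinator gets its own concrete state type, so the whole
-- pipeline's shape is visible in the type (like Rust's unique
-- closure/adapter types).
newtype FromList a = FromList [a]
instance Iterator (FromList a) where
  type Item (FromList a) = a
  next (FromList [])     = Nothing
  next (FromList (x:xs)) = Just (x, FromList xs)

data Mapped i b = Mapped (Item i -> b) i
instance Iterator i => Iterator (Mapped i b) where
  type Item (Mapped i b) = b
  next (Mapped f i) = case next i of
    Nothing      -> Nothing
    Just (x, i') -> Just (f x, Mapped f i')

-- A consumer; the specialiser can generate one copy per concrete
-- "combinator stack" type.
sumIt :: (Iterator i, Item i ~ Int) => i -> Int
sumIt i = case next i of
  Nothing      -> 0
  Just (x, i') -> x + sumIt i'
```

The Haskell pain point noted above carries over: without `impl Trait`-style sugar, types like `Mapped (FromList Int) Int` have to be spelled out (or hidden behind an existential, reintroducing the optimisation problem).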
I think this is the key ingredient. If all functions had distinct types, we could just specialise for that type and rewrite rules would do the rest. In particular, we would be able to detect occurrences of the kinds of call patterns we want to specialise, which turns out to be non-trivial to integrate with GHC's current rewriting system, or would be very brittle at least.
But this still has all the same drawbacks wrt. code bloat. Rust doesn't seem too concerned about that, though...
Come to think of it, I wonder why iterator2 (which does exactly that: one type per stream combinator, for which we can specialise the type class instance) performs so badly according to Michael Snoyman. I think if sum2 was properly specialised to the concrete "combinator stack", with SpecConstr and (type class) specialisation it should really yield optimal code. My guess is that it's a phase problem: SpecConstr runs very late in the pipeline and the specialiser won't run afterwards anymore. I'll try whether running the specialiser once more fixes it.
> My guess is that it's a phase problem: SpecConstr runs very late in the pipeline and the specialiser won't run afterwards anymore. I'll try whether running the specialiser once more fixes it.
Well, no. The specialiser specialises based on type, and that works the same as after SpecConstr. And in fact, iterator2 is properly specialised, even the higher-order arguments. The following Haskell code
So everything is just perfect (well, we could inline the exit join point, but whatever). But that already worked with stream fusion, so nothing new here really. The big question is: Does it optimise away concatMap?
The first issue is that the type of the empty Iterator is different from that of the singleton Iterator, so you can't define filter in terms of concatMap, for example. Also, the concrete iterator types seem to leak into every pipeline. So we would need an existential wrapper like this:
```haskell
data Stream a where
  MkStream :: (Iterator i, Item i ~ a) => !i -> Stream a
  -- Probably omit `Iterator` constraint, so we don't have to pay for it
```
and define all combinators in terms of this abstraction. The question is whether it still specialises that well, but maybe it does if we keenly inline all combinators:
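A hedged sketch of what that could look like (the `Iterator` class and the per-combinator types are assumed to match the iterator2 experiment; all names are illustrative):

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

class Iterator i where
  type Item i
  next :: i -> Maybe (Item i, i)

-- The existential wrapper from above.
data Stream a where
  MkStream :: (Iterator i, Item i ~ a) => !i -> Stream a

newtype FromList a = FromList [a]
instance Iterator (FromList a) where
  type Item (FromList a) = a
  next (FromList [])     = Nothing
  next (FromList (x:xs)) = Just (x, FromList xs)

data Map i b = Map (Item i -> b) i
instance Iterator i => Iterator (Map i b) where
  type Item (Map i b) = b
  next (Map f i) = fmap (\(x, i') -> (f x, Map f i')) (next i)

-- Every combinator unwraps, applies its concrete adapter type, and
-- rewraps; aggressive INLINE keeps the concrete iterator type
-- visible to the specialiser within a pipeline.
fromList :: [a] -> Stream a
fromList = MkStream . FromList
{-# INLINE fromList #-}

mapStream :: (a -> b) -> Stream a -> Stream b
mapStream f (MkStream i) = MkStream (Map f i)
{-# INLINE mapStream #-}

toList :: Stream a -> [a]
toList (MkStream i0) = go i0
  where
    go i = case next i of
      Nothing      -> []
      Just (x, i') -> x : go i'
{-# INLINE toList #-}
```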
For the record, I think I got hung up about the loss of CPR in some critical cases involving existentials, leading to me opening #17548 with an ill-fated attempt at resolving the issue.
Does the ability to write a concatMap rule with higher-order patterns in rules by @jaro in #24725 (closed) mean that the previous roadblock to implementing list fusion with streams is now gone, and that this issue can be reopened?