GHC: Ticket #3271: New methods for Data.Sequence
http://ghc.haskell.org/trac/ghc/ticket/3271
<p>
<tt>Data.Sequence</tt> is meant as a general-purpose implementation of finite sequences. The range of methods it offers, however, is considerably more constrained than <tt>Data.List</tt>, even allowing for the constraint that sequences are finite.
</p>
<p>
The following methods cannot be efficiently implemented based on currently exported methods from <tt>Data.Sequence</tt>.
</p>
<ul><li><tt>zipWith</tt> and its variants. Note: it suffices to define <tt>zipWith</tt> alone.
</li><li><tt>replicate</tt>. (This can be implemented in <tt>O(log n)</tt> time with node sharing.)
</li></ul><p>
The following methods are relatively simple extensions of already-exported methods. It may be more appropriate to allow library users to reimplement them themselves.
</p>
<ul><li><tt>scanl</tt>, <tt>scanr</tt>, and variants. This is relatively straightforward using methods borrowed from the <tt>Traversable</tt> instance.
</li><li><tt>tails</tt> and <tt>inits</tt>. <tt>tails</tt> is equivalent to <tt>scanr (<|) empty</tt>; <tt>inits</tt> is similar.
</li><li><tt>takeWhile</tt>, <tt>dropWhile</tt>, <tt>span</tt>, <tt>break</tt> (and perhaps from-the-end variations). Finding a breakpoint index can be done as efficiently on lists as on sequences; find the appropriate breakpoint index after converting to a list and use <tt>splitAt</tt>.
</li><li><tt>unfoldr</tt> and <tt>iterate</tt>, though the latter would require a finite number of iterations to perform.
</li><li><tt>partition</tt>. I can conceive of a slightly more efficient implementation than the trivial one, but the gains may be minimal.
</li></ul>en-usGHChttp://ghc.haskell.org/trac/ghc/chrome/site/ghc_logo.png
http://ghc.haskell.org/trac/ghc/ticket/3271
Trac 1.0.9LouisWassermanTue, 02 Jun 2009 21:38:51 GMTcomponent changed
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:1
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:1
<ul>
<li><strong>component</strong>
changed from <em>libraries/base</em> to <em>libraries (other)</em>
</li>
</ul>
TicketLouisWassermanTue, 02 Jun 2009 21:48:56 GMT
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:2
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:2
<p>
Discussion deadline: 2 weeks after ticket submission. (June 16)
</p>
TicketLouisWassermanWed, 03 Jun 2009 17:56:52 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3271
http://ghc.haskell.org/trac/ghc/ticket/3271
<ul>
<li><strong>attachment</strong>
set to <em>second-round-of-methods-for-data_sequence.dpatch</em>
</li>
</ul>
<p>
The remaining methods specified in the ticket.
</p>
TicketLouisWassermanWed, 03 Jun 2009 19:49:32 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3271
http://ghc.haskell.org/trac/ghc/ticket/3271
<ul>
<li><strong>attachment</strong>
set to <em>SequenceCheck.hs</em>
</li>
</ul>
<p>
Quick Check properties testing that each function specified in the ticket behaves analogously to the Data.List equivalents.
</p>
TicketLouisWassermanMon, 15 Jun 2009 20:19:19 GMT
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:3
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:3
<p>
New additional method: a sort method, possibly faster than the Data.List equivalent.
</p>
TicketLouisWassermanWed, 17 Jun 2009 17:45:56 GMTowner set
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:4
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:4
<ul>
<li><strong>owner</strong>
set to <em>LouisWasserman</em>
</li>
</ul>
TicketLouisWassermanWed, 24 Jun 2009 17:45:27 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3271
http://ghc.haskell.org/trac/ghc/ticket/3271
<ul>
<li><strong>attachment</strong>
set to <em>new-methods-for-data_sequence.dpatch</em>
</li>
</ul>
<p>
Final version of all methods for Data.Sequence, including sort and sortBy.
</p>
TicketiglooSun, 12 Jul 2009 17:21:04 GMTdifficulty, milestone set
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:5
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:5
<ul>
<li><strong>difficulty</strong>
set to <em>Unknown</em>
</li>
<li><strong>milestone</strong>
set to <em>Not GHC</em>
</li>
</ul>
TicketLouisWassermanSun, 12 Jul 2009 17:55:29 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3271
http://ghc.haskell.org/trac/ghc/ticket/3271
<ul>
<li><strong>attachment</strong>
set to <em>new-methods-for-data_sequence.2.dpatch</em>
</li>
</ul>
TicketLouisWassermanThu, 16 Jul 2009 23:54:08 GMT
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:6
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:6
<p>
In response to Ross's useful suggestions (which I had not actually noticed until recently), I have made considerable changes to my Data.Sequence patch here.
</p>
<p>
Here are the relevant points:
</p>
<ul><li>I completely rewrote the sorting method. The sort is unstable, but it is 40 lines (much more maintainable than the several-hundred line implementation from earlier) and runs *twice as fast as* (fromList . Data.List.sort . toList) for n=50000. (For sorted lists, it clocks in at about 4x faster.)
</li></ul><blockquote>
<blockquote>
<p>
o This is with no RTS options. With -H128m, the advantage is considerably thinner -- 39ms vs. 43ms (about one standard deviation) -- but the implementation is sufficiently compact that the moderate benefit is satisfactory.
</p>
</blockquote>
</blockquote>
<ul><li>tails and inits are considerably more sophisticated, running to about 50 lines apiece (although some of that code is shared between them), but
</li></ul><blockquote>
<blockquote>
<p>
o This implementation is genuinely asymptotically faster than the previous implementation: evaluating any tail from the sequence returned by tails takes O(log (min (i, n-i))), as opposed to O(n) in scanr (<|) empty xs, but evaluating every tail from the sequence takes O(n) overall, as opposed to O(n log n) in fromList [drop n xs | n <- [0..length xs]].
o Without direct access to the internals of the sequence it is impossible to match the asymptotic performance of this implementation.
o This implementation is also a hair faster in practice.
</p>
</blockquote>
</blockquote>
<ul><li>partition is now in fact implemented via a simple foldl', which is actually faster than my old, several-dozen-line implementation as well as obviously simpler.
</li><li>filter has been added to the list of methods available in Data.Sequence.
</li><li>iterate has been renamed to iterateN to reinforce the different type, which requires a finite bound on the sequence length.
</li><li>On the back end, replicate, iterateN, and sortBy do not use fromList, but instead use a common framework that wraps construction of a sequence in an Applicative functor. This automatically induces the node sharing that gives replicate performance O(log n), but behaves exactly correctly for all its other requirements.
</li><li>As a result, there is a faster alternative to fromList if the length of the list is known. The name and type of this method seems like it might become genuinely contentious, so I'm not inclined to expose that faster method at the moment.
</li></ul>
TicketLouisWassermanFri, 17 Jul 2009 00:05:30 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3271
http://ghc.haskell.org/trac/ghc/ticket/3271
<ul>
<li><strong>attachment</strong>
set to <em>new-methods-for-data_sequence.3.2.dpatch</em>
</li>
</ul>
<p>
Considerably better documented, benchmarked, optimized, and more concise implementation of the above methods.
</p>
TicketLouisWassermanFri, 17 Jul 2009 00:05:46 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3271
http://ghc.haskell.org/trac/ghc/ticket/3271
<ul>
<li><strong>attachment</strong>
set to <em>new-methods-for-data_sequence.3.dpatch</em>
</li>
</ul>
<p>
Considerably better documented, benchmarked, optimized, and more concise implementation of the above methods.
</p>
TicketLouisWassermanMon, 20 Jul 2009 18:33:38 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3271
http://ghc.haskell.org/trac/ghc/ticket/3271
<ul>
<li><strong>attachment</strong>
set to <em>new-methods-for-data_sequence.4.dpatch</em>
</li>
</ul>
<p>
This version contains the resolved-upon compromise, with a stable sort based on Data.List.sort for sort and sortBy and a speedy but unstable sort in unstableSort and unstableSortBy.
</p>
TicketLouisWassermanWed, 29 Jul 2009 17:06:16 GMTversion deleted
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:7
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:7
<ul>
<li><strong>version</strong>
<em>6.10.2</em> deleted
</li>
</ul>
TicketLouisWassermanWed, 29 Jul 2009 17:06:42 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3271
http://ghc.haskell.org/trac/ghc/ticket/3271
<ul>
<li><strong>attachment</strong>
set to <em>new-methods-for-data_sequence.5.dpatch</em>
</li>
</ul>
<p>
Contains a variety of new methods to deal with indices, and span-type methods starting from the end of the sequence.
</p>
TicketLouisWassermanFri, 21 Aug 2009 04:47:51 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3271
http://ghc.haskell.org/trac/ghc/ticket/3271
<ul>
<li><strong>attachment</strong>
set to <em>new-methods-for-data_sequence.6.dpatch</em>
</li>
</ul>
<p>
Reduced usage of partial pattern matches, and standardization of the XXXL/XXXR methods. New methods: foldWithIndexL, foldWithIndexR.
</p>
TicketrossTue, 15 Sep 2009 18:08:51 GMTstatus changed; resolution set
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:8
http://ghc.haskell.org/trac/ghc/ticket/3271#comment:8
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>fixed</em>
</li>
</ul>
<p>
This was discussed at length by Louis and Ross, with no dissent from anyone else. During the discussion the patch went through several iterations, cutting a lot of specialized code, but also adding several more methods.
</p>
<p>
In committing the patch, I've tweaked the doc comments a little, and renamed foldWithIndexL and foldWithIndexR as foldlWithIndex and foldrWithIndex, parallelling foldlWithKey and foldrWithKey in Data.Map.
</p>
Ticket