GHC: Ticket #3909: Priority queues in containers
http://ghc.haskell.org/trac/ghc/ticket/3909
en-usGHChttp://ghc.haskell.org/trac/ghc/chrome/site/ghc_logo.png
http://ghc.haskell.org/trac/ghc/ticket/3909
Trac 1.0.1LouisWassermanThu, 04 Mar 2010 01:21:17 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:1
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:1
<p>
containers is such a popular library, it should include some well-designed implementations of another important data structure: the priority queue.
</p>
TicketLouisWassermanThu, 04 Mar 2010 01:21:27 GMTstatus changed
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:2
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:2
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>assigned</em>
</li>
</ul>
TicketLouisWassermanThu, 04 Mar 2010 14:59:45 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>containers-pqueue.patch</em>
</li>
</ul>
<p>
Sexy implementation of a binomial queue, with the type system guaranteeing the correct relationship between binomial trees of different ranks.
</p>
TickettwhiteheadThu, 04 Mar 2010 15:48:00 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:3
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:3
<p>
There was a paper awhile back on functional implementation of priority search queues called "A Simple Implementation Technique for Priority Search Queues".
</p>
<p>
<a class="ext-link" href="http://portal.acm.org/citation.cfm?id=507650"><span class="icon"></span>http://portal.acm.org/citation.cfm?id=507650</a>
</p>
<p>
(the code in the paper is given in Haskell 98)
</p>
TicketLouisWassermanThu, 04 Mar 2010 16:22:05 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:4
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:4
<p>
That's a seriously interesting paper -- thanks for pointing it out, btw!
</p>
<p>
I'm inclined to think, though, that it's more important to have a good implementation of just the priority queue abstraction, especially one that performs reliably quickly, than necessarily also supporting searches, as in priority <em>search</em> queues. I like the idea of priority search queues, but I think they're more appropriate for a popular Hackage package than to go into containers, which focuses on high-performance, ubiquitous data structures.
</p>
TicketLouisWassermanThu, 04 Mar 2010 17:05:19 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:5
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:5
<p>
Created a new version that answers people's min/max-heap needs with the following approach:
</p>
<p>
There are now three modules: Data.PQueue.Min, Data.PQueue.Max, and Data.PQueue. Data.PQueue reexports Data.PQueue.Min with the type alias PQueue = MinQueue. (Observation: several other languages, including Java, default to a min-queue.) Data.PQueue.Max is implemented as a wrapper around Data.PQueue.Min. Everybody should be happy.
</p>
TicketrossThu, 04 Mar 2010 17:09:34 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:6
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:6
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:4" title="Comment 4 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
I'm inclined to think, though, that it's more important to have a good implementation of just the priority queue abstraction, especially one that performs reliably quickly, than necessarily also supporting searches, as in priority <em>search</em> queues. I like the idea of priority search queues, but I think they're more appropriate for a popular Hackage package than to go into containers, which focuses on high-performance, ubiquitous data structures.
</p>
</blockquote>
<p>
They are two different abstractions, which should have different implementations.
</p>
TicketLouisWassermanThu, 04 Mar 2010 17:10:51 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:7
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:7
<blockquote class="citation">
<p>
They are two different abstractions, which should have different implementations.
</p>
</blockquote>
<p>
I definitely agree, but my point was that I'm not sure priority search queues are well-used enough -- I hadn't even heard the term before reading the article -- that they should go into containers.
</p>
TicketLouisWassermanThu, 04 Mar 2010 17:22:48 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>containers-pqueue.2.patch</em>
</li>
</ul>
<p>
New version with both min and max queues exported in a nice, minimalist way that doesn't get in people's way, like many existing implementations.
</p>
TickettwhiteheadThu, 04 Mar 2010 18:18:50 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:8
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:8
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:7" title="Comment 7 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<blockquote class="citation">
<p>
They are two different abstractions, which should have different implementations.
</p>
</blockquote>
<p>
I definitely agree, but my point was that I'm not sure priority search queues are well-used enough -- I hadn't even heard the term before reading the article -- that they should go into containers.
</p>
</blockquote>
<p>
I'm happy with whatever. I just threw up the paper because I knew about it, and it seemed like it could possibly contain some information of relevance.
</p>
TicketLouisWassermanThu, 04 Mar 2010 19:44:11 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:9
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:9
<p>
New functions added: (!!), take, drop, splitAt, takeWhile, dropWhile, span, break.
</p>
TicketLouisWassermanThu, 04 Mar 2010 20:00:47 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>containers-pqueue.3.patch</em>
</li>
</ul>
<p>
New functions added: (!!), take, drop, splitAt, takeWhile, dropWhile, span, break.
</p>
TicketLouisWassermanThu, 04 Mar 2010 21:51:23 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>pairing-heap.patch</em>
</li>
</ul>
<p>
Pairing heap implementation for comparative benchmarking and to be considered as an alternative. Implemented with a similar interface to Data.PQueue.Min binomial heap.
</p>
TicketLouisWassermanThu, 04 Mar 2010 21:53:49 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:10
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:10
<p>
I've written and uploaded a pairing heap implementation for consideration as an alternative.
I am moderately concerned that even implemented strictly, there isn't a good way for users to avoid future extract operations taking time linear in the size of the heap. However, I think this is in most respects significantly faster overall. (I used something similar for Data.Sequence.unstableSort.)
</p>
TicketLouisWassermanThu, 04 Mar 2010 22:42:24 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:11
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:11
<p>
A brief benchmark approach:
</p>
<p>
Heapsorting 25000 integers, with +RTS -H128m:
</p>
<blockquote>
<p>
min mean +/-sd median max
</p>
</blockquote>
<p>
Pairing: 32.002 34.042 2.756 32.002 52.004
Binomial/l: 64.004 67.324 2.348 68.004 76.005
Binomial/s: 72.004 76.205 3.134 76.005 88.005
</p>
<p>
/l and /s are lazy and strict spines, respectively. I should implement a less-typesafe, more-compact version of binomial heaps for comparison, as well.
</p>
<p>
How amenable would people be to exporting several different versions of a priority queue from containers? If we only stuck to one, which one should it be?
</p>
<p>
Binomial queues offer guaranteed O(log n) worst-case performance on any of their operations, but pairing heaps offer almost everything in guaranteed O(1) time in exchange for typically much faster, amortized O(log n), albeit worst-case O(n), delete-mins. Which one of these is more critical to most users, and which of these might be preferable to just put in a package and recommend to users with particular needs?
</p>
TicketLouisWassermanThu, 04 Mar 2010 23:24:25 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:12
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:12
<p>
Update: a less-typesafe binomial queue, which handles sparse binomial forests better, does not do significantly better (a millisecond average difference) than the nicely typesafe version. Uploading code for grins, but I don't think there are any significant optimizations left to make to this data structure.
</p>
TicketLouisWassermanThu, 04 Mar 2010 23:39:08 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:13
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:13
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:12" title="Comment 12 for Ticket #3909">LouisWasserman</a>:
Note: a more tail-recursive extract-queue implementation doesn't help, either.
</p>
TicketsimonmarFri, 05 Mar 2010 16:59:37 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:14
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:14
<p>
Also look at this one, an implementation of Hinze's algorithm by Johan Tibell, as part of the work he and Bryan O'Sullivan are doing on the new GHC IO manager:
</p>
<p>
<a class="ext-link" href="http://github.com/bos/event/blob/master/src/System/Event/PSQ.hs"><span class="icon"></span>http://github.com/bos/event/blob/master/src/System/Event/PSQ.hs</a>
</p>
TicketLouisWassermanFri, 05 Mar 2010 18:06:50 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:15
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:15
<p>
Interesting.
</p>
<p>
Again, I don't think most people need the full power of priority search queues, and I'm inclined to suggest that we just create a package for those, and then maybe link to it from the containers documentation.
</p>
<p>
(Also, after painstakingly editing the implementation to support general ordered values, it was slower than the binomial queue implementation.)
</p>
<p>
What do people think of the following?
</p>
<p>
Put pairing heaps (by far the fastest implementation so far, nearly 2.3x faster than binomial heaps) into containers. Launch packages containing high-quality priority search queues and real-time-ish binomial queues. (The binomial queue implementation guarantees that deleteMin has worst-case time O(log n), while pairing heaps have worst-case O(n). For people with real-time speed needs, this is relevant.) Endorse these packages in the documentation in Data.PQueue, for people with these specialized needs.
</p>
TicketLouisWassermanSun, 07 Mar 2010 22:04:14 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>containers-pqueue.4.patch</em>
</li>
</ul>
<p>
Semi-final implementation of binomial heap priority queue. Provides linear-time filter and partition implementations.
</p>
TicketmilanMon, 08 Mar 2010 19:01:38 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>plot.png</em>
</li>
</ul>
<p>
Benchmark of pairing heaps, lazy pairing heaps and binomial pairing heaps (v4).
</p>
TicketmilanMon, 08 Mar 2010 19:11:35 GMTcc set
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:16
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:16
<ul>
<li><strong>cc</strong>
<em>fox@…</em> added
</li>
</ul>
<p>
Hi,
</p>
<p>
I created a small benchmark benchmarking a performance of heap sort using a
</p>
<ul><li>pairing heap from pairing-heap.patch ("pq")
</li><li>binomial heap from containers-pqueue.4.patch ("bq")
</li><li>mine lazy pairing heap (in the benchmark) ("lpq")
</li></ul><p>
<a style="padding:0; border:none" href="http://ghc.haskell.org/trac/ghc/attachment/ticket/3909/plot.png"><img src="http://ghc.haskell.org/trac/ghc/raw-attachment/ticket/3909/plot.png" alt="Benchmark of pairing heaps, lazy pairing heaps and binomial pairing heaps (v4)." title="Benchmark of pairing heaps, lazy pairing heaps and binomial pairing heaps (v4)." /></a>
</p>
<p>
The benchmark is drawn from "pq" point of view. The number in the test label is the logarithm of the size of the list to sort, asc/dsc means the input list was ordered increasingly/decreasingly, rnd means the input list was random. In every cases <tt>toAscList . fromList</tt> was used. The sources of the benchmark are uploaded too, just unpact <a class="attachment" href="http://ghc.haskell.org/trac/ghc/attachment/ticket/3909/benchmark.tgz" title="Attachment 'benchmark.tgz' in Ticket #3909">benchmark.tgz</a><a class="trac-rawlink" href="http://ghc.haskell.org/trac/ghc/raw-attachment/ticket/3909/benchmark.tgz" title="Download"></a> and do <tt>make plot</tt> with <tt>progression</tt> installed. I used GHC 6.12.1.
</p>
<p>
What is suprising is that lazy pairing heaps are faster than pairing heaps on random input data, ie. in the "interesting" case.
</p>
TicketmilanMon, 08 Mar 2010 19:21:18 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:17
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:17
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:15" title="Comment 15 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
What do people think of the following?
</p>
<p>
Put pairing heaps (by far the fastest implementation so far, nearly 2.3x faster than binomial heaps) into containers. Launch packages containing high-quality priority search queues and real-time-ish binomial queues. (The binomial queue implementation guarantees that deleteMin has worst-case time O(log n), while pairing heaps have worst-case O(n). For people with real-time speed needs, this is relevant.) Endorse these packages in the documentation in Data.PQueue, for people with these specialized needs.
</p>
</blockquote>
<p>
I think PQueue in containers should work in a persistent setting, which pairing heaps do not. That is why I put lazy pairing heaps in the benchmark. This variant is from Okasaki's book Purely Functional Data Structures and is _conjectured_ (and believed, at least by Okasaki and me:) to have same amortized bounds in persistent setting.
</p>
<p>
Personally I would put lazy pairing heaps to the containers (beware, the current implementation has to be yet worked if we agree). If not, I would use binomial heaps or maybe benchmark leftist heaps.
</p>
TicketmilanMon, 08 Mar 2010 19:21:47 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>benchmark.tgz</em>
</li>
</ul>
<p>
Benchmark of pairing heaps, lazy pairing heaps and binomial heaps (v4).
</p>
TicketLouisWassermanMon, 08 Mar 2010 19:30:01 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:18
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:18
<p>
Interesting.
</p>
<p>
I'm starting to feel, though, that I can't justify putting pairing heaps in containers, because of the possibility that even under normal usage that O(n<sup>2) work could be done when O(n log n) was expected. The worst case for delete-min of a pairing heap is O(n), and if that operation is performed several times in that worst case, it'd get ugly. It makes me sad, because like your benchmark shows, pairing heaps are so effective in the single-threaded case. For the same reason, I'd prefer a strict data structure, because we have to have an implementation that's suitable for *everybody,* including the people with serious real-time speed requirements.
</sup></p>
TicketmilanMon, 08 Mar 2010 20:13:59 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:19
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:19
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:18" title="Comment 18 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
Interesting.
</p>
<p>
I'm starting to feel, though, that I can't justify putting pairing heaps in containers, because of the possibility that even under normal usage that O(n<sup>2) work could be done when O(n log n) was expected. The worst case for delete-min of a pairing heap is O(n), and if that operation is performed several times in that worst case, it'd get ugly. It makes me sad, because like your benchmark shows, pairing heaps are so effective in the single-threaded case. For the same reason, I'd prefer a strict data structure, because we have to have an implementation that's suitable for *everybody,* including the people with serious real-time speed requirements.
</sup></p>
</blockquote>
<p>
Yes, I think that the O(N) delete-min in persistent setting is a show-stopper.
</p>
<p>
If amortized bounds are enouth, maybe lazy pairing heaps? They seem to even beat pairing heaps in the "input is not uniform" case.
</p>
<p>
If we want real-time bounds, binomial heaps are probably fine. Do you have any idea how a leftist heap would do? We can perhaps add it to the benchmark...
</p>
TicketLouisWassermanTue, 09 Mar 2010 00:00:34 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:20
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:20
<p>
Even lazy pairing heaps have the O(n) worst-case delete-min performance. Amortization isn't going to cut it.
</p>
<p>
Wikipedia suggested that skew heaps usually outperformed leftist heaps. I wrote a reasonably refined implementation of skew heaps and got
</p>
<p>
Times (ms)
</p>
<blockquote>
<p>
min mean +/-sd median max
</p>
</blockquote>
<p>
Pairing: 8.000 12.401 4.274 12.001 28.002
Binomial: 24.001 31.842 6.200 28.002 44.003
Skew: 24.001 31.922 6.280 32.002 48.003
</p>
<p>
for sorting of random data. Also, more importantly, skew heaps don't even support amortized O(1) insertion, let alone worst-case. Binomial heaps support amortized O(1) insertion, worst-case O(log n). There's a variant of binomial heaps which supports worst-case O(log n) insertion, but it's not very pretty to code, and it would make an already large amount of code blow up hugely. I'm willing to settle for the amortized bounds there.
</p>
<p>
Conclusion: I'm inclined to stick with binomial heaps. Further experimentation has indicated that there isn't much difference between lazy and strict sorting speeds, so I'm inclined to stick with the strict implementation -- again for the crowd who needs real-time performance.
</p>
TicketmilanTue, 09 Mar 2010 09:06:22 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:21
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:21
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:20" title="Comment 20 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
Even lazy pairing heaps have the O(n) worst-case delete-min performance. Amortization isn't going to cut it.
</p>
</blockquote>
<p>
Have you looked at the implementation? Let me aside, even Okasaki conjectures that I inserts M merges and D delete-mins takes O(I+D log N) in the persistent setting ("proof by authority":)
</p>
<p>
The trik is the following -- lazy pairing heaps use back-to-front variant (both pairings and final merge is performed from back to front). When there are three children of the root, you merge the newest two and merges the result to the oldest child, immediately, but lazily.
</p>
<p>
So if you do a sequence of inserts followed by a delete-min, all previous versions of the heap are affected -- performing a delete-min now in any version (except the latest) will take O(1).
</p>
<p>
As I said, there is not a solid proof, but neither could anyone find a sequence of operations in persistent setting that would break claimed bound.
</p>
<blockquote class="citation">
<p>
Wikipedia suggested that skew heaps usually outperformed leftist heaps. I wrote a reasonably refined implementation of skew heaps and got
</p>
<p>
Times (ms)
</p>
<blockquote>
<p>
min mean +/-sd median max
</p>
</blockquote>
<p>
Pairing: 8.000 12.401 4.274 12.001 28.002
Binomial: 24.001 31.842 6.200 28.002 44.003
Skew: 24.001 31.922 6.280 32.002 48.003
</p>
<p>
for sorting of random data. Also, more importantly, skew heaps don't even support amortized O(1) insertion, let alone worst-case. Binomial heaps support amortized O(1) insertion, worst-case O(log n). There's a variant of binomial heaps which supports worst-case O(log n) insertion, but it's not very pretty to code, and it would make an already large amount of code blow up hugely. I'm willing to settle for the amortized bounds there.
</p>
<p>
Conclusion: I'm inclined to stick with binomial heaps. Further experimentation has indicated that there isn't much difference between lazy and strict sorting speeds, so I'm inclined to stick with the strict implementation -- again for the crowd who needs real-time performance.
</p>
</blockquote>
<p>
I would probably stick with lazy pairing heaps, and if the time bounds would discovered to be wrong, to switch it with binomial implementation.
</p>
<p>
The only issue of lazy pairing heaps is amortization (let the conjecture aside). I personally think the performance win justifies this.
</p>
<p>
Of course, providing a real-time impementation, preferably with the same API, is a great idea. Maybe in another package?
</p>
<p>
And a last think -- do you think <tt>ViewQ</tt> is a good idea? I would prefer <tt>extract</tt> to return <tt>Maybe (a, PQueue a)</tt>. I agree it is not so elegant in pattern matching, but
</p>
<ul><li>if the module is qualified, you have to write <tt>PQueue.(:^) a queue</tt> with the new-style qualified operator, which is horrible
</li><li>you cannot use monadic combinators. If the extract returns in the <tt>Maybe</tt> monad, you can.
</li></ul>
TicketLouisWassermanTue, 09 Mar 2010 20:36:34 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:22
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:22
<p>
Aha! This wasn't how I'd imagined that lazy pairing heaps worked, but I love it. Yeah, that's beautiful, let me change my own implementation... <strong>works</strong>
</p>
<p>
ViewQ is...tricky. I included it by analogy to Data.Sequence, which has the same issue. I'm...not entirely sure what the best policy is.
</p>
TicketmilanTue, 09 Mar 2010 20:57:45 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:23
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:23
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:22" title="Comment 22 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
Aha! This wasn't how I'd imagined that lazy pairing heaps worked, but I love it. Yeah, that's beautiful, let me change my own implementation... <strong>works</strong>
</p>
</blockquote>
<p>
Sorry, I mentioned it only in comment 17, not in the benchmark description :(
</p>
<blockquote class="citation">
<p>
ViewQ is...tricky. I included it by analogy to Data.Sequence, which has the same issue. I'm...not entirely sure what the best policy is.
</p>
</blockquote>
<p>
I understand the idea, but personally I am strongly for <tt>Maybe (a, PQueue a)</tt>, if there is a poll :)
</p>
TicketLouisWassermanTue, 09 Mar 2010 21:48:29 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:24
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:24
<p>
Hmmmkay.
</p>
<p>
I'm going to make my own vote for <tt>Maybe (a, PQueue a)</tt>, and change it, since we appear to be in charge...
</p>
<p>
After doing my own experiments with lazy pairing heaps, I'm still not sure I'm willing to support lazy pairing heaps, again out of concern for the real-time-performance crowd. The distinction is really the following:
</p>
<p>
Binomial queues support amortized O(1) insertion, but worst-case O(log n).
Pairing heaps support amortized O(log n) delete-min, but worst-case O(n). Okasaki doesn't debate worst-case performance.
</p>
<p>
I'm convinced that this approach doesn't risk O(n<sup>2) performance in a persistent context, now, which is a significant improvement, but I'm not fully convinced it's enough yet.
</sup></p>
<p>
Uploading my updated versions now.
</p>
TicketLouisWassermanTue, 09 Mar 2010 21:57:24 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>containers-pqueue.5.patch</em>
</li>
</ul>
<p>
Contains both lazy pairing heap and binomial queue in separate patches. Gets rid of the ViewQ nonsense.
</p>
TicketLouisWassermanTue, 09 Mar 2010 22:05:45 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:25
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:25
<p>
Another minor concern: I'm not sure lazy pairing heaps can support linear-time unordered traversals, or anything of the sort. I still need to think about that, because it's not impossible that the same operations actually do take linear time still...
</p>
TicketmilanTue, 09 Mar 2010 22:27:09 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:26
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:26
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:24" title="Comment 24 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
Hmmmkay.
</p>
<p>
I'm going to make my own vote for <tt>Maybe (a, PQueue a)</tt>, and change it, since we appear to be in charge...
</p>
</blockquote>
<p>
Great :)
</p>
<blockquote class="citation">
<p>
After doing my own experiments with lazy pairing heaps, I'm still not sure I'm willing to support lazy pairing heaps, again out of concern for the real-time-performance crowd. The distinction is really the following:
</p>
<p>
Binomial queues support amortized O(1) insertion, but worst-case O(log n).
Pairing heaps support amortized O(log n) delete-min, but worst-case O(n). Okasaki doesn't debate worst-case performance.
</p>
</blockquote>
<p>
The lazy pairing heaps are really only amortized -- you know (if the conjecture is right) that I inserts, M merges and D delete-mins take O(I + D log N), but a delete-min can take up to O(N) time (which cannot happen frequently, of course).
</p>
<p>
If we want structure with worst-case bounds, it is not (any kind of) pairing heaps.
</p>
<p>
For me, a quicker structure with amortized bounds is better in Haskell. If you for example perform a heap-sort, than the amortized bounds works well and you have O(N log N) guaranteed complexity. Only thing that doesn't work is "now immediately perform this delete-min in O(log N)". But this kind of operations are not common in Haskell -- you would have to carefully evaluate every operation, because trivially performing 'insert' would probably create only a thunk which the first delete-min would have to evaluate and thus could take O(N).
</p>
<p>
But, as you said, binomial heaps are worst-case. They have provable bounds. And are a common knowledge -- which this variant of lazy pairing heap is not.
</p>
<p>
How is it with stability -- don't you know?
</p>
<p>
Milan
</p>
TicketmilanTue, 09 Mar 2010 22:31:50 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:27
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:27
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:25" title="Comment 25 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
Another minor concern: I'm not sure lazy pairing heaps can support linear-time unordered traversals, or anything of the sort. I still need to think about that, because it's not impossible that the same operations actually do take linear time still...
</p>
</blockquote>
<p>
Well, that is a good point.
</p>
<p>
Long story short, with this representation it will take O(N log N) in some cases. Maybe the representation could be changed, but not straightforwardly.
</p>
<p>
So -- the binomial heaps?
</p>
TicketLouisWassermanTue, 09 Mar 2010 22:41:14 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:28
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:28
<p>
Yeah. When I've needed real-time performance, I'd probably do something like
</p>
<pre class="wiki">insert x $! queue
</pre><p>
which defers the lazy work by one operation, but only by one operation. Essentially, I'd like to be able to guarantee that <tt>seq</tt> actually guarantees a fully evaluated spine, cf. Data.Map or Data.IntMap.
</p>
<p>
I'm willing, though, to perhaps co-author a package for pairing heaps, and then to recommend that package in the containers documentation as a speedy implementation for people who are willing to settle for amortized time bounds in exchange for overall speedy performance.
</p>
TicketmilanTue, 09 Mar 2010 23:28:11 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:29
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:29
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:28" title="Comment 28 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
Yeah. When I've needed real-time performance, I'd probably do something like
</p>
<pre class="wiki">insert x $! queue
</pre><p>
which defers the lazy work by one operation, but only by one operation. Essentially, I'd like to be able to guarantee that <tt>seq</tt> actually guarantees a fully evaluated spine, cf. Data.Map or Data.IntMap.
</p>
<p>
I'm willing, though, to perhaps co-author a package for pairing heaps, and then to recommend that package in the containers documentation as a speedy implementation for people who are willing to settle for amortized time bounds in exchange for overall speedy performance.
</p>
</blockquote>
<p>
Tomorrow I am going to implement leftist-heaps, just for comparison with binomial heaps.
</p>
<p>
If you don't mind, I would wait a bit with the pairing heaps, I want to try to prove the complexity of the lazy variant. Maybe something will come out of it (perhaps a different representation that would address the "traverse in linear time" issue).
</p>
TicketLouisWassermanTue, 09 Mar 2010 23:46:01 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:30
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:30
<p>
Sounds good. I'm uploading a quickie implementation of skew heaps, which Wikipedia says should beat leftist heaps, but which is beaten by my binomial heaps.
</p>
<p>
I'd be interested to help with the proof that lazy pairing heaps are amortized O(log n), and I certainly believe the claim, but I'm still inclined to use binomial heaps in containers, and then the two of us should co-release a lazy pairing heap implementation that we recommend in the containers documentation.
</p>
<p>
Look forward to seeing your results.
</p>
TicketLouisWassermanTue, 09 Mar 2010 23:47:19 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>Skew.hs</em>
</li>
</ul>
<p>
Quickie implementation of skew heaps, a variation on leftist heaps that Wikipedia suggests is almost always better. This tests as inferior (in almost every way) to binomial heaps, and also offers worse runtimes (amortized O(log n) insert, for instance), so I don't particularly think it'll supplant binomial or pairing heaps
</p>
TickettwhiteheadWed, 10 Mar 2010 15:21:32 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:31
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:31
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:28" title="Comment 28 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
Yeah. When I've needed real-time performance, I'd probably do something like
</p>
<pre class="wiki">insert x $! queue
</pre><p>
which defers the lazy work by one operation, but only by one operation. Essentially, I'd like to be able to guarantee that <tt>seq</tt> actually guarantees a fully evaluated spine, cf. Data.Map or Data.IntMap.
</p>
</blockquote>
<p>
I believe this is what Simon created Control.DeepSeq for a couple of months ago
</p>
<p>
<a class="ext-link" href="http://hackage.haskell.org/package/deepseq"><span class="icon"></span>http://hackage.haskell.org/package/deepseq</a>
</p>
TicketmilanWed, 10 Mar 2010 15:30:22 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>plot_2.png</em>
</li>
</ul>
<p>
Results of benchmark_2.tgz.
</p>
TicketmilanWed, 10 Mar 2010 15:31:11 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>benchmark_2.tgz</em>
</li>
</ul>
<p>
Benchmark of v5 patch, Skew heaps and Milan's Leftist and Pairing heaps
</p>
TicketmilanWed, 10 Mar 2010 15:54:59 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:32
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:32
<p>
Hi,
</p>
<p>
another benchmark <a class="attachment" href="http://ghc.haskell.org/trac/ghc/attachment/ticket/3909/benchmark_2.tgz" title="Attachment 'benchmark_2.tgz' in Ticket #3909">benchmark_2.tgz</a><a class="trac-rawlink" href="http://ghc.haskell.org/trac/ghc/raw-attachment/ticket/3909/benchmark_2.tgz" title="Download"></a>. This time I benchmarked
</p>
<ul><li>Binomial -- <tt>Data.PQueue.Min</tt> from (v5) patch
</li><li>Pairing -- <tt>Data.PQueue.Pairing</tt> from (v5) patch
</li><li>Skew -- <a class="attachment" href="http://ghc.haskell.org/trac/ghc/attachment/ticket/3909/Skew.hs" title="Attachment 'Skew.hs' in Ticket #3909">Skew.hs</a><a class="trac-rawlink" href="http://ghc.haskell.org/trac/ghc/raw-attachment/ticket/3909/Skew.hs" title="Download"></a>
</li><li>PairingByMilan -- in the <a class="attachment" href="http://ghc.haskell.org/trac/ghc/attachment/ticket/3909/benchmark_2.tgz" title="Attachment 'benchmark_2.tgz' in Ticket #3909">benchmark_2.tgz</a><a class="trac-rawlink" href="http://ghc.haskell.org/trac/ghc/raw-attachment/ticket/3909/benchmark_2.tgz" title="Download"></a>
</li><li>LeftistByMilan -- in the <a class="attachment" href="http://ghc.haskell.org/trac/ghc/attachment/ticket/3909/benchmark_2.tgz" title="Attachment 'benchmark_2.tgz' in Ticket #3909">benchmark_2.tgz</a><a class="trac-rawlink" href="http://ghc.haskell.org/trac/ghc/raw-attachment/ticket/3909/benchmark_2.tgz" title="Download"></a>
</li></ul><p>
This time I measured only the heap-sort of random list with various lengths -- the number in the test name is the logarithm of the list size.
</p>
<p>
<a style="padding:0; border:none" href="http://ghc.haskell.org/trac/ghc/attachment/ticket/3909/plot_2.png"><img src="http://ghc.haskell.org/trac/ghc/raw-attachment/ticket/3909/plot_2.png" alt="Results of benchmark_2.tgz." title="Results of benchmark_2.tgz." /></a>
</p>
<ul><li>the Pairing implementation is still too strict -- I believe there are N inserts and N deletemins in the persistent settings, such that Pairing will need O(N<sup>2</sup>) to answer them
</li><li>my LazyPairing in the previous benchmark was buggy. This time it works, but is worse than Binomial
</li><li>LeftistByMilan is actually pretty good for small inputs (up to ~ 2<sup>16</sup>), and the implementation is super-trivial compared to Binomial. But when the list gets bigger, the complexity gets worse. I think this is beause of O(log N) inserts -- the Binomial implementation with O(1) amortized inserts allocates much less memory and does not need to run a GC.
</li></ul><p>
I think the pairing heaps are out of the question now. I would choose between Binomial and Leftist. The Binomial have O(1) amortized inserts, but beware, that this does not work in persistent setting -- the insert is O(log N) when the heaps are used persistently (there are Skew Binomial Heaps with O(1) insert in the persistent setting too).
</p>
<p>
I vote for current (v5) Binomial implementation, even if the O(1) amortized inserts works only non-persistently (ie. under heavy persistent usage, leftist heaps might be a _little_ better).
</p>
TicketLouisWassermanWed, 10 Mar 2010 16:07:07 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:33
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:33
<p>
Agreed. I've got a few (mostly aesthetic) updates to the binomial implementation, by the way -- uploading those now. Mostly, they just make the type details prettier, and they also provide fromDescList, toDescList, foldrDesc, etcetera. (These are strict, and are essentially just implemented as dual to the ascending-order implementation, but there's no reason not to have them.)
</p>
TicketLouisWassermanWed, 10 Mar 2010 16:12:25 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:34
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:34
<p>
Absolutely the case, but we can't use deepseq as a prerequisite for containers, can we? =(
</p>
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:31" title="Comment 31 for Ticket #3909">twhitehead</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:28" title="Comment 28 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
Yeah. When I've needed real-time performance, I'd probably do something like
</p>
<pre class="wiki">insert x $! queue
</pre><p>
which defers the lazy work by one operation, but only by one operation. Essentially, I'd like to be able to guarantee that <tt>seq</tt> actually guarantees a fully evaluated spine, cf. Data.Map or Data.IntMap.
</p>
</blockquote>
<p>
I believe this is what Simon created Control.DeepSeq for a couple of months ago
</p>
<p>
<a class="ext-link" href="http://hackage.haskell.org/package/deepseq"><span class="icon"></span>http://hackage.haskell.org/package/deepseq</a>
</p>
</blockquote>
TicketLouisWassermanWed, 10 Mar 2010 16:15:13 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:35
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:35
<p>
In retrospect, I'm not sure what the procedure for patching deepseq would be -- which would be an appropriate approach, except that there's no good way to expose the functionality to be strict on *just* the roots in the binomial forest. Fortunately, my tests seem to indicate that keeping the spine of the binomial heap strict doesn't affect speed too adversely, so we'll keep it strict for everybody.
</p>
TicketmilanWed, 10 Mar 2010 16:19:13 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:36
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:36
<p>
Replying to <a class="closed" href="http://ghc.haskell.org/trac/ghc/ticket/3909#comment:35" title="Comment 35 for Ticket #3909">LouisWasserman</a>:
</p>
<blockquote class="citation">
<p>
In retrospect, I'm not sure what the procedure for patching deepseq would be -- which would be an appropriate approach, except that there's no good way to expose the functionality to be strict on *just* the roots in the binomial forest. Fortunately, my tests seem to indicate that keeping the spine of the binomial heap strict doesn't affect speed too adversely, so we'll keep it strict for everybody.
</p>
</blockquote>
<p>
I totally agree. I do not think deepseq would do us any good -- we keep the structure quite strict ourselves anyway. And yes, a strict spine is great, performance++ :)
</p>
TicketLouisWassermanSat, 13 Mar 2010 22:49:46 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>containers-pqueue.6.patch</em>
</li>
</ul>
<p>
Aesthetic code improvements to the binomial heap implementation, along with descending-order operations.
</p>
TicketLouisWassermanTue, 16 Mar 2010 13:42:29 GMTtype changed
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:37
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:37
<ul>
<li><strong>type</strong>
changed from <em>feature request</em> to <em>proposal</em>
</li>
</ul>
TicketLouisWassermanTue, 16 Mar 2010 13:42:48 GMT
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:38
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:38
<p>
Deadline: Friday, Mar 26.
</p>
TicketLouisWassermanTue, 16 Mar 2010 18:06:39 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>containers-pqueue.7.patch</em>
</li>
</ul>
<p>
Functor and Traversable instances have been removed in favor of only offering mapMonotonic and traverseMonotonic. The Foldable instance has been kept.
</p>
TicketLouisWassermanTue, 16 Mar 2010 23:53:34 GMTattachment set
http://ghc.haskell.org/trac/ghc/ticket/3909
http://ghc.haskell.org/trac/ghc/ticket/3909
<ul>
<li><strong>attachment</strong>
set to <em>QuickBinom.hs</em>
</li>
</ul>
<p>
A quickie implementation of a binomial heap than handles sparse binomial forests well. Not noticeably better than earlier typesafe version, even with all the obvious optimizations I could think of, so I prefer the above, typesafe, elegant version.
</p>
TicketLouisWassermanTue, 30 Mar 2010 17:12:06 GMTstatus changed; resolution set
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:39
http://ghc.haskell.org/trac/ghc/ticket/3909#comment:39
<ul>
<li><strong>status</strong>
changed from <em>assigned</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>worksforme</em>
</li>
</ul>
<p>
Ticket closed. Priority queues will be added to a new package, which will be submitted for inclusion in the Haskell Platform.
</p>
Ticket