Opened 11 years ago

Closed 9 years ago

Last modified 9 years ago

#2143 closed bug (fixed)

Yhc's sort is faster than GHC's

Reported by: NeilMitchell Owned by: igloo
Priority: normal Milestone: Not GHC
Component: libraries/base Version: 6.8.2
Keywords: Cc: lennart@…, gwern0@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

The sort code in the Yhc libraries is faster than GHC. In some cases its asymptotically better. Some benchmarks have shown a doubling in performance. I know why this is, but will need to make sure the code is as good as it can be, check all the benchmarks agree etc.

Original work by Ian: http://www.haskell.org/pipermail/glasgow-haskell-users/2002-May/003376.html

I will track this down and aim for a libraries proposal in a month or so.

Attachments (5)

SortBy.hs (855 bytes) - added by malcolm.wallace@… 9 years ago.
A stable O(n log n) sorting algorithm, by Thomas Nordin, 2002
sort.txt (32.7 KB) - added by guest 9 years ago.
The terminal output of my criterion run
sort.hs (3.1 KB) - added by guest 9 years ago.
The module running Criterion on GHC & YHC sorts
sort-reverse.txt (4.1 KB) - added by guest 9 years ago.
a run with YHC functions benchmarked before GHC; YHC still wins, showing that other run wasn't due to GHC being penalized by going first
Ron Webb.jpg (167.8 KB) - added by Slavon8 4 years ago.
http://kdkraftupwnkitchen.tumblr.com/

Download all attachments as: .zip

Change History (17)

comment:1 Changed 11 years ago by igloo

difficulty: Unknown
Milestone: Not GHC

comment:2 Changed 10 years ago by simonmar

Architecture: UnknownUnknown/Multiple

comment:3 Changed 10 years ago by simonmar

Operating System: UnknownUnknown/Multiple

comment:4 Changed 9 years ago by daniel.is.fischer

Type of failure: None/Unknown

The Yhc code doesn't seem to be in the library yet. If it's really faster, shouldn't it be included even if there's still room for improvement?

I'll run a few benchmarks in the next days.

comment:5 Changed 9 years ago by daniel.is.fischer

For pseudo-random lists, my measurements showed the Yhc code consistently faster, though not very much (3-12%). For (rev)sorted or almost (rev)sorted lists, Yhc's code is up to 10 times faster (according to my measurements).

Any chance to get the Yhc code into GHC?

Changed 9 years ago by malcolm.wallace@…

Attachment: SortBy.hs added

A stable O(n log n) sorting algorithm, by Thomas Nordin, 2002

comment:6 Changed 9 years ago by malcolm.wallace@…

The attachment is the sorting algorithm from Yhc (actually, nhc98), as originally contributed by Thomas Nordin.

comment:7 Changed 9 years ago by NeilMitchell

If memory serves, Lennart said he was the originally author of this code, so the path probably originates with HBC. The code in Yhc was taken unmodified from nhc98.

comment:8 Changed 9 years ago by simonpj

Cc: lennart@… added
Owner: changed from NeilMitchell to igloo

OK, well assuming Lennart is happy (I've added him to cc), let's just adopt this. Ian can you action? Although the milestone says "not ghc", these sorting routines are in the base libraries which we ship with GHC, aren't they?

Simon

comment:9 Changed 9 years ago by guest

Cc: gwern0@… added

Changed 9 years ago by guest

Attachment: sort.txt added

The terminal output of my criterion run

Changed 9 years ago by guest

Attachment: sort.hs added

The module running Criterion on GHC & YHC sorts

comment:10 Changed 9 years ago by guest

I put together a little module tonight which uses Criterion 0.2 (0.4 won't install under 6.10.2) to benchmark GHC & YHC sorts using ascending ints, descending ints, random ints, and the Shakesperean corpus (http://www.gutenberg.org/etext/100). (Besides the attachment, you can see http://hpaste.org/fastcgi/hpaste.fcgi/view?id=14873#a14875 )

YHC, as measured by mean, won on all of them, even random and Shakespeare. The margins were not all 2x, but the narrowest YHC victory was Shakespeare - YHC averaged 25.3s while GHC puffed along at 29.5s.

Caveats: this was my first time using Criterion, no doubt there are major inefficiencies and bad style etc.

But I doubt any of my errors were sufficient to upend the results. Given the margins in a core function which is used all over the place (I grepped through all the repos I've downloaded; 'sort' is used a *lot*), that the replacement is correct and faster in all cases, I think not making the change as soon as possible is a mistake. This is something that should have been done yesterday, as the expression goes.

Changed 9 years ago by guest

Attachment: sort-reverse.txt added

a run with YHC functions benchmarked before GHC; YHC still wins, showing that other run wasn't due to GHC being penalized by going first

comment:11 Changed 9 years ago by malcolm.wallace@…

Resolution: fixed
Status: newclosed

Pushed the faster version to the base library. It should probably be merged into ghc-6.12.2.

comment:12 Changed 9 years ago by simonmar

Thanks Malcolm.

Note: See TracTickets for help on using tickets.