Opened 8 years ago

Closed 8 years ago

Last modified 6 years ago

#3969 closed bug (wontfix)

Poor performance of generated code on x86.

Reported by: milan Owned by:
Priority: normal Milestone:
Component: Compiler Version: 6.12.1
Keywords: x86, runtime performance Cc:
Operating System: Unknown/Multiple Architecture: x86
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

When implementing a hash function of ByteStrings, I found the Haskell implementation to be 2.5 slower than the equivalent C implementation.

The code of both functions is attached in Hash.hs and c_hash.c. You can compile using

ghc -O --make Hash.hs c_hash.c

and run C implementation as ./Hash c bstr_len and Haskell implementation as ./Hash h bstr_len.

There is no apparent problem in the Haskell implementation -- both the foldl' and the addWord8 are inlined and everything in the main loop is unboxed.

I believe the performance loss is because of bad register allocation. On x86_64 is the Haskell implementation only ~1.2 times slower.

The comparison on Intel Xeon E5520 32-bit, Windows 7, GHC 6.12.1 is in res-32bit.txt. C and Haskell implementation is run three times, and on strings of length 10, 50 and 100. All times are in seconds. The file also contains the assembler code of relevant methods.

On Intel Xeon E5320 64-bit, Fedora, GHC 6.12.1 is in res-64bit.txt.

Attachments (5)

Hash.hs (1.1 KB) - added by milan 8 years ago.
c_hash.c (157 bytes) - added by milan 8 years ago.
c_hash.h (83 bytes) - added by milan 8 years ago.
res-32bit.txt (2.9 KB) - added by milan 8 years ago.
res-64bit.txt (2.7 KB) - added by milan 8 years ago.

Download all attachments as: .zip

Change History (7)

Changed 8 years ago by milan

Attachment: Hash.hs added

Changed 8 years ago by milan

Attachment: c_hash.c added

Changed 8 years ago by milan

Attachment: c_hash.h added

Changed 8 years ago by milan

Attachment: res-32bit.txt added

Changed 8 years ago by milan

Attachment: res-64bit.txt added

comment:1 Changed 8 years ago by simonmar

Resolution: wontfix
Status: newclosed

This is exactly the kind of thing that LLVM and the new backend will fix. The problem is that the inner loop is not being treated like a loop by the code generator. On x86_64 we have registers for argument-passing which hides the problem to some extent, but the real fix is to do better native code generation.

comment:2 Changed 6 years ago by dterei

I know this is closed but just thought I would add the results these days:

ghc -O2 -optlo-O3 -fllvm Hash c_hash.c
Hash h  10: 0.29s | Hash c  10: 0.38s
Hash h  50: 0.97s | Hash c  50: 1.02s
Hash h 100: 1.94s | Hash h 100: 2.04s

I guess we could change this to fixed now instead of wontfix :).

Note: See TracTickets for help on using tickets.