Opened 9 years ago

Closed 8 years ago

#3149 closed bug (fixed)

Data.HashTable is slow

Reported by: dons Owned by:
Priority: normal Milestone:
Component: libraries/base Version: 6.10.1
Keywords: Cc: ertai, merehap@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

Data.HashTable is slow. Implement something non-naive, and move it out of base. It's a bit embarassing. See e.g.

import Prelude hiding (lookup)
import Data.HashTable (hashInt, fromList, lookup)

n :: Int
n = 10000000

l :: [Int]
l = [1..n]

stream :: [(Int, Int)]
stream = zip l l

main = do
   m <- fromList hashInt stream
   v <- lookup m 100
   print v

Maybe just port the OCaml standard hashtable?

Change History (10)

comment:1 Changed 9 years ago by dons

GC destroys this benchmark (and Data.HashTable in general, it seems). Removing lists from the equation:

import Control.Monad
import qualified Data.HashTable as H

n = 10000000

main = do
    m <- H.new (==) H.hashInt
    forM_ [1..10000000] $ \n -> H.insert m n n
    v <- H.lookup m 100
    print v

51s by default, 98% GC:

$ time ./A                   
Just 100
./A +RTS  48.83s user 1.18s system 98% cpu 50.929 total

   1,787,760,112 bytes allocated in the heap
   3,333,907,256 bytes copied during GC
     836,979,232 bytes maximum residency (11 sample(s))
       4,261,176 bytes maximum slop
            1628 MB total memory in use (25 MB lost due to fragmentation)

  Generation 0:  3400 collections,     0 parallel, 44.14s, 45.15s elapsed
  Generation 1:    11 collections,     0 parallel,  3.78s,  4.96s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.52s  (  0.52s elapsed)
  GC    time   47.91s  ( 50.11s elapsed)
  EXIT  time    0.00s  (  0.02s elapsed)
  Total time   48.44s  ( 50.63s elapsed)

  ***%GC time      98.9%  (99.0% elapsed)***

  Alloc rate    3,416,378,481 bytes per MUT second

  Productivity   1.1% of total user, 1.0% of total elapsed

Turning up the heap hint:

$ time ./A  +RTS -H1G
Just 100
./A +RTS -H1G  6.68s user 0.99s system 98% cpu 7.778 total

   1,787,825,320 bytes allocated in the heap
   2,021,056,256 bytes copied during GC
     466,523,872 bytes maximum residency (2 sample(s))
       5,748,312 bytes maximum slop
            1117 MB total memory in use (17 MB lost due to fragmentation)

  Generation 0:     9 collections,     0 parallel,  4.68s,  5.03s elapsed
  Generation 1:     2 collections,     0 parallel,  1.16s,  1.23s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.84s  (  1.49s elapsed)
  GC    time    5.84s  (  6.26s elapsed)
  EXIT  time    0.00s  (  0.17s elapsed)
  Total time    6.69s  (  7.75s elapsed)

  ***%GC time      87.4%  (80.8% elapsed)***

  Alloc rate    2,120,087,562 bytes per MUT second

  Productivity  12.6% of total user, 10.9% of total elapsed

./A +RTS -sstderr -H1G  6.69s user 0.90s system 96% cpu 7.834 total

And GC is stil 89% of the time. If we could remove the GC as a factor we'd be looking at something like 0.6s on this benchmark, instead of 50s.


Using the parallel GC is fun, since you get really serious GC use:

$ ghc -O2 A.hs --make -threaded -no-recomp

on the commandline:
    Warning: -no-recomp is deprecated: Use -fforce-recomp instead
[1 of 1] Compiling Main             ( A.hs, A.o )
Linking A ...
$ time ./A  +RTS -H1G -sstderr -N2        
./A +RTS -H1G -sstderr -N2 
Just 100
   1,787,833,280 bytes allocated in the heap
   2,477,128,208 bytes copied during GC
     672,983,168 bytes maximum residency (3 sample(s))
      44,186,984 bytes maximum slop
            1349 MB total memory in use (21 MB lost due to fragmentation)

  Generation 0:    14 collections,    14 parallel,  7.63s,  4.17s elapsed
  Generation 1:     3 collections,     2 parallel,  3.47s,  2.12s elapsed

...
  Parallel GC work balance: 1.89 (309610432 / 164021678, ideal 2)

comment:2 Changed 9 years ago by dons

Component: libraries/baseRuntime System

comment:3 Changed 9 years ago by dons

Issue is the same as #650

What should we do?

comment:4 Changed 9 years ago by dons

This is one interesting way we avoided the "mutable boxed array problem" in the shootout:

http://shootout.alioth.debian.org/u64q/benchmark.php?test=knucleotide&lang=ghc&id=4

The hashtable is represent as:

data Hashtable = Hashtable {
        keySize   :: {-# UNPACK #-}!Int,
        noOfSlots :: {-# UNPACK #-}!Int,
        spine     :: {-# UNPACK #-}!(Ptr Word8)
    }

avoiding all issues of GC wandering. Give up on

Might be one candidate approach for replacing the current IOArray [(Int,a)] version.

comment:5 in reply to:  description ; Changed 9 years ago by igloo

difficulty: Unknown

Replying to dons:

Data.HashTable is slow. Implement something non-naive, and move it out of base.

We can't just move it out of base as Data.Typeable uses it.

comment:6 Changed 9 years ago by ertai

Cc: ertai added

comment:7 in reply to:  5 Changed 9 years ago by duncan

Replying to igloo:

We can't just move it out of base as Data.Typeable uses it.

That doesn't mean it has to be exposed though. I assume it's not visible in the public interface of Data.Typeable.

comment:8 Changed 9 years ago by igloo

Component: Runtime Systemlibraries/base
Milestone: _|_

comment:9 Changed 8 years ago by merehap

Cc: merehap@… added
Type of failure: None/Unknown

comment:10 Changed 8 years ago by simonmar

Resolution: fixed
Status: newclosed

I'm claiming this is fixed in 6.12.1:

$ ./3149 +RTS -s
./3149 +RTS -s 
Just 100
   1,826,563,432 bytes allocated in the heap
   3,333,946,592 bytes copied during GC
     836,996,392 bytes maximum residency (11 sample(s))
       5,558,824 bytes maximum slop
            1621 MB total memory in use (25 MB lost due to fragmentation)

  Generation 0:  3400 collections,     0 parallel,  3.17s,  3.21s elapsed
  Generation 1:    11 collections,     0 parallel,  4.43s,  5.63s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.62s  (  0.61s elapsed)
  GC    time    7.60s  (  8.84s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    8.23s  (  9.46s elapsed)

  %GC time      92.4%  (93.5% elapsed)

  Alloc rate    2,918,294,477 bytes per MUT second

  Productivity   7.6% of total user, 6.6% of total elapsed
Note: See TracTickets for help on using tickets.