Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#7674 closed task (fixed)

Separate StablePtr table from StableName table.

Reported by: EyalLotem Owned by:
Priority: normal Milestone: 7.8.1
Component: Runtime System Version: 7.6.2
Keywords: rts stable_ptr stable_name performance Cc: eyal.lotem@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

Currently, there is one table for these two difference concepts which makes each one of them less efficient.

The patches that address this improve some benchmarks' performance by a factor of ~3.

Change History (20)

comment:1 Changed 3 years ago by EyalLotem

  • Status changed from new to patch

The benchmark is the small_hash_hs FFI binding to the small_hash library. They're at:

https://github.com/Peaker/small_hash https://github.com/Peaker/small_hash_hs

(To build the latter, you need to clone the former directly inside it).

If you run small_hash_hs, it's runtime will be dominated by GC's, because of #7670. (~8 sec on my machine).

However, if you set the min heap size to 200M (via "+RTS -H200M"), the run time becomes dominated by the stable ptr hash table/etc. To run the benchmark on ordinary GHC, you need to use small_hash_hs's "unpatched_ghc" branch.

Patches at: https://github.com/Peaker/ghc/commits/separate_stable_tables

On my hardware:

Before patch After patch

Hash insertion time 2.4 0.8

Total time 3.8 1.2

Hash insertion time 2.9 1.2 (-threaded)

Total time 4.8 1.6 (-threaded)

comment:2 Changed 3 years ago by EyalLotem

Before patch After patch
Hash insertion time 2.4 0.8
Total time 3.8 1.2
Hash insertion time (-threaded) 2.9 1.2
Total time (-threaded) 4.8 1.6

comment:3 Changed 3 years ago by EyalLotem

Also ran the benchmark without the specified heap size:

unpatched patched
-threaded, no heap size 24.5s 10.1
no heap size 23.1s 8.3s

comment:4 Changed 3 years ago by EyalLotem

As an interesting reference point, this benchmark is doing 10 million insertions of unboxed ints to boxed ints and then destruction of the hash table.

The small_hash benchmark, which inserts 10 million unboxed ints to unboxed ints takes 0.39sec.

It does bulk allocations of the hash nodes, which is hard to do in the FFI binding because the allocation happens in general-purpose code, so it should support releasing back memory.

The 1.2 sec is about a factor of 3 of overhead which may be reasonable given the extra indirection due to stable ptrs, the int boxing and the more fine-grained allocation.

Changed 3 years ago by EyalLotem

Changed 3 years ago by EyalLotem

comment:5 Changed 3 years ago by EyalLotem

  • Owner EyalLotem deleted

comment:6 Changed 3 years ago by simonmar

  • difficulty set to Unknown

I'm afraid something is broken with StableNames in your patch: test memo001 in libraries/base fails for me (this is a stress test for StableNames, it isn't run by default in validate, but it should be).

comment:7 Changed 3 years ago by simonmar

  • Status changed from patch to new

comment:8 Changed 3 years ago by EyalLotem

  • Cc eyal.lotem@… added

Changed 3 years ago by EyalLotem

The memo001 test seems to work now

comment:9 Changed 3 years ago by EyalLotem

  • Status changed from new to patch

I replaced all 7 patches with "StableTables.patch", it is more manageable.

comment:10 Changed 3 years ago by marlowsd@…

commit 7e7a4e4d7e9e84b2c57d3d55e372e738b5f8dbf5

Author: Simon Marlow <[email protected]>
Date:   Thu Feb 14 08:46:55 2013 +0000

    Separate StablePtr and StableName tables (#7674)
    
    To improve performance of StablePtr.

 includes/HsFFI.h                         |    4 +
 includes/rts/Stable.h                    |   17 +-
 includes/stg/MiscClosures.h              |    2 +-
 rts/Hash.c                               |    5 +
 rts/Hash.h                               |    5 +-
 rts/HsFFI.c                              |   18 ++
 rts/Linker.c                             |    8 +-
 rts/PrimOps.cmm                          |   22 +-
 rts/RetainerProfile.c                    |    2 +-
 rts/RtsStartup.c                         |    4 +-
 rts/Stable.c                             |  460 +++++++++++++++++-------------
 rts/Stable.h                             |   29 ++-
 rts/sm/Compact.c                         |    2 +-
 rts/sm/GC.c                              |   12 +-
 utils/deriveConstants/DeriveConstants.hs |    3 +
 15 files changed, 355 insertions(+), 238 deletions(-)

comment:11 Changed 3 years ago by simonmar

  • Resolution set to fixed
  • Status changed from patch to closed

comment:12 Changed 3 years ago by simonmar

  • Milestone set to 7.8.1
Note: See TracTickets for help on using tickets.