Opened 5 years ago

Last modified 2 years ago

#1619 new proposed-project

Tweak memory-reuse analysis tools for GHC compatibility

Reported by: Ryan Newton
Owned by:
Priority: good
Keywords:
Cc:
Difficulty: unknown
Mentor: not-accepted
Topic: misc

Description (last modified by Ryan Newton)

Some program instrumentation and analysis tools are language agnostic. Pin and Valgrind use binary rewriting to instrument an x86 binary on the fly, and thus in theory could be used just as well for a Haskell binary as for one compiled from C. Indeed, if you download Pin from pintool.org, you can use the included open source tools to immediately begin analyzing properties of Haskell workloads, for example the total instruction mix during execution.

The problem is that aggregate data for an entire execution is rather coarse. It's not correlated temporally with phases of program execution, nor are specific measured phenomena related to anything in the Haskell source.

This could be improved. A simple example would be to measure memory-reuse distance (an architecture-independent characterization of locality) but to distinguish garbage collection from normal memory access. It would be quite nice to see a histogram of reuse-distances in which GC accesses appear as a separate layer (different color) from normal accesses.
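To make the idea concrete, here is a minimal sketch (not MICA's implementation) of a reuse-distance histogram kept in two layers. It assumes the trace is already available as (address, in_gc) pairs; the function name and the "mutator"/"gc" labels are made up for illustration. Reuse distance is taken to be the number of distinct other addresses touched since the previous access to the same address, and each access is counted in the layer of the access that performs the reuse.

```python
from collections import Counter


def reuse_distance_layers(trace):
    """Build per-layer reuse-distance histograms.

    trace: iterable of (address, in_gc) pairs in program order.
    Returns {"mutator": Counter, "gc": Counter}, each mapping a reuse
    distance to a count. First-time (cold) accesses are counted under None.
    """
    stack = []  # LRU stack: most recently used address is last
    layers = {"mutator": Counter(), "gc": Counter()}
    for addr, in_gc in trace:
        layer = layers["gc" if in_gc else "mutator"]
        if addr in stack:
            pos = stack.index(addr)
            # Number of distinct addresses touched since the last use of addr.
            distance = len(stack) - 1 - pos
            stack.pop(pos)
        else:
            distance = None  # cold access, no previous use
        layer[distance] += 1
        stack.append(addr)
    return layers
```

The LRU stack is shared between both layers on purpose, so GC traffic lengthens the distances later seen by normal accesses, as it would in a real cache. The list-based stack costs linear time per access, which is fine for a sketch but far too slow for a real trace.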

How to go about this? Fortunately, the existing MICA pintool (v0.4) builds today and already measures memory reuse distances.

http://boegel.kejo.be/ELIS/mica/

In fact, it already produces per-phase measurements where phases are delimited by dynamic instruction counts (e.g. every 100M instructions). All that remains is to tweak that definition of phase so that a new phase begins whenever GC switches on or off.
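The difference between the two phase definitions can be sketched as follows. This is only an illustration, not MICA code: both function names are hypothetical, and the per-instruction in-GC flag stream is assumed to come from the instrumentation.

```python
from itertools import groupby


def fixed_phases(n_instructions, interval):
    """Phase lengths when a new phase starts every `interval` instructions."""
    full, rest = divmod(n_instructions, interval)
    return [interval] * full + ([rest] if rest else [])


def gc_phases(in_gc_flags):
    """Phases that end whenever the in-GC flag flips.

    in_gc_flags: iterable of booleans, one per executed instruction.
    Returns a list of (in_gc, length) runs in program order.
    """
    return [(flag, sum(1 for _ in run)) for flag, run in groupby(in_gc_flags)]
```

With the second definition every phase is entirely GC or entirely normal execution, so the per-phase measurements can be coloured by kind without further bookkeeping.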

How to do that? Well, Pin has existing methods for targeted instrumentation of specific C functions:

http://www.cs.virginia.edu/kim/publicity/pin/docs/45467/Pin/html/group__RTN__BASIC__API.html#g8622a6ba858eb8d55df4e006eb165e57

By targeting appropriate functions in the GHC RTS, this analysis tool could probably work without requiring any GHC modification at all.
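As a rough model of the control logic such a tool would need, the sketch below toggles the phase from routine entry and exit callbacks. In a real pintool the callbacks would be C++ analysis routines attached with Pin's routine-level API; here they are plain Python methods driven by hand. The class and method names are hypothetical, and the default routine name GarbageCollect is an assumption about which RTS entry point to target, not something this ticket specifies.

```python
class GcPhaseTracker:
    """Tracks whether execution is currently inside a GC routine."""

    def __init__(self, gc_routines=("GarbageCollect",)):
        # Assumed names of RTS routines that mark the start of a collection.
        self.gc_routines = frozenset(gc_routines)
        self.depth = 0          # nesting depth of targeted routines
        self.transitions = []   # phase names, in the order they began

    @property
    def in_gc(self):
        return self.depth > 0

    def on_enter(self, routine):
        """Call when any routine is entered."""
        if routine in self.gc_routines:
            self.depth += 1
            if self.depth == 1:
                self.transitions.append("gc")

    def on_exit(self, routine):
        """Call when any routine returns."""
        if routine in self.gc_routines and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.transitions.append("mutator")
```

The depth counter is there so that nested or recursive calls into targeted routines do not end the GC phase early; only the outermost return switches back to normal execution.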

A longer-term goal would be to correlate events observed by the binary rewriting tool with those recorded by GHC's traceEvent.

Finally, as it turns out, this would NOT be the first crossing of paths between GHC and binary rewriting. Julian Seward worked on GHC before developing Valgrind:

http://www.techrepublic.com/article/open-source-awards-2004-julian-seward-for-valgrind/5136747

Interested Mentors

Ryan Newton

Others??

Interested Students (Include enough identifying info to find/reach you!)

Change History (3)

comment:1 Changed 5 years ago by Ryan Newton

Description: modified (diff)

comment:2 Changed 5 years ago by Ryan Newton

Description: modified (diff)

comment:3 Changed 2 years ago by Edward Kmett

Priority: changed from not yet rated to good

This proposal remains viable and, as yet, largely un-acted-upon.
