Opened 3 years ago

Last modified 13 months ago

#9388 new bug

Narrow the scope of the notorious "state hack"

Reported by: simonpj Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.2
Keywords: Cc: dfeuer, nomeata
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


The "state hack" has caused any number of bug reports (just search for that string), the most recent of which is #9349.

Here's an idea to make it less pervasive: (roughly) use the state hack only for top-level functions definitions.

The idea is that for nested lambdas the context should give the one-shot-ness, now that we are equipped with cardinality analysis. For example, consider the call

   replicatM_ 1000 (\(s :: RealWorld#) -> blah)

The lambda is 1000-shot, not one-shot, notwithstanding the type of the binder. Moreover replicateM_'s strictness/cardinality signature will say just that, and GHC already knows how to propagate that information onto the \s.

But for top level functions like

pr :: String -> IO ()
pr x = putStrLn (reverse x)

we get Core

pr = \x. let y = reverse x in
         \ (s :: State# RealWorld). putStrLn y s

and, since we can't see all the callers of pr, we don't know if work is lost by pushing the reverse call inside, to get

pr = \x. (s :: State# RealWorld). putStrLn (reverse x) s

which is much more efficient. Indeed, this might not be so good if the calls looked like

 ... replicateM_ 1000 (pr "foo")...

because then "foo" will be reversed 1000 times. But arguably that's what the programmer expects anyway, looking at the code; and the efficiency hit from not eta-expanding all functions like pr (which are very very common) is significant.

The point is the that the only ones that need hacking are the top-level guys, and maybe even the top-level exported guys.

I have not fully thought this through, let alone tried it out, but I wanted to capture the thought. It would need some careful performance testing.


Change History (7)

comment:1 Changed 2 years ago by dfeuer

Cc: dfeuer added

comment:2 Changed 2 years ago by nomeata

Cc: nomeata added

comment:3 Changed 2 years ago by nomeata

I’m considering giving this one shot (pun intended), and I’m reading the current implementation of the state hack, and found some pretty old-dated comments, including this one from 2004 in Id.hs:

       -- Another good example is in fill_in in PrelPack.lhs.  We should be able to
       -- spot that fill_in has arity 2 (and when Keith is done, we will) but we can't yet.

I do not know who Keith is, and why that would help, and most likely something about todays demand analyzer should be written there. Simon, if you have a minute, would you mind revisiting the comment at isStateHack? It would help whoever tackles the problem next. Thanks!

comment:4 Changed 2 years ago by nomeata

Has someone implemented this without telling us?

In GHC-7.8.4, compiling

pr :: String -> IO ()
pr x = putStrLn (reverse x)

without the state hack yields

pr =
  \ x_arX ->
    let s_aQl = reverse1 x_arX ([]) in
    (\ eta_aQm -> hPutStr2 stdout s_aQl True eta_aQm) `cast` ...

and only with the state hack, we get the good

pr1 = \ x_arX eta_B1 -> hPutStr2 stdout (reverse x_arX) True eta_B1

But with GHC HEAD, the output is good with and without -fno-state-hack. So either something implemented this, or the state hack flag gets ignored for some reason.

comment:5 Changed 2 years ago by nomeata

Ah, I think it might have been me, at least partially:

In MkId, the realWorldPrimId has setOneShotInfo stateHackOneShot set. This means that _every_ lambda with an argument of type State# is marked as oneshot. This was introduced in changeset:80989de/ghc – maybe this bit was not even meant to be merged? Back then, the OneShotInfo would not be exported, so the unfolding for hPutStr2 would say (\ s ::String eta :: State# RealWorld -> ... But when I added the oneShot magic function, I also made sure that the OneShotInfo would be written to the interface, so it now says (\ s ::String eta :: State# RealWorld[oneShot] -> ... and the simplifier will float in the call to reverse.

I’ll rip out all state hack code (which seems to be implemented by OneShotInfo on realWorldPrimId and via typeOneShot) and see if I can achieve something similar in a cleaner way and only for top-level bindings in the cardinality analysis. It’s a nice weekend relaxation from paper proofreading :-)

comment:6 Changed 2 years ago by nomeata

I gave it a shot. The obvious place to make the demand analyzer believe that something of type IO is going to be called at most once is by adjusting body_dmd in dmdAnalRhs, similarly to [Product demands for function body].

The branch wip/T9388 contains patches that remove the old state hack and introduce this one.

On first glance, it seems to work. At least the bug in #10102 does not occur any more.

Overall, it has a negative effect on performance (at least on bytes allocated): nofib’s fibheaps regresses by 16.7%, banner by 12.4% and hpg by 7.1%, rewrite by 5.8%. Others improve: k-nucleotide by 5%. Geometric mean is a regression by 0.4%. A quick diff of fibheap’s core shows large changes, which I did not investigate further.

The branch validates, with the exception of a number of compiler performance benchmarks (T9203 haddock.Cabal haddock.base T5030 T5631 T9872c T9872a T5642 T9020 T3064 T9872d T9872b T1969), where allocation increases by up to 30%. So real world programs run better with the old state hack.

In a way that is expected: Eta expansion is usually a good thing, even if cannot see that a the lambda is indeed one-shot. And even if it is not really one-shot, the benefits of eta-expansion might, in some cases, outweigh the cost of lost sharing. Only in a few (uncommon?) cases, e.g. replicate with a large number, it really hurts.

I’m surprised that no test case starts to fail, and no known-to-fail test case suddenly passes; it seems that we have little coverage on the desired and/or unwanted effects of the state hack.

comment:7 Changed 13 months ago by thomie

Type of failure: None/UnknownRuntime performance bug
Note: See TracTickets for help on using tickets.