Changes between Initial Version and Version 1 of Ticket #3259


Timestamp:
May 26, 2009 4:25:01 PM (6 years ago)
Author:
simonpj
Comment:

Well diagnosed! Your report reveals quite a long-standing and serious bug, so thank you! It'll generate nasty, insidious loss of parallelism.

Here's a brain dump of what is going on, just to record it for posterity. par is defined in GHC.Conc thus:

{-# INLINE par  #-}
par :: a -> b -> b
par  x y = case (par# x) of { _ -> lazy y }

-- The reason for the strange "lazy" call is that
-- it fools the compiler into thinking that pseq  and par are non-strict in
-- their second argument (even if it inlines pseq/par at the call site).
-- If it thinks par is strict in "y", then it often evaluates
-- "y" before "x", which is totally wrong.  

The function lazy is the identity function, but it is inlined only after strictness analysis, and (via some magic) pretends to be lazy. Hence par pretends to be lazy too.
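To make the intended evaluation order concrete, here is a minimal sketch of the usual par/pseq idiom. The names fib and parSum are hypothetical and are not taken from the ticket; the sketch only assumes that par and pseq are imported from GHC.Conc and lazy from GHC.Exts.

```haskell
import GHC.Conc (par, pseq)
import GHC.Exts (lazy)

-- Deliberately slow Fibonacci (hypothetical helper), used only to give
-- the spark some real work to do.
fib :: Int -> Int
fib n
  | n < 2     = n
  | otherwise = fib (n - 1) + fib (n - 2)

-- Spark x, evaluate y on the current thread, then combine the two.
-- This only helps if the compiler does not decide to evaluate the
-- second argument of par before the spark is created.
parSum :: Int -> Int -> Int
parSum a b = x `par` (y `pseq` (x + y))
  where
    x = fib a
    y = fib b

main :: IO ()
main = do
  print (lazy (42 :: Int))  -- lazy is semantically the identity function
  print (parSum 20 21)
```

If the strictness analyser concluded that par was strict in its second argument, the body of parSum could be evaluated before x is sparked, and the program would still give the right answer while running sequentially.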

The trouble is that both par and lazy are inlined into your definition of parallelise, so that the unfolding for parallelise (exposed in Parallelise.hi) does not use lazy at all. Then when compiling Main, parallelise is in turn inlined (before strictness analysis), and so the strictness analyser sees too much.
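The situation can be sketched roughly as follows. The ticket does not show the reporter's definition of parallelise, so the function below is a hypothetical stand-in, collapsed into a single module so that it runs on its own. The NOINLINE pragma is likewise an assumption: keeping the unfolding out of importing modules is one conceivable way to sidestep the problem, but the ticket does not confirm it.

```haskell
import GHC.Conc (par, pseq)

-- Deliberately slow Fibonacci (hypothetical helper) to give sparks work.
fib :: Int -> Int
fib n
  | n < 2     = n
  | otherwise = fib (n - 1) + fib (n - 2)

-- Hypothetical stand-in for the reporter's parallelise: spark every
-- element while walking the spine of the list.
-- In the ticket this function lives in its own module (Parallelise), and
-- par and lazy are inlined into the unfolding exposed in the interface file.
{-# NOINLINE parallelise #-}  -- assumption: prevents inlining into importers
parallelise :: [a] -> [a]
parallelise = foldr (\x rest -> x `par` (rest `pseq` (x : rest))) []

main :: IO ()
main = print (sum (parallelise (map fib [15 .. 20])))
```

The result is the same with or without the pragma; only the amount of parallelism would differ, and any such difference would show up only when the program is built with -threaded and run on more than one core.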

This was all sloppy thinking on my part. Inlining lazy after strictness analysis works fine for the current module, but not for importing modules. My proposed fix is to inline lazy only at the very, very end, and in particular after any unfoldings have been exposed in an interface file. That might mean that we lose some optimisations, but I don't think it'll make much difference.

However, I'm going to leave this until Simon M gets back from holiday to discuss.

Simon

  • Ticket #3259

    • Property Difficulty changed from (empty) to Unknown
  • Ticket #3259 – Description

    initial  v1
    4        4
    5        5    I will attach the source files and the outputs of compilation with
    6
             6    {{{
    7        7    ghc-6.11.20090421 --make primes-test.hs -threaded -O2 -ddump-simpl
    8
             8    }}}
    9        9    on a 32-bit Ubuntu 2009.4.
    10       10
    …        …
    12       12
    13       13   The only proof of this, apart from the execution time, is this line of difference between the two -ddump-simpl outputs:
    14
             14   {{{
    15       15   > $diff main.simpl imported.simpl
    16       16   > ...
    …        …
    20       20   > > a_s1sV [ALWAYS Just S] :: GHC.Integer.Internals.Integer
    21       21   > ...
             22   }}}