Opened 2 months ago

Closed 6 days ago

Last modified 10 hours ago

#15696 closed bug (fixed)

Derived Ord instance for enumerations with more than 8 elements seems to be incorrect

Reported by: mrkkrp Owned by: osa1
Priority: highest Milestone: 8.6.2
Component: Compiler Version: 8.6.1
Keywords: Cc: dfeuer, osa1, j.waldmann, maoe, alpmestan, sgraf, asr
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime Test Case: codeGen/should_run/T15696_1, T15696_2, T15696_3
Blocked By: Blocking:
Related Tickets: #14677, #15155 Differential Rev(s): Phab:D5196, Phab:D5201, Phab:D5226
Wiki Page:

Description

Sometimes.

Here is the thread that explains the bug:

https://github.com/haskell/containers/issues/568

I originally reported this as a bug on containers issue tracker, but we seem to have concluded that this is probably a bug in the GHC optimizer itself.

I think the shortest repro so far is this:

import qualified Data.Set as S

main = print $
  let {-# noinline f #-}
      f () = T2
  in  S.fromList [f (), f ()]
  
data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
  deriving (Show, Read, Eq, Ord, Bounded, Enum)

which prints

fromList [T2,T2]

The person who derived this from my original repro says:

And as I said earlier, comment out the T9 constructor => prints fromList [T2] as it should.

Another interesting quote:

Can confirm. Tested with ghc-8.6.1, containers-0.6.0.1 and leancheck-0.7.5 (so it does not seem to depend on the testing framework). Error occurs:

  • with ghc -O1 and -O2 (but not with -O0)
  • and if data type has at least 9 elements

So, likely a bug in ghc's optimizer.

in some cases, input has duplicates, but not always.

This is a bad one, makes GHC 8.6.1 totally unusable for me.

Change History (92)

comment:1 Changed 2 months ago by dfeuer

Cc: dfeuer added
Priority: normalhighest
Type of failure: None/UnknownIncorrect result at runtime

comment:2 Changed 2 months ago by simonpj

That's terrible!

The following all produce the same (wrong) restul

 S.fromList (map f [ () | _ <- [1..10] ])    -- T2,T2
 S.fromList [f (), f ()]                     -- T2,T2
 S.fromList [f (), f (), f()]                -- T2,T2

However this works fine

 f () `S.insert` (f () `S.insert` S.empty)

The comparison operations generated by deriving look ok to me.

comment:3 Changed 2 months ago by osa1

Cc: osa1 added

The problem is in this Core generated for this program:

-- RHS size: {terms: 5, types: 2, coercions: 0, joins: 0/0}
f_r6li
f_r6li = \ ds_d3M6 -> case ds_d3M6 of { () -> T2 }

-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
lvl_r6lj
lvl_r6lj = f_r6li ()

main2
main2
  = case dataToTag# lvl_r6lj of a#_a2rY { __DEFAULT ->
    case dataToTag# lvl_r6lj of b#_a2rZ { __DEFAULT ->
    ...

We get the tag of a CAF (lvl_r6lj) before evaluating it, so we get tag of a thunk. The need for evaluating argument of dataToTag# is explained in Note [dataToTag#] in primops.txt.pp. It seems like we're inlining getTag, and then somehow eliminating the case expression in getTag (which is supposed to evaluate the argument). If I change the INLINE annotation of getTag to NOINLINE this works as expected.

I don't know why we're elminating the case in getTag after inlining it yet.

Last edited 2 months ago by osa1 (previous) (diff)

comment:4 Changed 2 months ago by osa1

Ah, so it turns out we have a special case in CorePrep (which runs after simplifications) about dataToTag#, and we generate a case expression around its argument after all the simplifications. It's explained in Note [dataToTag magic] in CorePrep, and I can see in STG that it works as expected:

lvl_r6ru = \u [] f_r6rt ();

lvl1_r6rv = CCS_DONT_CARE :! [lvl_r6ru []];

main2 =
    \u []
        case
            case lvl_r6ru of sat_s6xD { __DEFAULT -> dataToTag# [sat_s6xD]; }
        of
        a#_s6xE
        { __DEFAULT ->
              case
                  case lvl_r6ru of sat_s6xF { __DEFAULT -> dataToTag# [sat_s6xF]; }
              of
              ...

So perhaps this is not because we get tag of a thunk. I don't know why not inlining getTag fixes this.

comment:5 Changed 2 months ago by j.waldmann

Cc: j.waldmann added

comment:6 Changed 2 months ago by RyanGlScott

Here's an example which doesn't depend on any code from containers. It also makes the derived Ord code explicit:

{-# LANGUAGE MagicHash #-}
module Main where

import qualified Data.Foldable as Foldable
import GHC.Exts (dataToTag#, tagToEnum#, (==#), (<#))

main :: IO ()
main | not_ordered a b = print $ Foldable.foldl' (flip wumbo) (singleton a) b
     | otherwise       = pure ()
  where
    {-# NOINLINE f #-}
    f () = T2
    {-# NOINLINE a #-}
    a = f ()
    {-# NOINLINE b #-}
    b = [f ()]

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
  deriving (Eq, Show)

instance Ord Main.T where
  compare a b
    = case dataToTag# a of
        a' -> case dataToTag# b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

data Set a = Bin !a !(Set a) !(Set a)
           | Tip
  deriving Show

not_ordered :: Ord a => a -> [a] -> Bool
not_ordered _ [] = False
not_ordered x (y : _) = x >= y

wumbo :: Ord a => a -> Set a -> Set a
wumbo x0 = go x0 x0
  where
    go :: Ord a => a -> a -> Set a -> Set a
    go orig _ Tip = singleton orig
    go orig x t@(Bin y l r) = case compare x y of
        LT -> error "not used here"
        GT -> Bin y l (go orig x r)
        EQ -> t
{-# INLINE wumbo #-}

singleton :: a -> Set a
singleton x = Bin x Tip Tip

comment:7 Changed 2 months ago by simonpj

That's really helpful Ryan, thank you.

comment:8 Changed 2 months ago by RyanGlScott

Apologies, I meant to mention in comment:6 what wumbo actually is: it's a stripped down version of insert intended to highlight that it appears to behave differently between GHC 8.4.3 and 8.6.1. The semantics of wumbo differs from that of insert, but here is the important bit:

$ /opt/ghc/8.4.3/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
Bin T2 Tip Tip

$ /opt/ghc/8.6.1/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
Bin T2 Tip (Bin T2 Tip Tip)

GHC 8.4.3's answer is definitely the correct one, since the only way you'd get 8.6.1's answer is by hitting the GT case of wumbo (which shouldn't happen if you're comparing T2 to T2).

comment:9 Changed 2 months ago by monoidal

I'm minimizing this.

comment:10 Changed 2 months ago by bgamari

Milestone: 8.6.18.6.2

comment:11 Changed 2 months ago by dfeuer

Didn't we just recently start using some pointer tagging for types with more than 7 constructors? I'm thinking something could be a drop off in that code.

comment:12 Changed 2 months ago by monoidal

Here's a smaller version.

ghc T15696 && ./T15696 prints EQ correctly, ghc -O T15696 && ./T15696 prints LT.

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (dataToTag#, tagToEnum#, (==#), (<#))

main :: IO ()
main = print $ compare a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
  deriving (Eq, Show, Ord)

{-
instance Ord Main.T where
  compare a b
    = case dataToTag# a of
        a' -> case dataToTag# b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT
-}

comment:13 Changed 2 months ago by simonpj

Avoiding the use of type classes:

main :: IO ()
main = print $ cmpT a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
--  deriving (Eq, Show, Ord)

cmpT a b
    = case dataToTag# a of
        a' -> case dataToTag# b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

With -O we get LT for GHC 8.4 and 8.2 and earlier versions. Without -O it returns EQ as it should.

comment:14 Changed 2 months ago by simonpj

Omer, might you look at this. With -ddump-stg I see

Main.cmpT :: forall a1 a2. a1 -> a2 -> GHC.Types.Ordering
[GblId, Arity=2, Caf=NoCafRefs, Str=<S,U><S,U>, Unf=OtherCon []] =
    [] \r [a2_s3tf b_s3tg]
        case
            case a2_s3tf of sat_s3th [Occ=Once] {
              __DEFAULT -> dataToTag# [sat_s3th];
            }
        of
        a'_s3ti
        { __DEFAULT ->
              case
                  case b_s3tg of sat_s3tj [Occ=Once] {
                    __DEFAULT -> dataToTag# [sat_s3tj];
                  }
              of
              b'_s3tk
              { __DEFAULT ->
                    case <# [a'_s3ti b'_s3tk] of {
                      __DEFAULT ->
                          case ==# [a'_s3ti b'_s3tk] of {
                            __DEFAULT -> GHC.Types.GT [];
                            1# -> GHC.Types.EQ [];
                          };
                      1# -> GHC.Types.LT [];
                    };
              };
        };

which looks right. In another variant (I made dataToTag# lazy) I saw

Main.cmpT :: forall a1 a2. a1 -> a2 -> GHC.Types.Ordering
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=<S,1*U><S,1*U>,
 Unf=OtherCon []] =
    [] \r [a2_s3tf b_s3tg]
        case a2_s3tf of x1_s3th [Occ=Once] {
          __DEFAULT ->
              case dataToTag# [x1_s3th] of a'_s3ti {
                __DEFAULT ->
                    case b_s3tg of x2_s3tj [Occ=Once] {
                      __DEFAULT ->
                          case dataToTag# [x2_s3tj] of b'_s3tk {
                            __DEFAULT ->
                                case <# [a'_s3ti b'_s3tk] of {
                                  __DEFAULT ->
                                      case ==# [a'_s3ti b'_s3tk] of {
                                        __DEFAULT -> GHC.Types.GT [];
                                        1# -> GHC.Types.EQ [];
                                      };
                                  1# -> GHC.Types.LT [];
                                };
                          };
                    };
              };
        };

which also looks fine. But both stubbornly return LT instead of EQ. This must be a code-gen or RTS issue. I have not looked at the Cmm. Might you do so?

Last edited 2 months ago by simonpj (previous) (diff)

comment:15 Changed 2 months ago by RyanGlScott

Weirdly enough, I get different answers than the ones simonpj reported for the program in comment:13. To be explicit, if I'm using this program:

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (dataToTag#, tagToEnum#, (==#), (<#))

main :: IO ()
main = print $ compare a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
  deriving (Eq, Show)

instance Ord Main.T where
  compare a b
    = case dataToTag# a of
        a' -> case dataToTag# b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

Then I consistently get LT regardless of optimization level:

$ /opt/ghc/8.6.1/bin/ghc -O0 -fforce-recomp Bug.hs && ./Bug 
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
LT

$ /opt/ghc/8.6.1/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug 
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
LT

If I replace all uses of dataToTag# with getTag, however:

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Base (getTag)
import GHC.Exts (tagToEnum#, (==#), (<#))

main :: IO ()
main = print $ compare a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
  deriving (Eq, Show)

instance Ord Main.T where
  compare a b
    = case getTag a of
        a' -> case getTag b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

Only then do I get EQ without optimization:

$ /opt/ghc/8.6.1/bin/ghc -O0 -fforce-recomp Bug.hs && ./Bug 
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
EQ

$ /opt/ghc/8.6.1/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug 
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
LT

What's more, I consistently get the same sets of results in each version of GHC dating back to 8.2.2. This makes me believe that the bug that was exposed here has actually been lurking for quite a while (but perhaps a difference in inlining behavior in 8.6 only just recently exposed it).

comment:16 Changed 2 months ago by RyanGlScott

The fact that using dataToTag# directly produces incorrect results is perhaps not terribly surprising, since it must always be applied to an evaluated argument (see Note [dataToTag#]). The fact that the version using getTag breaks is more worrisome, since getTag actually forces its argument (with a bang pattern).

Last edited 2 months ago by RyanGlScott (previous) (diff)

comment:17 Changed 2 months ago by maoe

Cc: maoe added

comment:18 Changed 2 months ago by RyanGlScott

It's also worth noting that you can trim T down to just two constructors:

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Base (getTag)
import GHC.Exts (tagToEnum#, (==#), (<#))

main :: IO ()
main = print $ compare a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2
  deriving (Eq, Show)

instance Ord Main.T where
  compare a b
    = case getTag a of
        a' -> case getTag b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

And the bug will still trigger. (If you define data T = T2, then the bug will go away.)

comment:19 in reply to:  16 Changed 2 months ago by dfeuer

Replying to RyanGlScott:

The fact that using dataToTag# directly produces incorrect results is perhaps not terribly surprising...

Except that we have special code to make sure that always happens regardless. The STG Simon pasted looks right from that standpoint: it always applies dataToTag# under a case on its argument. So my bet is that Simon is right: something is going wrong in code generation or the RTS.

Side note: it looks like I was wrong about tagging large types. I believe there was some talk of that, but it doesn't look like it's happened as yet.

comment:20 in reply to:  11 Changed 2 months ago by heisenbug

Replying to dfeuer:

Didn't we just recently start using some pointer tagging for types with more than 7 constructors? I'm thinking something could be a drop off in that code.

Nope. That code is not ripe yet. This summer was very busy for me.

Another thing: With Peter I discovered last Haskell eXchange (Oct. 2017) that enumerated types with derived Enum/Ord instances could use getTag to derive a constant-time compare (and (==) too). We were not aware of this optimization already being in the codebase. We'll surely revisit this.

Last edited 2 months ago by heisenbug (previous) (diff)

comment:21 Changed 2 months ago by osa1

Differential Rev(s): Phab:D5196
Status: newpatch

Simon, please review. There are a few ways to fix this and I probably did the most conservative thing by always introducing a case around dataToTag#. Perhaps we want to change how we record evaluated-ness of CAFs in their Ids instead.

We may also want to pay more attention to #14677 if we rely on strictness annotations of data constructor fields to decide on evaluatedness of values elsewhere.

comment:22 Changed 2 months ago by osa1

One other thing to note is, as other commenters above already mentioned, this is _not_ a new bug! The bug was there since years. I think something else (maybe changes in simplifier) revealed it.

comment:23 Changed 2 months ago by alpmestan

Cc: alpmestan added

comment:24 Changed 2 months ago by osa1

Thinking about this more. I think the only case where we can actually avoid entering the argument of dataToTag# is when the argument is a static closure. For anything else (a dynamic closure, a CAF) we need to enter it. I don't know how often people do dataToTag# C (for some constructor C), perhaps it's fine to always enter it. If it is then we can remove the special case for dataToTag# in CorePrep and in the codegen (in Cmm) we can just enter the argument.

If we decide to do that then perhaps we should also revisit the getTag and remove the bang pattern on its argument.

comment:25 in reply to:  24 Changed 2 months ago by dfeuer

Replying to osa1:

Thinking about this more. I think the only case where we can actually avoid entering the argument of dataToTag# is when the argument is a static closure. For anything else (a dynamic closure, a CAF) we need to enter it.

I'm a tad confused. If the tag bits are non-zero, can't we just use them and avoid dereferencing the pointer? Or is that already folded into "enter it"? Or am I just wrong?

comment:26 Changed 2 months ago by dfeuer

Ah... I am not entirely wrong, but I am partly wrong, because types with many constructors get tagged 1 when evaluated. I think we should at least be able to add a rule that recognizes when dataToTag# is used with a type with only a few constructors: see the logic in the cgAlts gc_plan bndr (AlgAlt tycon) alts case in compiler/codeGen/StgCmmExpr.hs. That way we only have to call getConstrTag when

  1. The type is unknown,
  2. The type has too many constructors, or
  3. The tag bits are 0.

I imagine you can do this quite easily by copying a bit of the splitTyConApp_maybe logic from tagToEnumRule to dataToTagRule in PrelRules.hs and adding a DataToTagUsingTagOp to supplement DataToTagOp.

comment:27 Changed 2 months ago by dfeuer

Aha! Actually, I believe even one fall-back should be able to use tag bits, as long as they're greater than 1. So:

When the type is recognized as having a small number of constructors

Check the tag bits. If they are 0, enter the closure. Otherwise, subtract 1 to get the tag.

When the type is unknown

Check the tag bits. If they are less than or equal to 1, enter the closure. Otherwise, subtract 1 to get the tag.

When the type has too many constructors

Enter the closure.

comment:28 Changed 2 months ago by sgraf

Cc: sgraf added

comment:29 Changed 2 months ago by dfeuer

I guess what I'm suggesting would be implemented most easily by replacing the dataToTag# primop a little bit.

unKnownType, smallType, largeType :: Int#
unKnownType = 0#
smallType = 1#
largeType = 2#

dataToTag# x = dataToTagUsing# unknownType x

-- Takes the "strategy" to use
dataToTagUsing# :: Int# -> a -> Int#

This arrangement is designed so we shouldn't have to do any painful restructuring of the caseRules or dataToTagRule.

The only potential problem I see is if any user-written RULES match on dataToTag#, because that won't work anymore. Should we worry about that?

comment:30 Changed 2 months ago by simonpj

Before deciding on a solution, let's record what the problem is.

The original thinking was this.

  • dataToTag# is a primop, so it has no business doing anything as complicated as evaluating its argument; we already have case expressions that the code-gen knows how to compile.
  • So dataToTag# expects an evaluated argument; in fact, stronger than that, it expects a pointer to the data value itself, so that it can extract the tag directly from the info table.
  • But if it so happens that the value in question already is evaluated, that's a waste. Example
    f x = case x of y -> ...(dataToTag# y)...
    
    It seems a bit silly for dataToTag# to re-evaluate y.
  • Hence, in CorePrep (see Note [dataToTag magic]) we add a case around the argument to dataToTag# unless the argument is already evaluated.
  • The "already-evaluated" test is exprIsHNF.
  • But alas, while exprIsHNF guarantees that the thing will evaluate in O(1) time, it does not guarantee that we have a pointer to the data value itself. Omer accurately diagnoses this problem in the Note in his draft Phab:D5196:
    Note [Always introduce case around dataToTag# arg]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    We used to only generate a case expression around dataToTag# argument when it's
    not known to be evaluated (checked with `exprIsHNF`), but this is incorrect.
    Here's an example:
    
      data T = T1 | T2
    
      main = print (I# (dataToTag# a))
        where
          {-# NOINLINE f #-}
          f = T2
          {-# NOINLINE a #-}
          a = f
    
    `f` is a static closure
    
      f_r1zz :: Main.T
      [GblId, Caf=NoCafRefs, Unf=OtherCon []] =
          CCS_DONT_CARE Main.T2! [];
    
    but `a` is an _updatable_ CAF!
    
      a_r1zA :: Main.T
      [GblId, Unf=OtherCon []] =
          [] \u [] f_r1zz;
    
    An updatable CAF is not in HNF (entering the closure does the `newCAF()` stuff
    and updates the closure with an IND_STATIC, the usual CAF evaluation routines),
    but according to `exprIsHNF` `a` is!
    
  • I thought of having a more conservative test in CorePrep. But the rot goes further. Consider this, from Note [dataToTag magic]
       data T = MkT !Bool
       f v = case v of
                MkT y -> dataToTag# y
    
    We certainly know that y will be evaluated, because MkT is a strict constructor. But does it guarantee to point directly to the data value? No! The case-to-let transformation in the Simplifier (Simplify.doCaseToLet) uses exprIsHNF and hence will drop the eval on MkT's argument for things like Omer's a binding. And that is right in a way: the argument to MkT a certainly isn't bottom! But nor does it point to the data value.

So that is a problem. But I think there is another. Look in comment:13 and comment:14. Here we have the extra case expressions correctly inserted, but we still get the wrong answer. And this is affected by reducing the number of constructor from 9 to 8. So I think there may be two separate bugs.

I'd like to understand the second before looking for a fix. Omer: could you investigate why comments 13 and 14 go wrong?

Last edited 2 months ago by simonpj (previous) (diff)

comment:31 Changed 2 months ago by osa1

Sure. The problem in comment:13 is exactly what I explained in my patch (and you also copied it here in comment:30). I get this STG with GHC HEAD, -O0:

f_r1C3 :: Main.T
[GblId, Caf=NoCafRefs, Unf=OtherCon []] =
    CCS_DONT_CARE Main.T2! [];

a_r1C4 :: Main.T
[GblId, Unf=OtherCon []] =
    [] \u [] f_r1C3;

sat_s1OH :: GHC.Types.Ordering
[LclId] =
    [] \u []
        case dataToTag# [a_r1C4] of a'_s1OD {
          __DEFAULT ->
              case <# [a'_s1OD 1#] of sat_s1OE [Occ=Once] {
                __DEFAULT ->
                    case tagToEnum# [sat_s1OE] of {
                      GHC.Types.False ->
                          case a'_s1OD of {
                            __DEFAULT -> GHC.Types.GT [];
                            1# -> GHC.Types.EQ [];
                          };
                      GHC.Types.True -> GHC.Types.LT [];
                    };
              };
        };

Notice that we do dataToTag# [a_r1C4] and a_r1C4 is an updatable CAF. The result is I get LT instead of EQ. (this is one of the regression tests I added)

I get very similar STG (with the same bug) and same results with these configurations: GHC HEAD, GHC 8.2.2, GHC 8.0.2. All tried with -O0, -O1, -O2.

I'm guessing that you added a NOINLINE for cmpT in comment:14. When I do that I get the right answer with all optimisation settings (GHC HEAD), and that makes sense becuase the arguments are now known to be evaluated and exprIsHNF correctly returns False for the arguments, so we do case on the args. I don't know how you get incorrect result in the STG shown in comment:14. Could it be that you used an older binary of the test program or something like that? If you give more detailed instructions to reproduce I can take a look.

I agree that if the STG in comment:14 is producing wrong result then there's at least one more bug. However I get identical STG when I add NOINLINE aroung cmpT (with GHC HEAD, with -O2) and it works as expected. You said you also made dataToTag# lazier, but because I get identical STG as you I haven't tried to change the primop laziness (it shouldn't matter in STG).

comment:32 Changed 2 months ago by simonpj

I get this STG with GHC HEAD, -O0:

That's what you get for monoidal's code in comment:18. Note the missing eval before the dataToTag#.

But it is not what you get for the code in comment:13. The evals are there, as comment:14 shows. And yet we get the wrong answer.

But why? That is the second bug. Can you see why that happens?

comment:33 Changed 2 months ago by osa1

I'm confused

That's what you get for monoidal's code in comment:18

I didn't even look at comment:18. I tried your code in comment:13 and what happens there is exactly what I explained in my patch.

The evals are there, as comment:14 shows.

comment:14 shows that evals are there if I don't inline cmpT, and that makes sense as I explained in comment:31. We can't assume that args are evaluated so we eval them.

And yet we get the wrong answer.

I get the right answer when I get the STG in comment:14.

Could you try to reproduce the error in comment:14 and give me more detailed instructions (showing the source and invoked GHC commands)?

comment:34 Changed 2 months ago by simonpj

Oops sorry. I'd modified the code slightly! Try this

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (dataToTag#, tagToEnum#, (==#), (<#))
import GHC.Base ( getTag )

main :: IO ()
main = print $ cmpT a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
--  deriving (Eq, Show, Ord)

cmpT a b
    = case getTag a of
        a' -> case getTag b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

Compile with -O and run. It prints LT when it should print EQ. But the evals are there!

comment:35 Changed 2 months ago by osa1

OK. This is the STG I get:

Main.main_f [InlPrag=NOINLINE] :: Main.T
[GblId, Caf=NoCafRefs, Unf=OtherCon []] =
    CCS_DONT_CARE Main.T2! [];

Main.main_a [InlPrag=NOINLINE] :: Main.T
[GblId, Unf=OtherCon []] =
    [] \u [] Main.main_f;

Main.main1 :: GHC.Base.String
[GblId] =
    [] \u []
        case dataToTag# [Main.main_a] of a'_s3tm {
          __DEFAULT ->
              case <# [a'_s3tm 1#] of {
                __DEFAULT ->
                    case a'_s3tm of {
                      __DEFAULT -> GHC.Show.$fShowOrdering1;
                      1# -> GHC.Show.$fShowOrdering3;
                    };
                1# -> GHC.Show.$fShowOrdering5;
              };
        };

Exactly the same problem, no evals around the CAF (main_a).

Are you maybe confused because in the STG dump you also see the STG for cmpT? It's actually inlined in main so the top-level for cmpT is not used.

comment:36 Changed 2 months ago by simonpj

Are you maybe confused because in the STG dump you also see the STG for cmpT? It's actually inlined in main so the top-level for cmpT is not used.

That is exactly what I was doing. Thanks!

I backed up some more to look at comment:6, which does not have a top-level no-inline CAF. I see this:

$sgo_r5Tk :: Main.T -> Main.T -> Main.Set Main.T -> Main.Set Main.T
[GblId, Arity=3, Str=<L,1*U><L,U><S,1*U>, Unf=OtherCon []] =
    sat-only [] \r [orig_s5Wj ds_s5Wk ds1_s5Wl]
        case ds1_s5Wl of wild_s5Wm [Occ=Once] {
          Main.Bin y_s5Wn l_s5Wo [Occ=Once] r_s5Wp [Occ=Once] ->
              case
                  case ds_s5Wk of sat_s5Wq [Occ=Once] {
                    __DEFAULT -> dataToTag# [sat_s5Wq];
                  }
              of
              a'_s5Wr
              { __DEFAULT ->
                    case dataToTag# [y_s5Wn] of b'_s5Ws {
                      __DEFAULT ->
                          case <# [a'_s5Wr b'_s5Ws] of {
                            __DEFAULT ->
...

Here we take apart a Bin, and call dataToTag# on the contents; and because of the exprIsHNF stuff there is no guarantee that the argument to Bin points directly to the data value.

But in looking at this I found something else! In coment:6 there is no top-level CAF with a NOINLINE, so how do things go wrong. Here's how.

  • We start with
    thk = f ()
    g x = ...(case thk of v -> Bin v Tip Tip)...
    
    So far so good; the argument to Bin (which is strict) is evaluated.
  • Now we do float-out. And in doing so we do a reverse binder-swap (see Note [Binder-swap during float-out] in SetLevels) thus
    g x = ...(case thk of v -> Bin thk Nil Nil)...
    
    The goal of the reverse binder-swap is to allow more floating -- and it does! We float the Bin to top level:
    lvl = Bin thk Tip Tip
    g x = ...(case thk of v -> lvl))...
    
  • Now you can see that the argument of Bin, namely thk, points to the thunk, not to the value as it did before; and that gives rise to the bug.

Is this wrong? Not really. We are still guaranteed that the argument to Bin in lvl will be evaluated (by that case thk) before lvl is used. But we are no longer guaranteed that the argument to Bin points directly to the evaluated value.

So now I understand.

comment:37 Changed 2 months ago by simonpj

OK, moving on to solutions. Your patch seems OK. But I wonder if CorePrep is the right place to do it. After all, some other STG manipulation might break it again.

I think the right thing is for the code generator to do the job; that is, in effect implement dataToTag# properly. That is, in StgCmmPrim, in

cgOpApp (StgPrimOp primop) args res_ty = do

add a special case for DataToTagOp, when we are compiling dataToTag# x. Then behave exactly as if we'd seen case x of y -> dataToTag## y, where by dataToTag## y I mean generate the code the looks in the info table. (We have that code here

--  #define dataToTagzh(r,a)  r=(GET_TAG(((StgClosure *)a)->header.info))
--  Note: argument may be tagged!
emitPrimOp dflags [res] DataToTagOp [arg]
   = emitAssign (CmmLocal res) (getConstrTag dflags (cmmUntag dflags arg))

).

And by "behave exactly as if we'd see case ..." I roughly mean call StgCmmExpr.cgCase. But that need some alts which we don't conveniently have. The easiest thing would be to take -- the general case equation for cgCase and split off the bit that does the eval, so that we can call it from dataToTag#. Doing this is not trivial, but it feels like the Right Thing, and will remove the magic from dataToTag#.

comment:38 Changed 2 months ago by osa1

Right, fixing this in Cmm is also what I suggested in comment:24. I think two key questions are:

  • Can we optimise this? That is, can we know for certain that the argument of a dataToTag# points directly to a value (without going through blackholes or other indirections like IND_STATIC)?
  • Is this worth optimising?

Unless the answer to both of those is yes, then I think it makes sense to do this in Cmm (when compiling the primop). Then we can remove a bunch of notes about getTag and hacks in CorePrep, and simplify getTag to getTag = dataToTag#. I think we can also update the primop as can_fail = False.

It seems to me that Core is not the right place to answer (1). Even in STG I don't know if we can answer that question with certainty. I think it's doable in Cmm where we know about "lambda form" of ids (LambdaFormInfo) which tells us about how to enter an object. If "lambda form" of an argument is LFCon then we optimise, otherwise we enter.

comment:39 Changed 2 months ago by dfeuer

What I've been suggesting is that even when we don't statically know if it's been evaluated or tagged, we should surely be using tag bits when we find them. It looks like an oversight that we haven't done so in the past, and I foresee substantial benefits to fixing that. Perhaps we can win back more performance than we lose.

comment:40 Changed 2 months ago by osa1

Right, agreed that looking at tag bits would work for small types.

comment:41 Changed 2 months ago by simonpj

I agree with comment:38. Yes, the code generator is the right place to do it.

Yes, we could ooptimise for LFCon, but the simplifier will have done that already. The only case I think we could reliably optimise, that would not be done already, is

case x of y 
   A       -> blah
   DEFAULT -> ...(dataToTag# y)...

In this case we really do know that y points to the value. It would not be hard to let the code gen spot this; but I doubt it would happen much.

Right, agreed that looking at tag bits would work for small types.

This should happen automatically if we use the code for cgCase. It already has a fast-path for the case when the scrutinee is evaluated.

But you point is perhaps that for small types we don't need to index the info table: the tag is in the bits. Yes, that's a good idea.

comment:42 Changed 2 months ago by dfeuer

Ah, I see what you're saying. The 0 tag is handled by the case wrapper (or C-- equivalent). So for small types, we don't need any extra test; we can just use the tag bits directly. I suppose also that my attempt to use tag bits for unknown types is more trouble than it's worth; we should just assume that those have too many constructors.

As for the type-driven rewriting, I think we basically have two options:

  1. Do it in PrelRules. I think this requires something like the dataToTagWith# primop I mentioned earlier.
  1. Do it sometime later (tidy core, core prep, or lowering to STG). This lets us stick with dataToTag# throughout core2core, which is nice, but I don't know enough to know if any of those make sense or where/how to slot in the rule.

comment:43 Changed 2 months ago by osa1

Differential Rev(s): Phab:D5196Phab:D5196, Phab:D5201

As for the type-driven rewriting

This is not possible unless we somehow distinguish "stuff that directly points to a value (without going through indirections)" from other at the type level.


I submitted another diff that fixes this bug in Cmm. It also does a bunch of simplifications in other parts of the compiler (removes notes, special cases, and hacks for dataToTag#). Here are some example code we generate:

Program:

data T = T1 | T2

main = do
    print (I# (dataToTag# f))
    print (I# (dataToTag# a))
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

For the first dataToTag# we generate:

_s1yo::I64 = 1;   // CmmAssign

for the second

    // ======= DATA TO TAG ========
    Hp = Hp - 16;   // CmmAssign
    I64[Sp - 24] = c1yE;   // CmmStore
    R1 = a_r1lF_closure;   // CmmAssign
    Sp = Sp - 24;   // CmmAssign
    if (R1 & 7 != 0) goto c1yE; else goto c1yF;   // CmmCondBranch
c1yF: // global
    call (I64[R1])(R1) returns to c1yE, args: 8, res: 8, upd: 24;   // CmmCall, this is where we evalaute the arg
c1yE: // global
    _c1yD::I64 = R1;   // CmmAssign
    // ======= DATA TO TAG SMALL FAMILY ========
    _s1yr::I64 = _c1yD::I64 & 7 - 1;   // CmmAssign, read the tag bits

If I make this type a "big family", then we generate (for the second dataToTag#)

    // ======= DATA TO TAG ========
    Hp = Hp - 16;   // CmmAssign
    I64[Sp - 24] = c1A0;   // CmmStore
    R1 = a_r1mL_closure;   // CmmAssign
    Sp = Sp - 24;   // CmmAssign
    if (R1 & 7 != 0) goto c1A0; else goto c1A1;   // CmmCondBranch
c1A1: // global
    call (I64[R1])(R1) returns to c1A0, args: 8, res: 8, upd: 24;   // CmmCall, enter the argument
c1A0: // global
    _c1zZ::I64 = R1;   // CmmAssign
    // ======= DATA TO TAG GENERAL CASE ========
    _s1zN::I64 = %MO_UU_Conv_W32_W64(I32[I64[_c1zZ::I64 & (-8)] - 4]);   // CmmAssign, read info table

The patch is not ready (I'll need to update some notes, I also left some TODOs and questions in the code) but it fixes the bug and demonstrates that this is possible (and even easy) to do in Cmm.

comment:44 Changed 2 months ago by osa1

So we mention this "optimisation" without describing what it is, and I realized that that may cause some confusion so let's define it.

By "optimisation" I mean that in some cases when compiling dataToTag# we can avoid entering the argument, because we already know enough about it.

  • In compile time we sometimes know that the argument is definitely some constructor C. For example, in the example in comment:43, we know that f is T2, so we should be able to compile dataToTag# f to 1#. Another example is case foo of X -> ... dataToTag# foo ....

Note that we currently (with GHC 8.6 and HEAD) don't do this for f in comment:43, even with -O2. However with Phab:D5201 f in comment:43 is optimised to 1# in Cmm level. The case expression is optimised in all versions. (both diffs preserve Core transformations and rules, so anything optimised at Core level by GHC HEAD will still be optimised)

(I don't know why dataToTag# f in comment:43 is not optimised in Core level, but perhaps that can be tracked in another ticket)

  • In compile time we sometimes know that a value is already evaluated, but we don't know specifics (i.e. which constructor it is). For example, in a pattern match, a variable binding a strict field should be evaluated. Another example is let !foo = f x in ... dataToTag# foo .... In these cases we should be able to avoid entering the argument.

There are a few problems with this:

First, as #14677 demonstrates, we sometimes fail to tag values properly. So for example if a strict field's type is small, we still can't look at tag bits, we need to read the info table.

Second, the analysis for this should be more strict than the current exprIsHNF or isEvaldUnfolding (see Note [Always introduce case around dataToTag# arg] in comment:30 for what goes wrong otherwise). However, we currently don't have an analysis for this, and I don't know how easy it'd be to implement in Core (remember that we don't know about CAFs in Core until CorePrep, and this information is necessary for this analysis). I don't know if going into the trouble of implementing a new analysis in Core just for this purpose is worth it.

  • For all other cases we need to enter the argument in runtime. If the argument is a small type then we can look at the tag bits, otherwise we read the info table.

In other words, case (1) is when we know the constructor of the argument in compile time, case (2) is when we don't know the constructor, but we know that the argument is already evaluated, case (3) is when we don't know anything about the argument.

Looking at the tag bits in (3) is currently only done by Phab:D5201, but it could be easily added to other implementations.

(2) is currently only done by GHC 8.6 (and HEAD), but it's done incorrectly (thus this ticket). I think we should avoid this at least for now.

(1) Is done best in Phab:D5201, because we check lambda form of the argument in Cmm, which optimises some code that escapes Core optimisations. However, I think we should be able to improve simplifier somehow to optimise dataToTag# f in comment:43. We can track this in another ticket.

comment:45 Changed 2 months ago by osa1

Here's my (hopefully final) proposal:

(1) is best done in Core. I'll remove the relevant code from Phab:D5201. For cases that should be optimised but currently aren't, we can open a new ticket and track the progress there.

(2) is best avoided. It opens a can of worms, and it's what caused this ticket.

(3) Should be done in Cmm. Phab:D5201 already does this, but we need to do some refactoring to be able to reliably check whether the argument type is a large or small family (to decide whether to check tag bits of read the info table).

comment:46 Changed 2 months ago by simonpj

Comments about Phab:D5201. I'm not looking at the details yet (it's just a draft), but

  • I think this is the Right Place to deal with dataToTag#. Bravo.
  • The known-constructor case surely will be handled in the simplifier; if not now then soon. Handling it here is not wrong, but probably unnecesssary.
  • The thing that we CAN ONLY handle here is
    ...(case x of y {
            A -> blah
            DEFAULT -> ....(dataToTag# y)...
    
    Here we know that y really points to the value, so dastaToTag# does not need to do a redundant eval.

However I have just realised that this optimisation is available for any case expression, not just dataToTag#. Consider

...(case x of y {
        A -> blah
        DEFAULT -> ....(case y of
                          B -> blah2
                          C -> blah3
                       )...

Here we know that y is fully evaluated and points to the final value; that's the promise of the outer case expression. So we can directly test y's tag bits without worrying that it might be unevaluated, need to build a return point and info table etc. Ha!

I'm not sure how common this is. But perhaps it is worth a separate ticket.

comment:47 Changed 2 months ago by simonpj

In compile time we sometimes know that a value is already evaluated, but we don't know specifics

This ticket has convinced me that the property that "y is a correctly tagged pointer directly to an evaluated value" is extremely delicate. The only time we are really sure of this is in the case-binder of a case expression:

case e of y <alts>

In <alts> we know that y really is a tagged pointer and points to the value. I used to think that this was also true of the strict fields of a data constructor, but not so! See comment:36.

Moreover, as comment:36 shows, the Simplifier (for good reasons) does not guaranteed to maintain the Delicate Property, even it if it holds at some point. Only the code generator knows for sure. Lets not attempt to do this in Core.

comment:48 Changed 2 months ago by osa1

Great, I think we're on the same page. I'm finalizing Phab:D5201. Currently stuck on a let/app invariant invalidation (Core lint error) when I mark dataToTag# as can't fail (can_fail = False).

comment:49 Changed 2 months ago by osa1

Simon, what do you think about marking dataToTag# as can't fail? I thought it makes sense, but when I do that I get this lint error: (add type annotation on scrutinee)

<no location info>: warning:
    In the expression: tagToEnum#
                         @ Bool
                         (case (ds :: Instr) of lwild {
                            __DEFAULT -> 0#;
                            C_ALabel_1 ipv -> 1#
                          })
    This argument does not satisfy the let/app invariant:
      case ds of lwild {
        __DEFAULT -> 0#;
        C_ALabel_1 ipv -> 1#
      }

Note that the expression does not actually have dataToTag#. Apparently making dataToTag# can't fail enables some transformations, which leads to this.

(Type of ds is a lifted sum type named Instr)

If I don't make it "can't fail", I get this expression instead

              case GHC.Prim.dataToTag# @ Instr ds of b# { __DEFAULT ->
              GHC.Prim.tagToEnum#
                @ Bool
                (case b# of {
                   __DEFAULT -> 0#;
                   0# -> 1#
                 })
              };

The version with "can't fail" is better because we eliminate a redundant dataToTag# call, but apparently the resulting expression is not "OK for speculation".

I think this expression is not OK for speculation because the scrutinee is lifted, but I'm not sure.

Last edited 2 months ago by osa1 (previous) (diff)

comment:50 Changed 2 months ago by simonpj

Simon, what do you think about marking dataToTag# as can't fail?

It makes sense to me to do so -- but it's an unforced change. It's not so marked today, primops.txt.pp even says why

{- Note [dataToTag#]
~~~~~~~~~~~~~~~~~~~~
The dataToTag# primop should always be applied to an evaluated argument.
The way to ensure this is to invoke it via the 'getTag' wrapper in GHC.Base:
   getTag :: a -> Int#
   getTag !x = dataToTag# x

But now consider
    \z. case x of y -> let v = dataToTag# y in ...

To improve floating, the FloatOut pass (deliberately) does a
binder-swap on the case, to give
    \z. case x of y -> let v = dataToTag# x in ...

Now FloatOut might float that v-binding outside the \z.  But that is
bad because that might mean x gets evaluated much too early!  (CorePrep
adds an eval to a dataToTag# call, to ensure that the argument really is
evaluated; see CorePrep Note [dataToTag magic].)

Solution: make DataToTag into a can_fail primop.  That will stop it floating
(see Note [PrimOp can_fail and has_side_effects] in PrimOp).  It's a bit of
a hack but never mind.

Are we good to go if you don't make this change?

comment:51 Changed 2 months ago by simonpj

In compile time we sometimes know that the argument is definitely some constructor C

The simplifier does optimise this:

data T = T1 | T2 | T3 | T4 Char

boo = T4 'x'

f x = case x of
       T1 -> getTag x
       T2 -> getTag boo
       _  -> getTag x

produces -ddump-simpl

f :: T -> GHC.Prim.Int#
f = \ (x_aXJ :: T) ->
      case x_aXJ of wild_Xf {
        __DEFAULT -> GHC.Prim.dataToTag# @ T wild_Xf;
        T1 -> 0#;
        T2 -> 3#
      }

comment:52 Changed 2 months ago by osa1

I understand why it's currently marked as can_fail, but if my understanding is correct we should be able to mark it as can_fail = False now. I'm curious why that's causing problems.

Also, as shown in comment:48, the code is actually better with can_fail = False.

Are we good to go if you don't make this change?

I tried to validate now without can_fail = False. I'm getting a segfault in haddock (when building docs during validate), then in the test suite haddock perf tests are failing. Other tests are passing. I'll debug.

comment:53 Changed 2 months ago by simonpj

if my understanding is correct we should be able to mark it as can_fail = False now

Why do you think the Note is now not relevant? It looks every bit as pertinent now as before.

comment:54 Changed 2 months ago by osa1

Well because this transformation

\z. case x of y -> let v = dataToTag# y in ...
==>
\z. case x of y -> let v = dataToTag# x in ...

is not a problem anymore. Although I now realize that floating v outsize of \z is still a problem, because that'd mean evaluating x more eagerly.

So I think first part of the note isn't relevant anymore, but second part (floating out) is.

Last edited 2 months ago by osa1 (previous) (diff)

comment:55 Changed 2 months ago by osa1

I did a little bit of debugging for the segfault problem. I'm wondering if I'm triggering some other bug, because the segfault is happening during GC:

>>> bt
#0  evacuate1 (p=p@entry=0x420fe54720) at rts/sm/Evac.c:642
#1  0x00007fdcea628c35 in scavenge_block1 (bd=0x420fe01500) at rts/sm/Scav.c:541
#2  0x00007fdcea648f5f in scavenge_find_work () at rts/sm/Scav.c:2067
#3  scavenge_loop1 () at rts/sm/Scav.c:2130
#4  0x00007fdcea6502e2 in scavenge_until_all_done () at rts/sm/GC.c:1090
#5  0x00007fdcea650c3f in GarbageCollect (collect_gen=collect_gen@entry=1, do_heap_census=do_heap_census@entry=false, gc_type=gc_type@entry=2, cap=cap@entry=0x7fdcea686ac0 <MainCapability>, idle_cap=idle_cap@entry=0x1f14dd0) at rts/sm/GC.c:421
#6  0x00007fdcea635f17 in scheduleDoGC (pcap=pcap@entry=0x7ffdb2c2fea0, task=task@entry=0x1f109b0, force_major=force_major@entry=false) at rts/Schedule.c:1798
#7  0x00007fdcea6368ec in schedule (initialCapability=initialCapability@entry=0x7fdcea686ac0 <MainCapability>, task=task@entry=0x1f109b0) at
rts/Schedule.c:546
#8  0x00007fdcea637e41 in scheduleWaitThread (tso=0x4200006388, ret=ret@entry=0x0, pcap=pcap@entry=0x7ffdb2c2ff38) at rts/Schedule.c:2537
#9  0x00007fdcea642a98 in rts_evalLazyIO (cap=cap@entry=0x7ffdb2c2ff38, p=p@entry=0x756150, ret=ret@entry=0x0) at rts/RtsAPI.c:530
#10 0x00007fdcea6433ec in hs_main (argc=<optimized out>, argv=<optimized out>, main_closure=0x756150, rts_config=...) at rts/RtsMain.c:72
#11 0x0000000000749560 in main ()

The info table pointer is wrong:

>>> print info
$1 = (const StgInfoTable *) 0x72

comment:56 Changed 2 months ago by osa1

Interestingly, if I revert the Cmm bits and always introduce a case on dataToTag# args in CorePrep I can validate. Patch is here. I'll compare the differences in generated code in this branch and Phab:D5201.

comment:57 Changed 2 months ago by osa1

Phab:D5201 currently validates and is ready for reviews.

comment:58 Changed 2 months ago by simonpj

I'm confused. You report a set-fault in comment:55; but now you say it's ready for review. How did you fix it? What is the new design? It's hard to review the patch without understanding the thinking. Thanks!

comment:59 Changed 2 months ago by osa1

Sorry it's a bit hard to follow -- I asked Simon Marlow about the segfault, and he correctly guessed that the problem is that primops are assumed to be non-allocating and I have to update StgCmmExpr.isSimpleOp. This change is included in the diff and it fixes the segfault.

comment:60 Changed 2 months ago by simonpj

The dataToTag# saga has revealed several things, which I'd like to summarise here.

  1. We have been guilty of mixing up two things:
    • A semantic property, namely that an expression, or variable, is guaranteed to be non-bottom. For example, given
      data T = MkT !Int
      f (MkT y) = ...y...
      
      in the body of f, we know that y is non-bottom.
    • A representational propery, that a variable is bound to a heap pointer that (a) points directly to an evaluated heap-allocated value (e.g. a cons cell), and (b) is properly tagged, for data constructors with small enough tags.

The representational property implies the semantic property, but not vice versa. For example, in the above code, are we guaranteed that y (being the strict field of MkT is bound to a properly-tagged pointer to an Int? I used to think yes, but this example (look at the bullets in that comment) shows that the answer is NO.

My conclusion: we should keep the two properties separate. The Simplifier can maintain the first; but the second is much more fragile, and is the business of the code generator alone.

  1. We have never implemented #14626, but it's pretty easy to do so provided we focus our attention on the code generator. The example there is
    f x = case x of y
               Red -> Green
               DEFAULT -> y
    
    The resturn contract of case specifies that y is bound to a properly-tagged pointer directly to the value; The code generator can remember this fact, in its environment; and simply return y in the DEFAULT branch rather than entering it.
  1. exprOkForSpeculation finishes with this case (in app_ok)
          _other -> isUnliftedType (idType fun)          -- c.f. the Var case of exprIsHNF
                 || idArity fun > n_val_args             -- Partial apps
                 || (n_val_args == 0 &&
                     isEvaldUnfolding (idUnfolding fun)) -- Let-bound values
    
    I think the final disjunct is wrong, becuase it is fragile. Consider
      case x of y { DEFAULT -> ...y.... }
    
    In that branch, is y ok-for-speculation? It looks so, because it has an evald unfolding (OtherCon). But the binder-swap transformation performed by FloatOut transforms it to
      case x of y { DEFAULT -> ...x.... }
    
    and x is NOT ok-for-spec. So the binder-swap transformation might invalidate invariants.

This is pretty obscure in practice, but I think we can make the compiler simpler and more correct by deleting those two lines. I think it will have zero effect because exprOkForSpecuation is seldom if ever called on expressions of lifted type.

  1. Only two primops (currently) evaluate a lifted-type argument: SeqOp and DataToTagOp. The former has a special case in app_ok (in exprOkForSpeculation and the latter should too.
  1. Once all this is done, the hack of making DataToTagOp have can_fail=True can be dispensed with.
  1. We have related loose ends in #14677 and #15155.
  1. We might want to review optimisation of dataToTag# (tagToEnum# e); see https://phabricator.haskell.org/D5201#143657
Last edited 2 months ago by simonpj (previous) (diff)

comment:61 Changed 2 months ago by dfeuer

I believe that spark# and par# also evaluate lifted-type arguments.

comment:62 Changed 2 months ago by osa1

comment:63 Changed 2 months ago by bgamari

We will move ahead merging Phab:D5201 to move ahead with 8.6.2, acknowledging that Phabe:D5201 is not the Final Answer.

Then Omer will continue to pursue the alternatives in comment:60.

Last edited 2 months ago by simonpj (previous) (diff)

comment:64 Changed 2 months ago by bgamari

Owner: set to osa1

Omer will investigate.

comment:65 Changed 2 months ago by simonpj

I believe that spark# and par# also evaluate lifted-type arguments.

They do have a lifted-type arg, but they don't evalauate it: they put it in a spark queue.

comment:66 Changed 2 months ago by Ömer Sinan Ağacan <omeragacan@…>

In ac977688/ghc:

Fix dataToTag# argument evaluation

See #15696 for more details. We now always enter dataToTag# argument (done in
generated Cmm, in StgCmmExpr). Any high-level optimisations on dataToTag#
applications are done by the simplifier. Looking at tag bits (instead of
reading the info table) for small types is left to another diff.

Incorrect test T14626 is removed. We no longer do this optimisation (see
comment:44, comment:45, comment:60).

Comments and notes about special cases around dataToTag# are removed. We no
longer have any special cases around it in Core.

Other changes related to evaluating primops (seq# and dataToTag#) will be
pursued in follow-up diffs.

Test Plan: Validates with three regression tests

Reviewers: simonpj, simonmar, hvr, bgamari, dfeuer

Reviewed By: simonmar

Subscribers: rwbarton, carter

GHC Trac Issues: #15696

Differential Revision: https://phabricator.haskell.org/D5201

comment:67 Changed 2 months ago by osa1

The bug fix is now merged. I'll be investigating the loose ends mentioned in comment:60.

comment:68 Changed 2 months ago by osa1

Simon, you said that we can remove can_fail=True for dataToTag# after removing various hacks (in comment:60), and I also used to think that this should be the case, but thinking about this more I think dataToTag# should stay as can_fail=True.

The reason is because dataToTag# evaluates the argument and returns an unboxed value, so I think these two expressions calls should have the same value:

exprOkForSpeculation `dataToTag# foo`
exprOkForSpeculation `case foo of x -> 1#` -- note: foo is lifted

Looking at the code I see that the latter returns False (because the scrutinee is lifted), so for the former to return false we need to mark dataToTag# as can_fail = True. Does this make sense?

comment:69 Changed 2 months ago by simonpj

It's tricky, I agree; and I agree that's the result that exprOkForSpeculation should return False in these cases. But as I say in item (6) of comment:60, app_ok already has a special case for SeqOp for this very reason, and I think we should just extend that to DataToTagOp.

In fact I got as far as writing the Note to accompany the change to app_ok:

Note [PrimOps that evaluate their arguments]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Most primops to not evaluate their arguments, even lifted
arguments.  But two do: DataToTagOp and SeqOp (rembember
the latter is monadic; it is not the primop corresponding
to 'seq').

Now, is (dataToTag# x) ok-for-speculation?  You might say "yes, if x
is ok-for-speculation", because then it'll obey the rules for
ok-for-speculation. But it's very fragile. Consider:

   \z. case x of y { DEFAULT -> let v = dataToTag# y in ... }

This looks OK: we ask if 'y' is ok-for-spec, and say yes
because it is evaluated.  But if we do the binder-swap operation
(which happens in FloatOut) we have

   \z. case x of y { DEFAULT -> let v = dataToTag# x in ... }

and now it is /not/ ok-for-spec.  This becomes even clearer if
we float it to give

  let v = dataToTag# x
  in \z. case x of y { DEFAULT -> ... }

Conclusion: always return False for ok-to-spec on
SeqOp and DataToTagOp.

Does that makes sense? The can-fail thing is a hack, and we don't do it for SeqOp.

comment:70 Changed 2 months ago by osa1

OK, that makes sense, thanks. Moving on, I'm now trying to understand an assertion in FloatIn which is triggered when I make the can_fail change and update ok_app so that dataToTag# is never OK for speculation.

The assertion that fails is this: (in FloatIn.hs)

fiExpr _ to_drop (_, AnnLit lit)     = ASSERT( null to_drop ) Lit lit

The definition that triggers it:

$wclosureTypeHeaderSize_s75y [InlPrag=NOUSERINLINE[2]]
  :: ClosureType -> Int#
[LclId, Arity=1, Str=<S,U>]
$wclosureTypeHeaderSize_s75y
  = \ (w_s75u :: ClosureType) ->
      case dataToTag# @ ClosureType w_s75u of a#_a4k2 [Dmd=<S,U>]
      { __DEFAULT ->
      join {
        $j_s6vr [Dmd=<L,U(U)>] :: Int#
        [LclId[JoinId(0)], Str=m]
        $j_s6vr
          = case w_s75u of lwild_s6yB {
              __DEFAULT ->
                case a#_a4k2 of a#_a4jU [Dmd=<L,A>] { __DEFAULT ->
                case w_s75u of lwild_s6yA {
                  __DEFAULT ->
                    case a#_a4k2 of a#_X4Ml [Dmd=<L,A>] { __DEFAULT ->
                    case w_s75u of lwild_s6vm {
                      __DEFAULT ->
                        case a#_a4k2 of a#_X4Mq [Dmd=<L,A>] { __DEFAULT -> 1# };
                      AP_STACK -> 2#
                    }
                    };
                  AP -> 2#
                }
                };
              THUNK_SELECTOR -> 2#
            } } in
      case <# a#_a4k2 15# of lwild_s6vx {
        __DEFAULT ->
          case a#_a4k2 of b#_a4k3 [Dmd=<S,U>] { __DEFAULT ->
          case <# 20# a#_a4k2 of lwild_s6vu {
            __DEFAULT -> 2#;
            1# -> jump $j_s6vr
          }
          };
        1# -> jump $j_s6vr
      }
      }

At one point fiExpr comes across this expression:

case a#_a4k2 :: Int# of a#_X4Mq [Dmd=<L,A>] { __DEFAULT -> 1# }

with an empty "to drop" list. At this point we try to float the case inside the alternative (happens in first AnnCase alternative of fiExpr), but body of the alternative is a literal so the assertion is triggered.

Questions:

  • What's wrong with seeing a literal when floating stuff inwards? Why not just introduce case/let around it?
  • There should be some code somewhere that's supposed to ensure that this assertion is not triggered, but I can't find it.
  • Why did simplifier not remove this case expression? It does no work (the scrutinee is unlifted and binder is not used) but it's somehow not eliminated by the simplifier pass right before FloatIn.

comment:71 Changed 2 months ago by simonpj

What's wrong with seeing a literal when floating stuff inwards? Why not just introduce case/let around it?

Because the float-in pass floats bindings inwards towards their use sites. If there is no use of the variable bound by a floating binding, then we should not be floating that binding into that part of the code. Literals are an extreme case.

There should be some code somewhere that's supposed to ensure that this assertion is not triggered, but I can't find it.

It's FloatIn.sepBindsByDropPoint.

Why did simplifier not remove this case expression?

That is indeed a mystery. Is the code you are showing the output of the simplifier?

Can you give instructions to reproduce (maybe make a wip/ branch)?

comment:72 Changed 2 months ago by osa1

That is indeed a mystery. Is the code you are showing the output of the simplifier?

The code I showed is the input of FloatIn when a file is compiled with -O. The full command to reproduce (after building stage 1):

"inplace/bin/ghc-stage1" -hisuf hi -osuf  o -hcsuf hc -static  -O0 -H64m -Wall      -this-unit-id ghc-heap-8.7 -hide-all-packages -i -ilibraries/ghc-heap/. -ilibraries/ghc-heap/dist-install/build -Ilibraries/ghc-heap/dist-install/build -ilibraries/ghc-heap/dist-install/build/./autogen -Ilibraries/ghc-heap/dist-install/build/./autogen -Ilibraries/ghc-heap/.    -optP-include -optPlibraries/ghc-heap/dist-install/build/./autogen/cabal_macros.h -package-id base-4.12.0.0 -package-id ghc-prim-0.5.3 -package-id rts -Wall -XHaskell2010 -XNoImplicitPrelude -O  -no-user-package-db -rtsopts  -Wno-deprecated-flags     -Wnoncanonical-monad-instances  -odir libraries/ghc-heap/dist-install/build -hidir libraries/ghc-heap/dist-install/build -stubdir libraries/ghc-heap/dist-install/build   -dynamic-too -c libraries/ghc-heap/./GHC/Exts/Heap/ClosureTypes.hs -o libraries/ghc-heap/dist-install/build/GHC/Exts/Heap/ClosureTypes.o -dyno libraries/ghc-heap/dist-install/build/GHC/Exts/Heap/ClosureTypes.dyn_o

Can you give instructions to reproduce (maybe make a wip/ branch)?

Committed code to wip/T15696. Note that -ddump-simpl-iterations is not printing Core right before FloatIn so you need some extra prints to be able to trace this. I'm currently using this patch for that.

comment:73 Changed 2 months ago by simonpj

I took a look

  1. We should give dataToTag# a strictness signatures in primops.txt.pp. After all, it's strict. If we do that then,
      case x of y { DEFAULT -> daataToTag# y }
    
    will correctly optimise to dataToTag# x.
  1. We should remove the bang from GHC.Base.getTag. Currenlty it says
    getTag !x = dataToTag# x
    
    But now the bang is unnecessary, and has to be optimised away by (1).
  1. That's also the source of the extra stuff like
      case a#_a4k2 :: Int# of a#_X4Mq [Dmd=<L,A>] { __DEFAULT -> ... }
    
    which you report in comment:70. This is not the output of the simplifier; since that simplifier run we have done (a) exitification, (b) float-out and (c) CSE. The last of these left those redudant cases; they would, I believe, disappear in the next simplifier run (if only we reached it). But after (1) and (2) they won't even be there in the first place.
  1. I see why the ASSERT trips. In the input to float-in we have
      case a#_a4k2 :: Int# of a#_X4Mq { __DEFAULT -> 1# }
    
    So float-in picks up that case-expression and pushes it inwards. But it immediately hits the literal. The ASSERT wasn't expecting to have bindings that were dead in the input; but here is exactly such a binding, and there's nothign actually wrong with it. The same ASSERT failure would happen with
      let x = e in 1#
    
    Of course the simplifer would usually eliminate such a thing, but there is no reason for float-in to require that. Just change
    fiExpr _ to_drop (_, AnnLit lit)     = ASSERT( null to_drop ) Lit lit
    
    to
    fiExpr _ to_drop (_, AnnLit lit)     = wrapFloats to_drop (Lit lit)
         --  See Note [Dead bindings]
    
    {- Note dead bindings
    ~~~~~~~~~~~~~~~~~~~~~
    At a literal we won't usually have any floated bindings; the
    only way that can happen is if the binding wrapped the literal
    /in the original input program/.  e.g.
       case x of { DEFAULT -> 1# }
    But, while this may be unusual it is not actually wrong, and it did
    once happen (Trac #15696).
    }}
    

comment:74 Changed 2 months ago by osa1

Thanks Simon,

Re 1, the primop is already marked as strict:

   strictness = { \ _arity -> mkClosedStrictSig [evalDmd] topRes }

Re 2, I already removed the bang pattern in Phab:D5201. I think you need a git pull.

Re 3, both (1) and (2) are already done so this should've worked?

I'll implement 4 and update.

comment:75 Changed 2 months ago by simonpj

Sorry, yes, I was looking in the wrong tree re (1) and (2).

Re (3) I now understand. The output of the simplifier is this

        $j_s6vr
          = case w_s75u of lwild_s6yB {
              __DEFAULT ->
                case GHC.Prim.dataToTag# @ ClosureType lwild_s6yB of { __DEFAULT ->
                case lwild_s6yB of lwild_s6yA {
                  __DEFAULT ->
                    case GHC.Prim.dataToTag# @ ClosureType lwild_s6yA of { __DEFAULT ->
                    case lwild_s6yA of lwild_s6vm {
                      __DEFAULT ->
                        case GHC.Prim.dataToTag# @ ClosureType lwild_s6vm of { __DEFAULT ->
                        1#
                        };
                      AP_STACK -> 2#
                    }
                    };
                  AP -> 2#
                }
                };
              THUNK_SELECTOR -> 2#
            } } in

Are those case lwild_s6yB of lwild_s6yq { ...} evals actually redundant? No: they are checking for AP_STACK and THUNK_SELECTOR resp.

But the one you originally asked about was

  case a#_a4k2 :: Int# of a#_X4Mq { __DEFAULT -> 1# }

This one was introduced by CSE, so it has not yet had a simplifer run to eliminate it. Before CSE it looked like

  case dataToTag# lwild of a#_X4Mq { __DEFAULT -> 1# }

In short, all is well. Just do (4).

comment:76 Changed 2 months ago by osa1

I did (4) and tried to validate. There are 4 regressions. 2 of them are perf, the other 2 are tests with checks on -ddump-simpl output. They're all basically the same problem: worker/wrapper results change slightly after the patch, and worker functions take more boxed params. Here's an example (T10482). Original program is this:

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE TypeFamilies #-}
module T10482 where

data family Foo a
data instance Foo (a, b) = FooPair !(Foo a) !(Foo b)
newtype instance Foo Int = Foo Int

foo :: Foo ((Int, Int), Int) -> Int -> Int
foo !f k =
  if k == 0 then 0
  else if even k then foo f (k-1)
  else case f of
    FooPair (FooPair (Foo n) _) _ -> n

$wfoo originally:

Rec {
-- RHS size: {terms: 19, types: 4, coercions: 0, joins: 0/0}
T10482.$wfoo [InlPrag=NOUSERINLINE[2], Occ=LoopBreaker]
  :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=<L,1*U><S,1*U>,
 Unf=OtherCon []]
T10482.$wfoo
  = \ (ww_s2Qy :: GHC.Prim.Int#) (ww1_s2QG :: GHC.Prim.Int#) ->
      case ww1_s2QG of wild_X1v {
        __DEFAULT ->
          case GHC.Prim.remInt# wild_X1v 2# of {
            __DEFAULT -> ww_s2Qy;
            0# -> T10482.$wfoo ww_s2Qy (GHC.Prim.-# wild_X1v 1#)
          };
        0# -> 0#
      }
end Rec }

after the patch (wip/T15696):

Rec {
-- RHS size: {terms: 22, types: 7, coercions: 3, joins: 0/0}
T10482.$wfoo [InlPrag=NOUSERINLINE[2], Occ=LoopBreaker]
  :: Foo Int -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=<L,1*U(1*U)><S,1*U>,
 Unf=OtherCon []]
T10482.$wfoo
  = \ (ww_s2Qs
         :: Foo Int
         Unf=OtherCon [])
      (ww1_s2Qz :: GHC.Prim.Int#) ->
      case ww1_s2Qz of wild_X1l {
        __DEFAULT ->
          case GHC.Prim.remInt# wild_X1l 2# of {
            __DEFAULT ->
              case ww_s2Qs
                   `cast` (T10482.D:R:FooInt0[0] ; T10482.N:R:FooInt[0]
                           :: Foo Int ~R# Int)
              of
              { GHC.Types.I# ww3_s2QD ->
              ww3_s2QD
              };
            0# -> T10482.$wfoo ww_s2Qs (GHC.Prim.-# wild_X1l 1#)
          };
        0# -> 0#
      }
end Rec }

The difference is that the first argument is now boxed Int instead of Int# as before.

I think the reason for this change is this: originally evaluation of ww_s2Qs happens after remInt# wild_X1l 2#, but if we pass ww_s2Qs unboxed that'd mean evaluating it before remInt#. This is possible if at least one of those expressions are OK-for-speculation. remInt# is not OK-for-speculation because it can fail. ww_s2Qs was previously (without wip/T15696) OK-for-speculation, but we just made app_ok more strict by removing a disjunct which was actually holding for this expression:

  • n_val_args == 0 holds because it has no args
  • isEvaldUnfolding (idUnfolding fun) apparently also somehow holds. I don't know how exactly, but I'm guessing that because of the strictness annotation on the data con field we give this id evaldUnfolding.

Other 3 regressions are also of similar nature (worker function with less unboxing).

What to do? I think it's correct to consider the data con field as strict here and pass an unboxed value to the worker. While we may not actually have a tagged value there (or even an evaluated one! e.g. #14677, #15155), the programmer indicated that it's fine to evaluate the field eagerly. As long as we don't expect (in the generated code) evaluated or tagged value it's fine.

So perhaps we should not do the change in ok_app.

comment:77 Changed 2 months ago by simonpj

The rabbit hole gets deeper. In T10482 We have a data constructor FooPair that is strict in both arguments, and an expression like

foo f k
 = case f of
     FooPair x y -> case burble of
                      True -> case x of
                                FooPair p q -> ...
                      False -> ...

Previously we floated that inner case x out to get

foo f k
 = case f of
     FooPair x y -> case x of
                      FooPair p q -> case burble of
                                       True -> ...
                                       False -> ...

which is nice and strict.

So firstly: the reasoning in item (3) of comment:60 is right for case-binders, but not for the binders of a constructor pattern (the binder-swap stuff doesn't apply to them). So rather than make the change in item (3), we could instead just refrain from giving an "evald-unfolding" to the case binder. This happens here, in Simplify.simplAlts

        ; (env1, case_bndr1) <- simplBinder env0 case_bndr
        ; let case_bndr2 = case_bndr1 `setIdUnfolding` evaldUnfolding
              env2       = modifyInScope env1 case_bndr2
              -- See Note [Case binder evaluated-ness]

We could try (a) undoing the change in item (3), and (b) removing the above meddling with case_bndr. NB: Note [Case binder evaluated-ness] is, I believe, out of date; we now skip the lifted args of primpos in app_ok.

Secondly I think we can improve w/w, even for the case where that case-expression is not floated out. It's all to do with Note [Add demands for strict constructors] in DmdAnal, and the function addDataConStrictness. This special treatment is ineffective here because (in the first code for foo above), x is really used lazily in the alternative. And yet it'd be sound to unpack it.

So here's an idea:

  • Delete all the stuff about addDataConStrictness from DmdAnal.
  • Instead, add it into WwLib thus:
      | isStrictDmd dmd
      , Just cs <- splitProdDmd_maybe dmd
          -- See Note [Unpacking arguments with product and polymorphic demands]
      , not (has_inlineable_prag && isClassPred arg_ty)
          -- See Note [Do not unpack class dictionaries]
      , Just (data_con, inst_tys, inst_con_arg_tys, co)
                 <- deepSplitProductType_maybe fam_envs arg_ty
      , cs `equalLength` inst_con_arg_tys
          -- See Note [mkWWstr and unsafeCoerce]
      = do { (uniq1:uniqs) <- getUniquesM
            ; let   cs'       = addDataConStrictness data_con cs    <---------------- NEW 
                    unpk_args = zipWith3 mk_ww_arg uniqs inst_con_arg_tys cs'
                    unbox_fn  = mkUnpackCase (Var arg) co uniq1
                                             data_con unpk_args
                    arg_no_unf = zapStableUnfolding arg
    
    That is, add one new line, and transfer the defn of addDataConStrictness from DmdAnal.

This actually works, even without implementing "Firstly" -- I tried it.

And thirdly, I noticed that addDataConStrictness makes the demand stricter like this

    add dmd str | isMarkedStrict str
                , not (isAbsDmd dmd) = dmd `bothDmd` seqDmd
                | otherwise          = dmd

But bothDmd seqDmd messes up the cardinality information! I doubt this is important but better to define (in Demand)

strictifyDmd :: Demand -> Demand
strictifyDmd dmd@(JD { sd = str })
  = dmd { sd = str `bothArgStr` Str VanStr HeadStr }

and call it from addDataConStrictness.

I think either Firstly or Secondly should fix the regressions, but all three should be good.

comment:78 Changed 2 months ago by osa1

I'm probably missing the big picture, but I think it makes sense to give evaluated things evaldUnfolding and if that breaks something else then we should fix that something else. In (3) of comment:60, perhaps the thing to fix is the binder-swapping transform rather than the id unfolding, so that it avoids swapping binders with different evaluated-ness (or maybe only avoid it if doing so will break the let/app invariant -- not sure how hard would it be to check this though).

I don't know enough about demand analysis to comment on Secondly. I'll try it.

Re Thirdly, the demand type does not make too much sense to me (how can an id have strict demand but unused cardinality? Sounds like a hack to make some transformations possible) but I can see how the current code changes cardinality incorrectly. I'll update.

comment:79 Changed 2 months ago by osa1

Submitted Phab:D5225 for addDataConStrictness fix (thirdly above).

Last edited 2 months ago by bgamari (previous) (diff)

comment:80 Changed 2 months ago by bgamari

Differential Rev(s): Phab:D5196, Phab:D5201Phab:D5196, Phab:D5201, Phab:D5226

I gave Secondly from comment:77 a try while waiting for GHC 8.4.4 builds. The result is Phab:D5226.

However, the approach presented does seem a bit odd since the demand resulting from demand analysis is still conservative; we merely make up for this conservatism in worker-wrapper. However, this doesn't help other (hypothetical) consumers of demand analysis. Perhaps it would be good to open a ticket to ensure we don't forget about this limitation.

comment:81 Changed 2 months ago by simonpj

We decided not to attempt to put Phab:D5226 on GHC 8.6. It's not necessary to fix 8.6.

Instead Omer will include it on his wip/T15696 branch

comment:82 Changed 8 weeks ago by osa1

Just an update before I forget where I had left: the performance regression in wip/T15696 is fixed by any of these:

  • Revert change in app_ok general case, don't give case binders evaldUnfolding in simplAlts (first suggestion in comment:77)
  • Revert change in app_ok general case, disable binder swapping of case bidners and scrutinee.

Both validate, but I don't have nofib results yet.

comment:83 Changed 8 weeks ago by simonpj

Revert change in app_ok general case, don't give case binders evaldUnfolding in simplAlts (first suggestion in comment:77)

This is my preferred option; I don't like disabling binder swapping - it's there for a good reason!

It would be illuminating to know (perhaps via -ticky) what code is improved by reverting the app_ok general case. If we knew, we could add an example to the code so that we had a concrete reason for that isEvaldUnfolding case. But it's only curiosity.

comment:84 Changed 8 weeks ago by osa1

I don't like disabling binder swapping - it's there for a good reason!

I don't feel strongly about this, but just out of curiosity I compared nofib outputs of GHC HEAD with and without this patch: (disables swapping case binder swapping)

diff --git a/compiler/simplCore/SetLevels.hs b/compiler/simplCore/SetLevels.hs
index b8212c72f3..8fabf98179 100644
--- a/compiler/simplCore/SetLevels.hs
+++ b/compiler/simplCore/SetLevels.hs
@@ -469,15 +469,15 @@ lvlCase env scrut_fvs scrut' case_bndr ty alts
         -- Always float the case if possible
         -- Unlike lets we don't insist that it escapes a value lambda
     do { (env1, (case_bndr' : bs')) <- cloneCaseBndrs env dest_lvl (case_bndr : bs)
-       ; let rhs_env = extendCaseBndrEnv env1 case_bndr scrut'
-       ; body' <- lvlMFE rhs_env True body
+       -- ; let rhs_env = extendCaseBndrEnv env1 case_bndr scrut'
+       ; body' <- lvlMFE env1 True body
        ; let alt' = (con, map (stayPut dest_lvl) bs', body')
        ; return (Case scrut' (TB case_bndr' (FloatMe dest_lvl)) ty' [alt']) }

   | otherwise     -- Stays put
   = do { let (alts_env1, [case_bndr']) = substAndLvlBndrs NonRecursive env incd_lvl [case_bndr]
-             alts_env = extendCaseBndrEnv alts_env1 case_bndr scrut'
-       ; alts' <- mapM (lvl_alt alts_env) alts
+             -- alts_env = extendCaseBndrEnv alts_env1 case_bndr scrut'
+       ; alts' <- mapM (lvl_alt alts_env1) alts
        ; return (Case scrut' case_bndr' ty' alts') }
   where
     ty' = substTy (le_subst env) ty
@@ -1496,6 +1496,7 @@ floatTopLvlOnly le = floatToTopLevelOnly (le_switches le)
 incMinorLvlFrom :: LevelEnv -> Level
 incMinorLvlFrom env = incMinorLvl (le_ctxt_lvl env)

+{-
 -- extendCaseBndrEnv adds the mapping case-bndr->scrut-var if it can
 -- See Note [Binder-swap during float-out]
 extendCaseBndrEnv :: LevelEnv
@@ -1507,6 +1508,7 @@ extendCaseBndrEnv le@(LE { le_subst = subst, le_env = id_env })
   = le { le_subst   = extendSubstWithVar subst case_bndr scrut_var
        , le_env     = add_id id_env (case_bndr, scrut_var) }
 extendCaseBndrEnv env _ _ = env
+-}

 -- See Note [Join ceiling]
 placeJoinCeiling :: LevelEnv -> LevelEnv

There's one -0.3% allocation in one program ("pic") with this patch, the rest is the same. No change in binary sizes.

So perhaps this is not as important as we thought.

It would be illuminating to know (perhaps via -ticky) what code is improved by reverting the app_ok general case

Have you seen comment:76? I show one example there, showing Core with and without the change in app_ok.

comment:85 Changed 8 weeks ago by osa1

Making progress. We want to handle dataToTag# and seq# uniformly, as they're two exceptional primops that evaluate their arguments. So I added this to app_ok:

app_ok primop_ok fun args
  = case idDetails fun of
        ...
        | SeqOp <- op ------------------- OLD CODE
        -> all (expr_ok primop_ok) args

        | DataToTagOp <- op ------------- NEW CODE
        -> all (expr_ok primop_ok) args

However this causes let/app invariant errors. Here's an example:

$cpred_a3m9
  = \ (a_a2Ix :: VecElem) ->
      case dataToTag# @ VecElem a_a2Ix of a#_a2Iy { __DEFAULT ->
      case eqInt (GHC.Types.I# 0#) (GHC.Types.I# a#_a2Iy) of {
        False -> tagToEnum# @ VecElem (+# a#_a2Iy -1#);
        True ->
          error
            @ 'LiftedRep
            @ VecElem
            ($dIP_s4eG
             `cast` (Sym (GHC.Classes.N:IP[0]
                              <"callStack">_N <GHC.Stack.Types.CallStack>_N)
                     :: GHC.Stack.Types.CallStack
                        ~R# (?callStack::GHC.Stack.Types.CallStack)))
            (build
               @ Char
               (\ (@ b_a44N) ->
                  unpackFoldrCString#
                    @ b_a44N
                    "pred{VecElem}: tried to take `pred' of first tag in enumeration"#))
      }
      }

This is OK-for-spec, but FloatOut first transforms this into:

$cpred_a3m9 :: VecElem -> VecElem
[LclId, Arity=1]
$cpred_a3m9
  = \ (a_a2Ix :: VecElem) ->
      case dataToTag# @ VecElem a_a2Ix of a#_a2Iy { __DEFAULT ->
      case eqInt lvl_s4yV (I# a#_a2Iy) of {
        False -> tagToEnum# @ VecElem (+# a#_a2Iy -1#);
        True -> lvl_s4yY
      }
      },

then simplifier:

$cpred_a3m9
  = \ (a_a2Ix :: VecElem) ->
      case a_a2Ix of lwild_s50A [Dmd=<S,U>] {
        __DEFAULT ->
          tagToEnum# @ VecElem (+# (dataToTag# @ VecElem a_a2Ix) -1#);
        Int8ElemRep -> lvl_s4yY
      }

which is not OK-for-spec beucase a_a2Ix doesn't have evaldUnfolding. The error message:

*** Core Lint errors : in result of Float out(FOS {Lam = Just 0,
                                                   Consts = True,
                                                   OverSatApps = True}) ***
<no location info>: warning:
    In the expression: +# (dataToTag# @ VecElem a_a2Ix) -1#
    This argument does not satisfy the let/app invariant:
      dataToTag# @ VecElem a_a2Ix

I don't understand why we substitute dataToTag# @ VecElem a_a2Ix for the case binder. Simon, any ideas?

(Committed the code to wip/T15696, the error happens during validate)

comment:86 Changed 7 weeks ago by asr

Cc: asr added

comment:87 Changed 6 weeks ago by bgamari

The core problem here was fixed with comment:66.

There are several additional tasks that can be done, however, as described in comment:77.

comment:88 Changed 4 weeks ago by simonpj

Simon will look at the lint error on wip/T15696

comment:89 Changed 6 days ago by Simon Peyton Jones <simonpj@…>

In f334d20e/ghc:

Careful tweaking to exprOkForSpeculation

This patch does several things:

* Make exprOkForSpeculation ignore evaluatedness of variables
  See the Note [exprOkForSpeculation and evaluated variables]
  This means that the binder-swap transformation no longer
  invaliates the let/app invariant.

* Make exprOkForSpeculation return False for
     DataToTagOp and SeqOp.
  See Note [exprOkForSpeculation and SeqOp/DataToTagOp]

* Remove the 'can_fail' property from dataToTag#; it was
  always a hack (described in the old Note [dataToTag#] in
  primops.txt.pp), and now its not necessary because of the
  fixes above.

* Make SetLevels use exprIsHNF, /not/ exprOkForSpeculation,
  when floating single-alternative cases.  See SetLevels
  Note [Floating single-alternative cases]

* Fix a buglet in FloatIn; probably never bites in practice
  See Note [Dead bindings]

Collectively, these changes finally fix Trac #15696.

comment:90 Changed 6 days ago by simonpj

Resolution: fixed
Status: patchclosed
Test Case: codeGen/should_run/T15696_1, T15696_2, T15696_3

Ben, Omer: I believe I have finally finished this ticket, as promised.

Do check my work! But, optimistically, I'm going to close it.

comment:91 Changed 29 hours ago by Ben Gamari <ben@…>

In d549c08/ghc:

dmdAnal: Move handling of datacon strictness to mkWWstr_one

Previously datacon strictness was accounted for when we demand analysed a case
analysis. However, this results in pessimistic demands in some cases. For
instance, consider the program (from T10482)

    data family Bar a
    data instance Bar (a, b) = BarPair !(Bar a) !(Bar b)
    newtype instance Bar Int = Bar Int

    foo :: Bar ((Int, Int), Int) -> Int -> Int
    foo f k =
      case f of
        BarPair x y -> case burble of
                          True -> case x of
                                    BarPair p q -> ...
                          False -> ...

We really should be able to assume that `p` is already evaluated since it came
from a strict field of BarPair.

However, as written the demand analyser can not conclude this since we may end
up in the False branch of the case on `burble` (which places no demand on `x`).
By accounting for the data con strictness later, applied to the demand of the
RHS, we get the strict demand signature we want.

See Note [Add demands for strict constructors] for a more comprehensive
discussion.

Test Plan: Validate

Reviewers: simonpj, osa1, goldfire

Subscribers: rwbarton, carter

GHC Trac Issues: #15696

Differential Revision: https://phabricator.haskell.org/D5226

comment:92 Changed 10 hours ago by Simon Peyton Jones <simonpj@…>

In d77501cd/ghc:

Improvements to demand analysis

This patch collects a few improvements triggered by Trac #15696,
and fixing Trac #16029

* Stop making toCleanDmd behave specially for unlifted types.
  This special case was the cause of stupid behaviour in Trac
  #16029.  And to my joy I discovered the let/app invariant
  rendered it unnecessary.  (Maybe the special case pre-dated
  the let/app invariant.)

  Result: less special-case handling in the compiler, and
  better perf for the compiled code.

* In WwLib.mkWWstr_one, treat seqDmd like U(AAA).  It was not
  being so treated before, which again led to stupid code.

* Update and improve Notes

There are .stderr test wibbles because we get slightly different
strictness signatures for an argumment of unlifted type:
    <L,U> rather than <S,U>        for Int#
    <S,U> rather than <S(S),U(U)>  for Int
Note: See TracTickets for help on using tickets.