Derived Ord instance for enumerations with more than 8 elements seems to be incorrect

changed milestone to %8.6.1

changed weight to 5

changed weight to 10

Trac metadata

Trac field	Value
TypeOfFailure	OtherFailure → IncorrectResultAtRuntime
Priority	normal → highest

added incorrect runtime result label

That's terrible!

The following all produce the same (wrong) restul

 S.fromList (map f [ () | _ <- [1..10] ])    -- T2,T2
 S.fromList [f (), f ()]                     -- T2,T2
 S.fromList [f (), f (), f()]                -- T2,T2

However this works fine

 f () `S.insert` (f () `S.insert` S.empty)

The comparison operations generated by deriving look ok to me.

The problem is in this Core generated for this program:

-- RHS size: {terms: 5, types: 2, coercions: 0, joins: 0/0}
f_r6li
f_r6li = \ ds_d3M6 -> case ds_d3M6 of { () -> T2 }

-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
lvl_r6lj
lvl_r6lj = f_r6li ()

main2
main2
  = case dataToTag# lvl_r6lj of a#_a2rY { __DEFAULT ->
    case dataToTag# lvl_r6lj of b#_a2rZ { __DEFAULT ->
    ...

We get the tag of a CAF (lvl_r6lj) before evaluating it, so we get tag of a thunk. The need for evaluating argument of dataToTag# is explained in Note [dataToTag#] in primops.txt.pp. It seems like we're inlining getTag, and then somehow eliminating the case expression in getTag (which is supposed to evaluate the argument). If I change the INLINE annotation of getTag to NOINLINE this works as expected.

I don't know why we're elminating the case in getTag after inlining it yet.

Ah, so it turns out we have a special case in CorePrep (which runs after simplifications) about dataToTag#, and we generate a case expression around its argument after all the simplifications. It's explained in Note [dataToTag magic] in CorePrep, and I can see in STG that it works as expected:

lvl_r6ru = \u [] f_r6rt ();

lvl1_r6rv = CCS_DONT_CARE :! [lvl_r6ru []];

main2 =
    \u []
        case
            case lvl_r6ru of sat_s6xD { __DEFAULT -> dataToTag# [sat_s6xD]; }
        of
        a#_s6xE
        { __DEFAULT ->
              case
                  case lvl_r6ru of sat_s6xF { __DEFAULT -> dataToTag# [sat_s6xF]; }
              of
              ...

So perhaps this is not because we get tag of a thunk. I don't know why not inlining getTag fixes this.

Here's an example which doesn't depend on any code from containers. It also makes the derived Ord code explicit:

{-# LANGUAGE MagicHash #-}
module Main where

import qualified Data.Foldable as Foldable
import GHC.Exts (dataToTag#, tagToEnum#, (==#), (<#))

main :: IO ()
main | not_ordered a b = print $ Foldable.foldl' (flip wumbo) (singleton a) b
     | otherwise       = pure ()
  where
    {-# NOINLINE f #-}
    f () = T2
    {-# NOINLINE a #-}
    a = f ()
    {-# NOINLINE b #-}
    b = [f ()]

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
  deriving (Eq, Show)

instance Ord Main.T where
  compare a b
    = case dataToTag# a of
        a' -> case dataToTag# b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

data Set a = Bin !a !(Set a) !(Set a)
           | Tip
  deriving Show

not_ordered :: Ord a => a -> [a] -> Bool
not_ordered _ [] = False
not_ordered x (y : _) = x >= y

wumbo :: Ord a => a -> Set a -> Set a
wumbo x0 = go x0 x0
  where
    go :: Ord a => a -> a -> Set a -> Set a
    go orig _ Tip = singleton orig
    go orig x t@(Bin y l r) = case compare x y of
        LT -> error "not used here"
        GT -> Bin y l (go orig x r)
        EQ -> t
{-# INLINE wumbo #-}

singleton :: a -> Set a
singleton x = Bin x Tip Tip

That's really helpful Ryan, thank you.

Apologies, I meant to mention in ticket:15696#comment:160886 what wumbo actually is: it's a stripped down version of insert intended to highlight that it appears to behave differently between GHC 8.4.3 and 8.6.1. The semantics of wumbo differs from that of insert, but here is the important bit:

$ /opt/ghc/8.4.3/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
Bin T2 Tip Tip

$ /opt/ghc/8.6.1/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
Bin T2 Tip (Bin T2 Tip Tip)

GHC 8.4.3's answer is definitely the correct one, since the only way you'd get 8.6.1's answer is by hitting the GT case of wumbo (which shouldn't happen if you're comparing T2 to T2).

I'm minimizing this.

changed milestone to %8.6.2

Didn't we just recently start using some pointer tagging for types with more than 7 constructors? I'm thinking something could be a drop off in that code.

Here's a smaller version.

ghc T15696 && ./T15696 prints EQ correctly, ghc -O T15696 && ./T15696 prints LT.

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (dataToTag#, tagToEnum#, (==#), (<#))

main :: IO ()
main = print $ compare a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
  deriving (Eq, Show, Ord)

{-
instance Ord Main.T where
  compare a b
    = case dataToTag# a of
        a' -> case dataToTag# b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT
-}

Avoiding the use of type classes:

main :: IO ()
main = print $ cmpT a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
--  deriving (Eq, Show, Ord)

cmpT a b
    = case dataToTag# a of
        a' -> case dataToTag# b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

With -O we get LT for GHC 8.4 and 8.2 and earlier versions. Without -O it returns EQ as it should.

Omer, might you look at this. With -ddump-stg I see

Main.cmpT :: forall a1 a2. a1 -> a2 -> GHC.Types.Ordering
[GblId, Arity=2, Caf=NoCafRefs, Str=<S,U><S,U>, Unf=OtherCon []] =
    [] \r [a2_s3tf b_s3tg]
        case
            case a2_s3tf of sat_s3th [Occ=Once] {
              __DEFAULT -> dataToTag# [sat_s3th];
            }
        of
        a'_s3ti
        { __DEFAULT ->
              case
                  case b_s3tg of sat_s3tj [Occ=Once] {
                    __DEFAULT -> dataToTag# [sat_s3tj];
                  }
              of
              b'_s3tk
              { __DEFAULT ->
                    case <# [a'_s3ti b'_s3tk] of {
                      __DEFAULT ->
                          case ==# [a'_s3ti b'_s3tk] of {
                            __DEFAULT -> GHC.Types.GT [];
                            1# -> GHC.Types.EQ [];
                          };
                      1# -> GHC.Types.LT [];
                    };
              };
        };

which looks right. In another variant (I made dataToTag# lazy) I saw

Main.cmpT :: forall a1 a2. a1 -> a2 -> GHC.Types.Ordering
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=<S,1*U><S,1*U>,
 Unf=OtherCon []] =
    [] \r [a2_s3tf b_s3tg]
        case a2_s3tf of x1_s3th [Occ=Once] {
          __DEFAULT ->
              case dataToTag# [x1_s3th] of a'_s3ti {
                __DEFAULT ->
                    case b_s3tg of x2_s3tj [Occ=Once] {
                      __DEFAULT ->
                          case dataToTag# [x2_s3tj] of b'_s3tk {
                            __DEFAULT ->
                                case <# [a'_s3ti b'_s3tk] of {
                                  __DEFAULT ->
                                      case ==# [a'_s3ti b'_s3tk] of {
                                        __DEFAULT -> GHC.Types.GT [];
                                        1# -> GHC.Types.EQ [];
                                      };
                                  1# -> GHC.Types.LT [];
                                };
                          };
                    };
              };
        };

which also looks fine. But both stubbornly return LT instead of EQ. This must be a code-gen or RTS issue. I have not looked at the Cmm. Might you do so?

Weirdly enough, I get different answers than the ones simonpj reported for the program in ticket:15696#comment:160903. To be explicit, if I'm using this program:

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (dataToTag#, tagToEnum#, (==#), (<#))

main :: IO ()
main = print $ compare a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
  deriving (Eq, Show)

instance Ord Main.T where
  compare a b
    = case dataToTag# a of
        a' -> case dataToTag# b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

Then I consistently get LT regardless of optimization level:

$ /opt/ghc/8.6.1/bin/ghc -O0 -fforce-recomp Bug.hs && ./Bug 
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
LT

$ /opt/ghc/8.6.1/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug 
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
LT

If I replace all uses of dataToTag# with getTag, however:

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Base (getTag)
import GHC.Exts (tagToEnum#, (==#), (<#))

main :: IO ()
main = print $ compare a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
  deriving (Eq, Show)

instance Ord Main.T where
  compare a b
    = case getTag a of
        a' -> case getTag b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

Only then do I get EQ without optimization:

$ /opt/ghc/8.6.1/bin/ghc -O0 -fforce-recomp Bug.hs && ./Bug 
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
EQ

$ /opt/ghc/8.6.1/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug 
[1 of 1] Compiling Main             ( Bug.hs, Bug.o )
Linking Bug ...
LT

What's more, I consistently get the same sets of results in each version of GHC dating back to 8.2.2. This makes me believe that the bug that was exposed here has actually been lurking for quite a while (but perhaps a difference in inlining behavior in 8.6 only just recently exposed it).

The fact that using dataToTag# directly produces incorrect results is perhaps not terribly surprising, since it must always be applied to an evaluated argument (see Note [dataToTag#]). The fact that the version using getTag breaks is more worrisome, since getTag actually forces its argument (with a bang pattern).

It's also worth noting that you can trim T down to just two constructors:

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Base (getTag)
import GHC.Exts (tagToEnum#, (==#), (<#))

main :: IO ()
main = print $ compare a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2
  deriving (Eq, Show)

instance Ord Main.T where
  compare a b
    = case getTag a of
        a' -> case getTag b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

And the bug will still trigger. (If you define data T = T2, then the bug will go away.)

Replying to [ticket:15696#comment:160907 RyanGlScott]:

The fact that using dataToTag# directly produces incorrect results is perhaps not terribly surprising...

Except that we have special code to make sure that always happens regardless. The STG Simon pasted looks right from that standpoint: it always applies dataToTag# under a case on its argument. So my bet is that Simon is right: something is going wrong in code generation or the RTS.

Side note: it looks like I was wrong about tagging large types. I believe there was some talk of that, but it doesn't look like it's happened as yet.

Replying to [ticket:15696#comment:160898 dfeuer]:

Didn't we just recently start using some pointer tagging for types with more than 7 constructors? I'm thinking something could be a drop off in that code.

Nope. That code is not ripe yet. This summer was very busy for me.

Another thing: With Peter I discovered last Haskell eXchange (Oct. 2017) that enumerated types with derived Enum/Ord instances could use getTag to derive a constant-time compare (and (==) too). We were not aware of this optimization already being in the codebase. We'll surely revisit this.

Simon, please review. There are a few ways to fix this and I probably did the most conservative thing by always introducing a case around dataToTag#. Perhaps we want to change how we record evaluated-ness of CAFs in their Ids instead.

We may also want to pay more attention to #14677 (closed) if we rely on strictness annotations of data constructor fields to decide on evaluatedness of values elsewhere.

Trac metadata

Trac field	Value
Differential revisions	- → D5196

One other thing to note is, as other commenters above already mentioned, this is not a new bug! The bug was there since years. I think something else (maybe changes in simplifier) revealed it.

Thinking about this more. I think the only case where we can actually avoid entering the argument of dataToTag# is when the argument is a static closure. For anything else (a dynamic closure, a CAF) we need to enter it. I don't know how often people do dataToTag# C (for some constructor C), perhaps it's fine to always enter it. If it is then we can remove the special case for dataToTag# in CorePrep and in the codegen (in Cmm) we can just enter the argument.

If we decide to do that then perhaps we should also revisit the getTag and remove the bang pattern on its argument.

Replying to [ticket:15696#comment:160923 osa1]:

Thinking about this more. I think the only case where we can actually avoid entering the argument of dataToTag# is when the argument is a static closure. For anything else (a dynamic closure, a CAF) we need to enter it.

I'm a tad confused. If the tag bits are non-zero, can't we just use them and avoid dereferencing the pointer? Or is that already folded into "enter it"? Or am I just wrong?

Ah... I am not entirely wrong, but I am partly wrong, because types with many constructors get tagged 1 when evaluated. I think we should at least be able to add a rule that recognizes when dataToTag# is used with a type with only a few constructors: see the logic in the cgAlts gc_plan bndr (AlgAlt tycon) alts case in compiler/codeGen/StgCmmExpr.hs. That way we only have to call getConstrTag when

The type is unknown,
The type has too many constructors, or
The tag bits are 0.

I imagine you can do this quite easily by copying a bit of the splitTyConApp_maybe logic from tagToEnumRule to dataToTagRule in PrelRules.hs and adding a DataToTagUsingTagOp to supplement DataToTagOp.

Aha! Actually, I believe even one fall-back should be able to use tag bits, as long as they're greater than 1. So:

When the type is recognized as having a small number of constructors

Check the tag bits. If they are 0, enter the closure. Otherwise, subtract 1 to get the tag.

When the type is unknown

Check the tag bits. If they are less than or equal to 1, enter the closure. Otherwise, subtract 1 to get the tag.

When the type has too many constructors

Enter the closure.

I guess what I'm suggesting would be implemented most easily by replacing the dataToTag# primop a little bit.

unKnownType, smallType, largeType :: Int#
unKnownType = 0#
smallType = 1#
largeType = 2#

dataToTag# x = dataToTagUsing# unknownType x

-- Takes the "strategy" to use
dataToTagUsing# :: Int# -> a -> Int#

This arrangement is designed so we shouldn't have to do any painful restructuring of the caseRules or dataToTagRule.

The only potential problem I see is if any user-written RULES match on dataToTag#, because that won't work anymore. Should we worry about that?

Before deciding on a solution, let's record what the problem is.

The original thinking was this.

dataToTag# is a primop, so it has no business doing anything as complicated as evaluating its argument; we already have case expressions that the code-gen knows how to compile.
So dataToTag# expects an evaluated argument; in fact, stronger than that, it expects a pointer to the data value itself, so that it can extract the tag directly from the info table.
But if it so happens that the value in question already is evaluated, that's a waste. Example

f x = case x of y -> ...(dataToTag# y)...

It seems a bit silly for dataToTag# to re-evaluate y.

Hence, in CorePrep (see Note [dataToTag magic]) we add a case around the argument to dataToTag# unless the argument is already evaluated.
The "already-evaluated" test is exprIsHNF.
But alas, while exprIsHNF guarantees that the thing will evaluate in O(1) time, it does not guarantee that we have a pointer to the data value itself. Omer accurately diagnoses this problem in the Note in his draft D5196:

Note [Always introduce case around dataToTag# arg]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We used to only generate a case expression around dataToTag# argument when it's
not known to be evaluated (checked with `exprIsHNF`), but this is incorrect.
Here's an example:

  data T = T1 | T2

  main = print (I# (dataToTag# a))
    where
      {-# NOINLINE f #-}
      f = T2
      {-# NOINLINE a #-}
      a = f

`f` is a static closure

  f_r1zz :: Main.T
  [GblId, Caf=NoCafRefs, Unf=OtherCon []] =
      CCS_DONT_CARE Main.T2! [];

but `a` is an _updatable_ CAF!

  a_r1zA :: Main.T
  [GblId, Unf=OtherCon []] =
      [] \u [] f_r1zz;

An updatable CAF is not in HNF (entering the closure does the `newCAF()` stuff
and updates the closure with an IND_STATIC, the usual CAF evaluation routines),
but according to `exprIsHNF` `a` is!

I thought of having a more conservative test in CorePrep. But the rot goes further. Consider this, from Note [dataToTag magic]

   data T = MkT !Bool
   f v = case v of
            MkT y -> dataToTag# y

We certainly know that y will be evaluated, because MkT is a strict constructor. But does it guarantee to point directly to the data value? No! The case-to-let transformation in the Simplifier (Simplify.doCaseToLet) uses exprIsHNF and hence will drop the eval on MkT's argument for things like Omer's a binding. And that is right in a way: the argument to MkT a certainly isn't bottom! But nor does it point to the data value.

So that is a problem. But I think there is another. Look in ticket:15696#comment:160903 and ticket:15696#comment:160904. Here we have the extra case expressions correctly inserted, but we still get the wrong answer. And this is affected by reducing the number of constructor from 9 to 8. 'So I think there may be two separate bugs'.

I'd like to understand the second before looking for a fix. Omer: could you investigate why comments 13 and 14 go wrong?

Sure. The problem in ticket:15696#comment:160903 is exactly what I explained in my patch (and you also copied it here in ticket:15696#comment:160937). I get this STG with GHC HEAD, -O0:

f_r1C3 :: Main.T
[GblId, Caf=NoCafRefs, Unf=OtherCon []] =
    CCS_DONT_CARE Main.T2! [];

a_r1C4 :: Main.T
[GblId, Unf=OtherCon []] =
    [] \u [] f_r1C3;

sat_s1OH :: GHC.Types.Ordering
[LclId] =
    [] \u []
        case dataToTag# [a_r1C4] of a'_s1OD {
          __DEFAULT ->
              case <# [a'_s1OD 1#] of sat_s1OE [Occ=Once] {
                __DEFAULT ->
                    case tagToEnum# [sat_s1OE] of {
                      GHC.Types.False ->
                          case a'_s1OD of {
                            __DEFAULT -> GHC.Types.GT [];
                            1# -> GHC.Types.EQ [];
                          };
                      GHC.Types.True -> GHC.Types.LT [];
                    };
              };
        };

Notice that we do dataToTag# [a_r1C4] and a_r1C4 is an updatable CAF. The result is I get LT instead of EQ. (this is one of the regression tests I added)

I get very similar STG (with the same bug) and same results with these configurations: GHC HEAD, GHC 8.2.2, GHC 8.0.2. All tried with -O0, -O1, -O2.

I'm guessing that you added a NOINLINE for cmpT in ticket:15696#comment:160904. When I do that I get the right answer with all optimisation settings (GHC HEAD), and that makes sense becuase the arguments are now known to be evaluated and exprIsHNF correctly returns False for the arguments, so we do case on the args. I don't know how you get incorrect result in the STG shown in ticket:15696#comment:160904. Could it be that you used an older binary of the test program or something like that? If you give more detailed instructions to reproduce I can take a look.

I agree that if the STG in ticket:15696#comment:160904 is producing wrong result then there's at least one more bug. However I get identical STG when I add NOINLINE aroung cmpT (with GHC HEAD, with -O2) and it works as expected. You said you also made dataToTag# lazier, but because I get identical STG as you I haven't tried to change the primop laziness (it shouldn't matter in STG).

I get this STG with GHC HEAD, -O0:

That's what you get for monoidal's code in ticket:15696#comment:160911. Note the missing eval before the dataToTag#.

But it is not what you get for the code in ticket:15696#comment:160903. The evals are there, as ticket:15696#comment:160904 shows. And yet we get the wrong answer.

But why? That is the second bug. Can you see why that happens?

I'm confused

That's what you get for monoidal's code in ticket:15696#comment:160911

I didn't even look at ticket:15696#comment:160911. I tried your code in ticket:15696#comment:160903 and what happens there is exactly what I explained in my patch.

The evals are there, as ticket:15696#comment:160904 shows.

ticket:15696#comment:160904 shows that evals are there if I don't inline cmpT, and that makes sense as I explained in ticket:15696#comment:160941. We can't assume that args are evaluated so we eval them.

And yet we get the wrong answer.

I get the right answer when I get the STG in ticket:15696#comment:160904.

Could you try to reproduce the error in ticket:15696#comment:160904 and give me more detailed instructions (showing the source and invoked GHC commands)?

Oops sorry. I'd modified the code slightly! Try this

{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (dataToTag#, tagToEnum#, (==#), (<#))
import GHC.Base ( getTag )

main :: IO ()
main = print $ cmpT a T2
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

data T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
--  deriving (Eq, Show, Ord)

cmpT a b
    = case getTag a of
        a' -> case getTag b of
                b' -> if tagToEnum# (a' <# b') :: Bool then
                          LT
                      else
                          if tagToEnum# (a' ==# b') :: Bool then
                              EQ
                          else
                              GT

Compile with -O and run. It prints LT when it should print EQ. But the evals are there!

OK. This is the STG I get:

Main.main_f [InlPrag=NOINLINE] :: Main.T
[GblId, Caf=NoCafRefs, Unf=OtherCon []] =
    CCS_DONT_CARE Main.T2! [];

Main.main_a [InlPrag=NOINLINE] :: Main.T
[GblId, Unf=OtherCon []] =
    [] \u [] Main.main_f;

Main.main1 :: GHC.Base.String
[GblId] =
    [] \u []
        case dataToTag# [Main.main_a] of a'_s3tm {
          __DEFAULT ->
              case <# [a'_s3tm 1#] of {
                __DEFAULT ->
                    case a'_s3tm of {
                      __DEFAULT -> GHC.Show.$fShowOrdering1;
                      1# -> GHC.Show.$fShowOrdering3;
                    };
                1# -> GHC.Show.$fShowOrdering5;
              };
        };

Exactly the same problem, no evals around the CAF (main_a).

Are you maybe confused because in the STG dump you also see the STG for cmpT? It's actually inlined in main so the top-level for cmpT is not used.

Are you maybe confused because in the STG dump you also see the STG for cmpT? It's actually inlined in main so the top-level for cmpT is not used.

That is exactly what I was doing. Thanks!

I backed up some more to look at ticket:15696#comment:160886, which does not have a top-level no-inline CAF. I see this:

$sgo_r5Tk :: Main.T -> Main.T -> Main.Set Main.T -> Main.Set Main.T
[GblId, Arity=3, Str=<L,1*U><L,U><S,1*U>, Unf=OtherCon []] =
    sat-only [] \r [orig_s5Wj ds_s5Wk ds1_s5Wl]
        case ds1_s5Wl of wild_s5Wm [Occ=Once] {
          Main.Bin y_s5Wn l_s5Wo [Occ=Once] r_s5Wp [Occ=Once] ->
              case
                  case ds_s5Wk of sat_s5Wq [Occ=Once] {
                    __DEFAULT -> dataToTag# [sat_s5Wq];
                  }
              of
              a'_s5Wr
              { __DEFAULT ->
                    case dataToTag# [y_s5Wn] of b'_s5Ws {
                      __DEFAULT ->
                          case <# [a'_s5Wr b'_s5Ws] of {
                            __DEFAULT ->
...

Here we take apart a Bin, and call dataToTag# on the contents; and because of the exprIsHNF stuff there is no guarantee that the argument to Bin points directly to the data value.

But in looking at this I found something else! In coment:6 there is no top-level CAF with a NOINLINE, so how do things go wrong. Here's how.

We start with

thk = f ()
g x = ...(case thk of v -> Bin v Tip Tip)...

So far so good; the argument to Bin (which is strict) is evaluated.

Now we do float-out. And in doing so we do a reverse binder-swap (see Note [Binder-swap during float-out] in SetLevels) thus

g x = ...(case thk of v -> Bin thk Nil Nil)...

The goal of the reverse binder-swap is to allow more floating -- and it does! We float the Bin to top level:

lvl = Bin thk Tip Tip
g x = ...(case thk of v -> lvl))...

Now you can see that the argument of Bin, namely thk, points to the thunk, not to the value as it did before; and that gives rise to the bug.

Is this wrong? Not really. We are still guaranteed that the argument to Bin in lvl will be evaluated (by that case thk) before lvl is used. But we are no longer guaranteed that the argument to Bin points directly to the evaluated value.

So now I understand.

OK, moving on to solutions. Your patch seems OK. But I wonder if CorePrep is the right place to do it. After all, some other STG manipulation might break it again.

I think the right thing is for the code generator to do the job; that is, in effect implement dataToTag# properly. That is, in StgCmmPrim, in

cgOpApp (StgPrimOp primop) args res_ty = do

add a special case for DataToTagOp, when we are compiling dataToTag# x. Then behave exactly as if we'd seen case x of y -> dataToTag## y, where by dataToTag## y I mean generate the code the looks in the info table. (We have that code here

--  #define dataToTagzh(r,a)  r=(GET_TAG(((StgClosure *)a)->header.info))
--  Note: argument may be tagged!
emitPrimOp dflags [res] DataToTagOp [arg]
   = emitAssign (CmmLocal res) (getConstrTag dflags (cmmUntag dflags arg))

).

And by "behave exactly as if we'd see case ..." I roughly mean call StgCmmExpr.cgCase. But that need some alts which we don't conveniently have. The easiest thing would be to take -- the general case equation for cgCase and split off the bit that does the eval, so that we can call it from dataToTag#. Doing this is not trivial, but it feels like the Right Thing, and will remove the magic from dataToTag#.

Right, fixing this in Cmm is also what I suggested in ticket:15696#comment:160923. I think two key questions are:

Can we optimise this? That is, can we know for certain that the argument of a dataToTag# points directly to a value (without going through blackholes or other indirections like IND_STATIC)?
Is this worth optimising?

Unless the answer to both of those is yes, then I think it makes sense to do this in Cmm (when compiling the primop). Then we can remove a bunch of notes about getTag and hacks in CorePrep, and simplify getTag to getTag = dataToTag#. I think we can also update the primop as can_fail = False.

It seems to me that Core is not the right place to answer (1). Even in STG I don't know if we can answer that question with certainty. I think it's doable in Cmm where we know about "lambda form" of ids (LambdaFormInfo) which tells us about how to enter an object. If "lambda form" of an argument is LFCon then we optimise, otherwise we enter.

What I've been suggesting is that even when we don't statically know if it's been evaluated or tagged, we should surely be using tag bits when we find them. It looks like an oversight that we haven't done so in the past, and I foresee substantial benefits to fixing that. Perhaps we can win back more performance than we lose.

Right, agreed that looking at tag bits would work for small types.

I agree with ticket:15696#comment:160956. Yes, the code generator is the right place to do it.

Yes, we could ooptimise for LFCon, but the simplifier will have done that already. The only case I think we could reliably optimise, that would not be done already, is

case x of y 
   A       -> blah
   DEFAULT -> ...(dataToTag# y)...

In this case we really do know that y points to the value. It would not be hard to let the code gen spot this; but I doubt it would happen much.

Right, agreed that looking at tag bits would work for small types.

This should happen automatically if we use the code for cgCase. It already has a fast-path for the case when the scrutinee is evaluated.

But you point is perhaps that for small types we don't need to index the info table: the tag is in the bits. Yes, that's a good idea.

Ah, I see what you're saying. The 0 tag is handled by the case wrapper (or C-- equivalent). So for small types, we don't need any extra test; we can just use the tag bits directly. I suppose also that my attempt to use tag bits for unknown types is more trouble than it's worth; we should just assume that those have too many constructors.

As for the type-driven rewriting, I think we basically have two options:

Do it in PrelRules. I think this requires something like the dataToTagWith# primop I mentioned earlier.
Do it sometime later (tidy core, core prep, or lowering to STG). This lets us stick with dataToTag# throughout core2core, which is nice, but I don't know enough to know if any of those make sense or where/how to slot in the rule.

As for the type-driven rewriting

This is not possible unless we somehow distinguish "stuff that directly points to a value (without going through indirections)" from other at the type level.

I submitted another diff that fixes this bug in Cmm. It also does a bunch of simplifications in other parts of the compiler (removes notes, special cases, and hacks for dataToTag#). Here are some example code we generate:

Program:

data T = T1 | T2

main = do
    print (I# (dataToTag# f))
    print (I# (dataToTag# a))
  where
    {-# NOINLINE f #-}
    f = T2
    {-# NOINLINE a #-}
    a = f

For the first dataToTag# we generate:

_s1yo::I64 = 1;   // CmmAssign

for the second

    // ======= DATA TO TAG ========
    Hp = Hp - 16;   // CmmAssign
    I64[Sp - 24] = c1yE;   // CmmStore
    R1 = a_r1lF_closure;   // CmmAssign
    Sp = Sp - 24;   // CmmAssign
    if (R1 & 7 != 0) goto c1yE; else goto c1yF;   // CmmCondBranch
c1yF: // global
    call (I64[R1])(R1) returns to c1yE, args: 8, res: 8, upd: 24;   // CmmCall, this is where we evalaute the arg
c1yE: // global
    _c1yD::I64 = R1;   // CmmAssign
    // ======= DATA TO TAG SMALL FAMILY ========
    _s1yr::I64 = _c1yD::I64 & 7 - 1;   // CmmAssign, read the tag bits

If I make this type a "big family", then we generate (for the second dataToTag#)

    // ======= DATA TO TAG ========
    Hp = Hp - 16;   // CmmAssign
    I64[Sp - 24] = c1A0;   // CmmStore
    R1 = a_r1mL_closure;   // CmmAssign
    Sp = Sp - 24;   // CmmAssign
    if (R1 & 7 != 0) goto c1A0; else goto c1A1;   // CmmCondBranch
c1A1: // global
    call (I64[R1])(R1) returns to c1A0, args: 8, res: 8, upd: 24;   // CmmCall, enter the argument
c1A0: // global
    _c1zZ::I64 = R1;   // CmmAssign
    // ======= DATA TO TAG GENERAL CASE ========
    _s1zN::I64 = %MO_UU_Conv_W32_W64(I32[I64[_c1zZ::I64 & (-8)] - 4]);   // CmmAssign, read info table

The patch is not ready (I'll need to update some notes, I also left some TODOs and questions in the code) but it fixes the bug and demonstrates that this is possible (and even easy) to do in Cmm.

Trac metadata

Trac field	Value
Differential revisions	D5196 → D5196, D5201

Derived Ord instance for enumerations with more than 8 elements seems to be incorrect

Child items ...

Activity

When the type is recognized as having a small number of constructors

When the type is unknown

When the type has too many constructors

Trac field	Value
Version	8.6.1
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture

Derived Ord instance for enumerations with more than 8 elements seems to be incorrect

Relates to

Activity

When the type is recognized as having a small number of constructors

When the type is unknown

When the type has too many constructors