I originally reported this as a bug on containers issue tracker, but we seem to have concluded that this is probably a bug in the GHC optimizer itself.
I think the shortest repro so far is this:
importqualifiedData.SetasSmain=print$let{-# noinline f #-}f()=T2inS.fromList[f(),f()]dataT=T1|T2|T3|T4|T5|T6|T7|T8|T9deriving(Show,Read,Eq,Ord,Bounded,Enum)
which prints
fromList[T2,T2]
The person who derived this from my original repro says:
And as I said earlier, comment out the T9 constructor => prints fromList [T2] as it should.
Another interesting quote:
Can confirm. Tested with ghc-8.6.1, containers-0.6.0.1 and leancheck-0.7.5 (so it does not seem to depend on the testing framework). Error occurs:
with ghc -O1 and -O2 (but not with -O0)
and if data type has at least 9 elements
So, likely a bug in ghc's optimizer.
in some cases, input has duplicates, but not always.
This is a bad one, makes GHC 8.6.1 totally unusable for me.
Trac metadata
Trac field
Value
Version
8.6.1
Type
Bug
TypeOfFailure
OtherFailure
Priority
normal
Resolution
Unresolved
Component
Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
The problem is in this Core generated for this program:
-- RHS size: {terms: 5, types: 2, coercions: 0, joins: 0/0}f_r6lif_r6li = \ ds_d3M6 -> case ds_d3M6 of { () -> T2 }-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}lvl_r6ljlvl_r6lj = f_r6li ()main2main2 = case dataToTag# lvl_r6lj of a#_a2rY { __DEFAULT -> case dataToTag# lvl_r6lj of b#_a2rZ { __DEFAULT -> ...
We get the tag of a CAF (lvl_r6lj) before evaluating it, so we get tag of a thunk. The need for evaluating argument of dataToTag# is explained in Note [dataToTag#] in primops.txt.pp. It seems like we're inlining getTag, and then somehow eliminating the case expression in getTag (which is supposed to evaluate the argument). If I change the INLINE annotation of getTag to NOINLINE this works as expected.
I don't know why we're elminating the case in getTag after inlining it yet.
Ah, so it turns out we have a special case in CorePrep (which runs after simplifications) about dataToTag#, and we generate a case expression around its argument after all the simplifications. It's explained in Note [dataToTag magic] in CorePrep, and I can see in STG that it works as expected:
lvl_r6ru = \u [] f_r6rt ();lvl1_r6rv = CCS_DONT_CARE :! [lvl_r6ru []];main2 = \u [] case case lvl_r6ru of sat_s6xD { __DEFAULT -> dataToTag# [sat_s6xD]; } of a#_s6xE { __DEFAULT -> case case lvl_r6ru of sat_s6xF { __DEFAULT -> dataToTag# [sat_s6xF]; } of ...
So perhaps this is not because we get tag of a thunk. I don't know why not inlining getTag fixes this.
Here's an example which doesn't depend on any code from containers. It also makes the derived Ord code explicit:
{-# LANGUAGE MagicHash #-}moduleMainwhereimportqualifiedData.FoldableasFoldableimportGHC.Exts(dataToTag#,tagToEnum#,(==#),(<#))main::IO()main|not_orderedab=print$Foldable.foldl'(flipwumbo)(singletona)b|otherwise=pure()where{-# NOINLINE f #-}f()=T2{-# NOINLINE a #-}a=f(){-# NOINLINE b #-}b=[f()]dataT=T1|T2|T3|T4|T5|T6|T7|T8|T9deriving(Eq,Show)instanceOrdMain.Twherecompareab=casedataToTag#aofa'->casedataToTag#bofb'->iftagToEnum#(a'<#b')::BoolthenLTelseiftagToEnum#(a'==#b')::BoolthenEQelseGTdataSeta=Bin!a!(Seta)!(Seta)|TipderivingShownot_ordered::Orda=>a->[a]->Boolnot_ordered_[]=Falsenot_orderedx(y:_)=x>=ywumbo::Orda=>a->Seta->Setawumbox0=gox0x0wherego::Orda=>a->a->Seta->Setagoorig_Tip=singletonoriggoorigxt@(Binylr)=casecomparexyofLT->error"not used here"GT->Binyl(goorigxr)EQ->t{-# INLINE wumbo #-}singleton::a->Setasingletonx=BinxTipTip
Apologies, I meant to mention in ticket:15696#comment:160886 what wumbo actually is: it's a stripped down version of insert intended to highlight that it appears to behave differently between GHC 8.4.3 and 8.6.1. The semantics of wumbo differs from that of insert, but here is the important bit:
$ /opt/ghc/8.4.3/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug[1 of 1] Compiling Main ( Bug.hs, Bug.o )Linking Bug ...Bin T2 Tip Tip$ /opt/ghc/8.6.1/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug[1 of 1] Compiling Main ( Bug.hs, Bug.o )Linking Bug ...Bin T2 Tip (Bin T2 Tip Tip)
GHC 8.4.3's answer is definitely the correct one, since the only way you'd get 8.6.1's answer is by hitting the GT case of wumbo (which shouldn't happen if you're comparing T2 to T2).
Didn't we just recently start using some pointer tagging for types with more than 7 constructors? I'm thinking something could be a drop off in that code.
ghc T15696 && ./T15696 prints EQ correctly, ghc -O T15696 && ./T15696 prints LT.
{-# LANGUAGE MagicHash #-}moduleMainwhereimportGHC.Exts(dataToTag#,tagToEnum#,(==#),(<#))main::IO()main=print$compareaT2where{-# NOINLINE f #-}f=T2{-# NOINLINE a #-}a=fdataT=T1|T2|T3|T4|T5|T6|T7|T8|T9deriving(Eq,Show,Ord){-instance Ord Main.T where compare a b = case dataToTag# a of a' -> case dataToTag# b of b' -> if tagToEnum# (a' <# b') :: Bool then LT else if tagToEnum# (a' ==# b') :: Bool then EQ else GT-}
main :: IO ()main = print $ cmpT a T2 where {-# NOINLINE f #-} f = T2 {-# NOINLINE a #-} a = fdata T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9-- deriving (Eq, Show, Ord)cmpT a b = case dataToTag# a of a' -> case dataToTag# b of b' -> if tagToEnum# (a' <# b') :: Bool then LT else if tagToEnum# (a' ==# b') :: Bool then EQ else GT
With -O we get LT for GHC 8.4 and 8.2 and earlier versions. Without -O it returns EQ as it should.
Omer, might you look at this. With -ddump-stg I see
Main.cmpT :: forall a1 a2. a1 -> a2 -> GHC.Types.Ordering[GblId, Arity=2, Caf=NoCafRefs, Str=<S,U><S,U>, Unf=OtherCon []] = [] \r [a2_s3tf b_s3tg] case case a2_s3tf of sat_s3th [Occ=Once] { __DEFAULT -> dataToTag# [sat_s3th]; } of a'_s3ti { __DEFAULT -> case case b_s3tg of sat_s3tj [Occ=Once] { __DEFAULT -> dataToTag# [sat_s3tj]; } of b'_s3tk { __DEFAULT -> case <# [a'_s3ti b'_s3tk] of { __DEFAULT -> case ==# [a'_s3ti b'_s3tk] of { __DEFAULT -> GHC.Types.GT []; 1# -> GHC.Types.EQ []; }; 1# -> GHC.Types.LT []; }; }; };
which looks right. In another variant (I made dataToTag# lazy) I saw
Main.cmpT :: forall a1 a2. a1 -> a2 -> GHC.Types.Ordering[GblId, Arity=2, Caf=NoCafRefs, Str=<S,1*U><S,1*U>, Unf=OtherCon []] = [] \r [a2_s3tf b_s3tg] case a2_s3tf of x1_s3th [Occ=Once] { __DEFAULT -> case dataToTag# [x1_s3th] of a'_s3ti { __DEFAULT -> case b_s3tg of x2_s3tj [Occ=Once] { __DEFAULT -> case dataToTag# [x2_s3tj] of b'_s3tk { __DEFAULT -> case <# [a'_s3ti b'_s3tk] of { __DEFAULT -> case ==# [a'_s3ti b'_s3tk] of { __DEFAULT -> GHC.Types.GT []; 1# -> GHC.Types.EQ []; }; 1# -> GHC.Types.LT []; }; }; }; }; };
which also looks fine. But both stubbornly return LT instead of EQ. This must be a code-gen or RTS issue. I have not looked at the Cmm. Might you do so?
Weirdly enough, I get different answers than the ones simonpj reported for the program in ticket:15696#comment:160903. To be explicit, if I'm using this program:
{-# LANGUAGE MagicHash #-}moduleMainwhereimportGHC.Exts(dataToTag#,tagToEnum#,(==#),(<#))main::IO()main=print$compareaT2where{-# NOINLINE f #-}f=T2{-# NOINLINE a #-}a=fdataT=T1|T2|T3|T4|T5|T6|T7|T8|T9deriving(Eq,Show)instanceOrdMain.Twherecompareab=casedataToTag#aofa'->casedataToTag#bofb'->iftagToEnum#(a'<#b')::BoolthenLTelseiftagToEnum#(a'==#b')::BoolthenEQelseGT
Then I consistently get LT regardless of optimization level:
$ /opt/ghc/8.6.1/bin/ghc -O0 -fforce-recomp Bug.hs && ./Bug [1 of 1] Compiling Main ( Bug.hs, Bug.o )Linking Bug ...LT$ /opt/ghc/8.6.1/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug [1 of 1] Compiling Main ( Bug.hs, Bug.o )Linking Bug ...LT
If I replace all uses of dataToTag# with getTag, however:
{-# LANGUAGE MagicHash #-}moduleMainwhereimportGHC.Base(getTag)importGHC.Exts(tagToEnum#,(==#),(<#))main::IO()main=print$compareaT2where{-# NOINLINE f #-}f=T2{-# NOINLINE a #-}a=fdataT=T1|T2|T3|T4|T5|T6|T7|T8|T9deriving(Eq,Show)instanceOrdMain.Twherecompareab=casegetTagaofa'->casegetTagbofb'->iftagToEnum#(a'<#b')::BoolthenLTelseiftagToEnum#(a'==#b')::BoolthenEQelseGT
Only then do I get EQ without optimization:
$ /opt/ghc/8.6.1/bin/ghc -O0 -fforce-recomp Bug.hs && ./Bug [1 of 1] Compiling Main ( Bug.hs, Bug.o )Linking Bug ...EQ$ /opt/ghc/8.6.1/bin/ghc -O2 -fforce-recomp Bug.hs && ./Bug [1 of 1] Compiling Main ( Bug.hs, Bug.o )Linking Bug ...LT
What's more, I consistently get the same sets of results in each version of GHC dating back to 8.2.2. This makes me believe that the bug that was exposed here has actually been lurking for quite a while (but perhaps a difference in inlining behavior in 8.6 only just recently exposed it).
The fact that using dataToTag# directly produces incorrect results is perhaps not terribly surprising, since it must always be applied to an evaluated argument (see Note [dataToTag#]). The fact that the version using getTag breaks is more worrisome, since getTag actually forces its argument (with a bang pattern).
It's also worth noting that you can trim T down to just two constructors:
{-# LANGUAGE MagicHash #-}moduleMainwhereimportGHC.Base(getTag)importGHC.Exts(tagToEnum#,(==#),(<#))main::IO()main=print$compareaT2where{-# NOINLINE f #-}f=T2{-# NOINLINE a #-}a=fdataT=T1|T2deriving(Eq,Show)instanceOrdMain.Twherecompareab=casegetTagaofa'->casegetTagbofb'->iftagToEnum#(a'<#b')::BoolthenLTelseiftagToEnum#(a'==#b')::BoolthenEQelseGT
And the bug will still trigger. (If you define data T = T2, then the bug will go away.)
The fact that using dataToTag# directly produces incorrect results is perhaps not terribly surprising...
Except that we have special code to make sure that always happens regardless. The STG Simon pasted looks right from that standpoint: it always applies dataToTag# under a case on its argument. So my bet is that Simon is right: something is going wrong in code generation or the RTS.
Side note: it looks like I was wrong about tagging large types. I believe there was some talk of that, but it doesn't look like it's happened as yet.
Didn't we just recently start using some pointer tagging for types with more than 7 constructors? I'm thinking something could be a drop off in that code.
Nope. That code is not ripe yet. This summer was very busy for me.
Another thing: With Peter I discovered last Haskell eXchange (Oct. 2017) that enumerated types with derived Enum/Ord instances could use getTag to derive a constant-time compare (and (==) too). We were not aware of this optimization already being in the codebase. We'll surely revisit this.
Simon, please review. There are a few ways to fix this and I probably did the most conservative thing by always introducing a case around dataToTag#. Perhaps we want to change how we record evaluated-ness of CAFs in their Ids instead.
We may also want to pay more attention to #14677 (closed) if we rely on strictness annotations of data constructor fields to decide on evaluatedness of values elsewhere.
One other thing to note is, as other commenters above already mentioned, this is not a new bug! The bug was there since years. I think something else (maybe changes in simplifier) revealed it.
Thinking about this more. I think the only case where we can actually avoid entering the argument of dataToTag# is when the argument is a static closure. For anything else (a dynamic closure, a CAF) we need to enter it. I don't know how often people do dataToTag# C (for some constructor C), perhaps it's fine to always enter it. If it is then we can remove the special case for dataToTag# in CorePrep and in the codegen (in Cmm) we can just enter the argument.
If we decide to do that then perhaps we should also revisit the getTag and remove the bang pattern on its argument.
Thinking about this more. I think the only case where we can actually avoid entering the argument of dataToTag# is when the argument is a static closure. For anything else (a dynamic closure, a CAF) we need to enter it.
I'm a tad confused. If the tag bits are non-zero, can't we just use them and avoid dereferencing the pointer? Or is that already folded into "enter it"? Or am I just wrong?
Ah... I am not entirely wrong, but I am partly wrong, because types with many constructors get tagged 1 when evaluated. I think we should at least be able to add a rule that recognizes when dataToTag# is used with a type with only a few constructors: see the logic in the cgAlts gc_plan bndr (AlgAlt tycon) alts case in compiler/codeGen/StgCmmExpr.hs. That way we only have to call getConstrTag when
The type is unknown,
The type has too many constructors, or
The tag bits are 0.
I imagine you can do this quite easily by copying a bit of the splitTyConApp_maybe logic from tagToEnumRule to dataToTagRule in PrelRules.hs and adding a DataToTagUsingTagOp to supplement DataToTagOp.
I guess what I'm suggesting would be implemented most easily by replacing the dataToTag# primop a little bit.
unKnownType,smallType,largeType::Int#unKnownType=0#smallType=1#largeType=2#dataToTag#x=dataToTagUsing#unknownTypex-- Takes the "strategy" to usedataToTagUsing#::Int#->a->Int#
This arrangement is designed so we shouldn't have to do any painful restructuring of the caseRules or dataToTagRule.
The only potential problem I see is if any user-written RULES match on dataToTag#, because that won't work anymore. Should we worry about that?
Before deciding on a solution, let's record what the problem is.
The original thinking was this.
dataToTag# is a primop, so it has no business doing anything as complicated as evaluating its argument; we already have case expressions that the code-gen knows how to compile.
So dataToTag# expects an evaluated argument; in fact, stronger than that, it expects a pointer to the data value itself, so that it can extract the tag directly from the info table.
But if it so happens that the value in question already is evaluated, that's a waste. Example
f x = case x of y -> ...(dataToTag# y)...
It seems a bit silly for dataToTag# to re-evaluate y.
Hence, in CorePrep (see Note [dataToTag magic]) we add a case around the argument to dataToTag#unless the argument is already evaluated.
The "already-evaluated" test is exprIsHNF.
But alas, while exprIsHNF guarantees that the thing will evaluate in O(1) time, it does not guarantee that we have a pointer to the data value itself. Omer accurately diagnoses this problem in the Note in his draft D5196:
Note [Always introduce case around dataToTag# arg]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~We used to only generate a case expression around dataToTag# argument when it'snot known to be evaluated (checked with `exprIsHNF`), but this is incorrect.Here's an example: data T = T1 | T2 main = print (I# (dataToTag# a)) where {-# NOINLINE f #-} f = T2 {-# NOINLINE a #-} a = f`f` is a static closure f_r1zz :: Main.T [GblId, Caf=NoCafRefs, Unf=OtherCon []] = CCS_DONT_CARE Main.T2! [];but `a` is an _updatable_ CAF! a_r1zA :: Main.T [GblId, Unf=OtherCon []] = [] \u [] f_r1zz;An updatable CAF is not in HNF (entering the closure does the `newCAF()` stuffand updates the closure with an IND_STATIC, the usual CAF evaluation routines),but according to `exprIsHNF` `a` is!
I thought of having a more conservative test in CorePrep. But the rot goes further. Consider this, from Note [dataToTag magic]
data T = MkT !Bool f v = case v of MkT y -> dataToTag# y
We certainly know that y will be evaluated, because MkT is a strict constructor. But does it guarantee to point directly to the data value? No! The case-to-let transformation in the Simplifier (Simplify.doCaseToLet) uses exprIsHNF and hence will drop the eval on MkT's argument for things like Omer's a binding. And that is right in a way: the argument to MkT a certainly isn't bottom! But nor does it point to the data value.
So that is a problem. But I think there is another. Look in ticket:15696#comment:160903 and ticket:15696#comment:160904. Here we have the extra case expressions correctly inserted, but we still get the wrong answer. And this is affected by reducing the number of constructor from 9 to 8. 'So I think there may be two separate bugs'.
I'd like to understand the second before looking for a fix. Omer: could you investigate why comments 13 and 14 go wrong?
f_r1C3 :: Main.T[GblId, Caf=NoCafRefs, Unf=OtherCon []] = CCS_DONT_CARE Main.T2! [];a_r1C4 :: Main.T[GblId, Unf=OtherCon []] = [] \u [] f_r1C3;sat_s1OH :: GHC.Types.Ordering[LclId] = [] \u [] case dataToTag# [a_r1C4] of a'_s1OD { __DEFAULT -> case <# [a'_s1OD 1#] of sat_s1OE [Occ=Once] { __DEFAULT -> case tagToEnum# [sat_s1OE] of { GHC.Types.False -> case a'_s1OD of { __DEFAULT -> GHC.Types.GT []; 1# -> GHC.Types.EQ []; }; GHC.Types.True -> GHC.Types.LT []; }; }; };
Notice that we do dataToTag# [a_r1C4] and a_r1C4 is an updatable CAF. The result is I get LT instead of EQ. (this is one of the regression tests I added)
I get very similar STG (with the same bug) and same results with these configurations: GHC HEAD, GHC 8.2.2, GHC 8.0.2. All tried with -O0, -O1, -O2.
I'm guessing that you added a NOINLINE for cmpT in ticket:15696#comment:160904. When I do that I get the right answer with all optimisation settings (GHC HEAD), and that makes sense becuase the arguments are now known to be evaluated and exprIsHNF correctly returns False for the arguments, so we do case on the args. I don't know how you get incorrect result in the STG shown in ticket:15696#comment:160904. Could it be that you used an older binary of the test program or something like that? If you give more detailed instructions to reproduce I can take a look.
I agree that if the STG in ticket:15696#comment:160904 is producing wrong result then there's at least one more bug. However I get identical STG when I add NOINLINE aroung cmpT (with GHC HEAD, with -O2) and it works as expected. You said you also made dataToTag# lazier, but because I get identical STG as you I haven't tried to change the primop laziness (it shouldn't matter in STG).
Could you try to reproduce the error in ticket:15696#comment:160904 and give me more detailed instructions (showing the source and invoked GHC commands)?
Oops sorry. I'd modified the code slightly! Try this
{-# LANGUAGE MagicHash #-}module Main whereimport GHC.Exts (dataToTag#, tagToEnum#, (==#), (<#))import GHC.Base ( getTag )main :: IO ()main = print $ cmpT a T2 where {-# NOINLINE f #-} f = T2 {-# NOINLINE a #-} a = fdata T = T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9-- deriving (Eq, Show, Ord)cmpT a b = case getTag a of a' -> case getTag b of b' -> if tagToEnum# (a' <# b') :: Bool then LT else if tagToEnum# (a' ==# b') :: Bool then EQ else GT
Compile with -O and run. It prints LT when it should print EQ. But the evals are there!
Are you maybe confused because in the STG dump you also see the STG for cmpT? It's actually inlined in main so the top-level for cmpT is not used.
That is exactly what I was doing. Thanks!
I backed up some more to look at ticket:15696#comment:160886, which does not have a top-level no-inline CAF. I see this:
$sgo_r5Tk :: Main.T -> Main.T -> Main.Set Main.T -> Main.Set Main.T[GblId, Arity=3, Str=<L,1*U><L,U><S,1*U>, Unf=OtherCon []] = sat-only [] \r [orig_s5Wj ds_s5Wk ds1_s5Wl] case ds1_s5Wl of wild_s5Wm [Occ=Once] { Main.Bin y_s5Wn l_s5Wo [Occ=Once] r_s5Wp [Occ=Once] -> case case ds_s5Wk of sat_s5Wq [Occ=Once] { __DEFAULT -> dataToTag# [sat_s5Wq]; } of a'_s5Wr { __DEFAULT -> case dataToTag# [y_s5Wn] of b'_s5Ws { __DEFAULT -> case <# [a'_s5Wr b'_s5Ws] of { __DEFAULT ->...
Here we take apart a Bin, and call dataToTag# on the contents; and because of the exprIsHNF stuff there is
no guarantee that the argument to Bin points directly to the data value.
But in looking at this I found something else! In coment:6 there is no top-level CAF with a NOINLINE, so
how do things go wrong. Here's how.
We start with
thk = f ()g x = ...(case thk of v -> Bin v Tip Tip)...
So far so good; the argument to Bin (which is strict) is evaluated.
Now we do float-out. And in doing so we do a reverse binder-swap (see Note [Binder-swap during float-out] in SetLevels) thus
g x = ...(case thk of v -> Bin thk Nil Nil)...
The goal of the reverse binder-swap is to allow more floating -- and it does! We float the Bin to top level:
lvl = Bin thk Tip Tipg x = ...(case thk of v -> lvl))...
Now you can see that the argument of Bin, namely thk, points to the thunk, not to the value as it did before; and that gives rise to the bug.
Is this wrong? Not really. We are still guaranteed that the argument to Bin in lvl will be evaluated (by that case thk) before lvl is used. But we are no longer guaranteed that the argument to Bin points directly to the evaluated value.
OK, moving on to solutions. Your patch seems OK. But I wonder if CorePrep is the right place to do it. After all, some other STG manipulation might break it again.
I think the right thing is for the code generator to do the job; that is, in effect implement dataToTag# properly. That is, in StgCmmPrim, in
cgOpApp (StgPrimOp primop) args res_ty = do
add a special case for DataToTagOp, when we are compiling dataToTag# x. Then behave exactly as if we'd seen case x of y -> dataToTag## y, where by dataToTag## y I mean generate the code the looks in the info table. (We have that code here
And by "behave exactly as if we'd see case ..." I roughly mean call StgCmmExpr.cgCase. But that need some alts which we don't conveniently have. The easiest thing would be to take -- the general case equation for cgCase and split off the bit that does the eval, so that we can call it from dataToTag#. Doing this is not trivial, but it feels like the Right Thing, and will remove the magic from dataToTag#.
Right, fixing this in Cmm is also what I suggested in ticket:15696#comment:160923. I think two key questions are:
Can we optimise this? That is, can we know for certain that the argument of a dataToTag# points directly to a value (without going through blackholes or other indirections like IND_STATIC)?
Is this worth optimising?
Unless the answer to both of those is yes, then I think it makes sense to do this in Cmm (when compiling the primop). Then we can remove a bunch of notes about getTag and hacks in CorePrep, and simplify getTag to getTag = dataToTag#. I think we can also update the primop as can_fail = False.
It seems to me that Core is not the right place to answer (1). Even in STG I don't know if we can answer that question with certainty. I think it's doable in Cmm where we know about "lambda form" of ids (LambdaFormInfo) which tells us about how to enter an object. If "lambda form" of an argument is LFCon then we optimise, otherwise we enter.
What I've been suggesting is that even when we don't statically know if it's been evaluated or tagged, we should surely be using tag bits when we find them. It looks like an oversight that we haven't done so in the past, and I foresee substantial benefits to fixing that. Perhaps we can win back more performance than we lose.
Yes, we could ooptimise for LFCon, but the simplifier will have done that already. The only case I think we could reliably optimise, that would not be done already, is
case x of y A -> blah DEFAULT -> ...(dataToTag# y)...
In this case we really do know that y points to the value. It would not be hard to let the code gen spot this; but I doubt it would happen much.
Right, agreed that looking at tag bits would work for small types.
This should happen automatically if we use the code for cgCase. It already has a fast-path for the case when the scrutinee is evaluated.
But you point is perhaps that for small types we don't need to index the info table: the tag is in the bits. Yes, that's a good idea.
Ah, I see what you're saying. The 0 tag is handled by the case wrapper (or C-- equivalent). So for small types, we don't need any extra test; we can just use the tag bits directly. I suppose also that my attempt to use tag bits for unknown types is more trouble than it's worth; we should just assume that those have too many constructors.
As for the type-driven rewriting, I think we basically have two options:
Do it in PrelRules. I think this requires something like the dataToTagWith# primop I mentioned earlier.
Do it sometime later (tidy core, core prep, or lowering to STG). This lets us stick with dataToTag# throughout core2core, which is nice, but I don't know enough to know if any of those make sense or where/how to slot in the rule.
This is not possible unless we somehow distinguish "stuff that directly points to a value (without going through indirections)" from other at the type level.
I submitted another diff that fixes this bug in Cmm. It also does a bunch of simplifications in other parts of the compiler (removes notes, special cases, and hacks for dataToTag#). Here are some example code we generate:
Program:
data T = T1 | T2main = do print (I# (dataToTag# f)) print (I# (dataToTag# a)) where {-# NOINLINE f #-} f = T2 {-# NOINLINE a #-} a = f
For the first dataToTag# we generate:
_s1yo::I64 = 1; // CmmAssign
for the second
// ======= DATA TO TAG ======== Hp = Hp - 16; // CmmAssign I64[Sp - 24] = c1yE; // CmmStore R1 = a_r1lF_closure; // CmmAssign Sp = Sp - 24; // CmmAssign if (R1 & 7 != 0) goto c1yE; else goto c1yF; // CmmCondBranchc1yF: // global call (I64[R1])(R1) returns to c1yE, args: 8, res: 8, upd: 24; // CmmCall, this is where we evalaute the argc1yE: // global _c1yD::I64 = R1; // CmmAssign // ======= DATA TO TAG SMALL FAMILY ======== _s1yr::I64 = _c1yD::I64 & 7 - 1; // CmmAssign, read the tag bits
If I make this type a "big family", then we generate (for the second dataToTag#)
// ======= DATA TO TAG ======== Hp = Hp - 16; // CmmAssign I64[Sp - 24] = c1A0; // CmmStore R1 = a_r1mL_closure; // CmmAssign Sp = Sp - 24; // CmmAssign if (R1 & 7 != 0) goto c1A0; else goto c1A1; // CmmCondBranchc1A1: // global call (I64[R1])(R1) returns to c1A0, args: 8, res: 8, upd: 24; // CmmCall, enter the argumentc1A0: // global _c1zZ::I64 = R1; // CmmAssign // ======= DATA TO TAG GENERAL CASE ======== _s1zN::I64 = %MO_UU_Conv_W32_W64(I32[I64[_c1zZ::I64 & (-8)] - 4]); // CmmAssign, read info table
The patch is not ready (I'll need to update some notes, I also left some TODOs and questions in the code) but it fixes the bug and demonstrates that this is possible (and even easy) to do in Cmm.