Opened 4 months ago

Closed 4 months ago

Last modified 3 months ago

#14779 closed bug (fixed)

Compiling with -g fails -lint-core checks

Reported by: niteria Owned by:
Priority: normal Milestone: 8.4.2
Component: Compiler (Debugging) Version: 8.3
Keywords: Cc: simonmar, simonpj, bgamari, akio
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Compile-time crash or panic Test Case:
Blocked By: Blocking:
Related Tickets: #14122, #14123, #8472, #14406 Differential Rev(s): phab:D4470
Wiki Page:

Description

Compiling the attached file produces:

$ inplace/bin/ghc-stage2 -O -dcore-lint -g -c Data.Fixed.hs
*** Core Lint errors : in result of Simplifier ***
<no location info>: warning:
    [RHS of str_s2UI :: Addr#]
    The type of this binder is unlifted: str_s2UI
    Binder's type: Addr#
*** Offending Program ***
...
str_s2UI :: Addr#
[LclId,
 Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 30 0}]
str_s2UI = src<Data.Fixed.hs:78:31-39> "MkFixed"#

str_a2j4 :: String
[LclId,
 Unf=Unf{Src=<vanilla>, TopLvl=False, Value=False, ConLike=True,
         WorkFree=False, Expandable=True, Guidance=IF_ARGS [] 20 0}]
str_a2j4 = src<Data.Fixed.hs:78:31-39> unpackCString# str_s2UI
...

This came up when I wanted to compile GHC HEAD with -g. There are a couple of related tickets, but some of them didn't reproduce. This is a small, self-contained example. I'm hoping that it would be possible to solve this without fully solving #14123 which seems to have bigger scope.

My HEAD is d2511e3b61563ed3fc2c9aec2c90a4156373a24c.

Attachments (1)

Data.Fixed.hs (2.3 KB) - added by niteria 4 months ago.

Download all attachments as: .zip

Change History (33)

Changed 4 months ago by niteria

Attachment: Data.Fixed.hs added

comment:1 Changed 4 months ago by simonpj

I think this is something to do with GHC's rules about top-level bindings. There's a special case for

x :: Addr#
x = "flbha"#

which is allowed.

But here we have

str_s2UI = src<Data.Fixed.hs:78:31-39> "MkFixed"#

One of two things should be true:

  • That is bad, and we should not allow it. In which case whichever code is generating it needs to be fixed.
  • That is ok, and Lint should accept it.

I'm not sure which, because I'm not sure of the semantics of src<blah> ticks. (A long-standing issue.)

comment:2 Changed 4 months ago by niteria

That is bad, and we should not allow it. In which case whichever code is generating it needs to be fixed.

https://phabricator.haskell.org/D3925 suggests that this is the way we want to proceed. It refers to this note:

Note [CoreSyn top-level string literals]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As an exception to the usual rule that top-level binders must be lifted,
we allow binding primitive string literals (of type Addr#) of type Addr# at the
top level. This allows us to share string literals earlier in the pipeline and
crucially allows other optimizations in the Core2Core pipeline to fire.
Consider,

  f n = let a::Addr# = "foo"#
        in \x -> blah

In order to be able to inline `f`, we would like to float `a` to the top.
Another option would be to inline `a`, but that would lead to duplicating string
literals, which we want to avoid. See Trac #8472.

The solution is simply to allow top-level unlifted binders. We can't allow
arbitrary unlifted expression at the top-level though, unlifted binders cannot
be thunks, so we just allow string literals.

It is important to note that top-level primitive string literals cannot be
wrapped in Ticks, as is otherwise done with lifted bindings. CoreToStg expects
to see just a plain (Lit (MachStr ...)) expression on the RHS of primitive
string bindings; anything else and things break. CoreLint checks this invariant.

Also see Note [Compilation plan for top-level string literals].

comment:3 Changed 4 months ago by simonpj

  • D3925 seems to be be saying that a literal string should never be wrapped in certain sorts of ticks. Fine if that's so.
  • But D3925 only makes Lint reject more programs (I think). So how will it help here?
  • The invariant above says that top-level literal strings can't be surrounded by ticks of any kind. And indeed that's what is tripping the Lint error we are seeing in this ticket

comment:4 Changed 4 months ago by niteria

But D3925 only makes Lint reject more programs (I think). So how will it help here?

Right. I was only using it to understand what the invariant should be. Fortunately there's also the note, I should have referred to it directly.

comment:5 Changed 4 months ago by niteria

diff --git a/compiler/simplCore/SimplUtils.hs b/compiler/simplCore/SimplUtils.hs
index d86adbb..c06f03c 100644
--- a/compiler/simplCore/SimplUtils.hs
+++ b/compiler/simplCore/SimplUtils.hs
@@ -1115,6 +1115,7 @@ preInlineUnconditionally env top_lvl bndr rhs rhs_env
   | isCoVar bndr                             = Nothing -- Note [Do not inline CoVars unconditionally]
   | isExitJoinId bndr                        = Nothing
   | not (one_occ (idOccInfo bndr))           = Nothing
+  | exprIsLiteralString rhs                  = Nothing
   | not (isStableUnfolding unf)              = Just (extend_subst_with rhs)

   -- Note [Stable unfoldings and preInlineUnconditionally]

Fixes it for me and seems desirable. I need to gather some more context to be able to intelligently argue for/against it.

comment:6 Changed 4 months ago by simonpj

I wonder why that fixes it. Something must have ledt you there. (This is in `preInlineUnconditionally'.

If we have

let x = "fop#" in f x

where x appears once then I really don't want to disable inlining it. Maybe with an isTopLevel guard.

comment:7 Changed 4 months ago by niteria

Agreed, this is too broad. I need isTopLevel as well.

I didn't get here in a very principled way.

I just saw that FloatOut was doing what I expected (not putting ticks), then after a Simplifier pass it was broken. I noticed that the floated out binding was touched by pre-inline-uncond and it prompted me to look at preInlineUnconditionally. I then vaguely recalled that we intend for these top level unboxed strings to stay floated, so it seemed like a good idea not to inline them. I forgot the top level part in my impromptu patch.

Why do we end up with ticks after that? I don't know yet. But arguably, inlining a top level unboxed string literal is the first place where we went wrong.

NB: With the fixed patch that does isTopLevel I'm able to compile GHC HEAD with -g.

comment:8 Changed 4 months ago by niteria

Ok, I know what's going on "mechanically", but I'm missing some big picture stuff.

In https://phabricator.haskell.org/D3051, bgamari added some logic in Simplify.prepareRhs.go to not tick top level string literals.

Namely this fragment:

             ; let tickIt (id, expr)
                       -- we have to take care not to tick top-level literal
                       -- strings. See Note [CoreSyn top-level string literals].
                     | isTopLevel top_lvl && exprIsLiteralString expr
                     = (id, expr)
                     | otherwise
                     = (id, mkTick (mkNoCount t) expr)

In the case that I'm looking at top_level = NotTopLevel and the literal string gets floated to the top.

The call stack is as follows Simplify.simplNonRecE calls Simplify.simplLazyBind calls Simplify.prepareRhs. Now simplNonRecE always calls simplLazyBind with NotTopLevel and simplLazyBind uses that to call prepareRhs.

So either:

  1. top_level doesn't mean that we're floating to top level, or
  2. it means that we're floating to top level, and we're not supposed to float the string to the top.

I suspect it's 1., but I wasn't able to find a comment that explains its purpose.

Full call stack for reference:

  top_lvl <NotTopLevel> expr "MkFixed"# id (str_s2VH{v} [lid] :: ghc-prim:GHC.Prim.Addr#{(w) tc 32}) t src<Data.Fixed.hs:78:31-39>
  Call stack:
      CallStack (from HasCallStack):
        callStackDoc, called at compiler/utils/Outputable.hs:1188:37 in ghc:Outputable
        pprSTrace, called at compiler/simplCore/Simplify.hs:471:24 in ghc:Simplify
        prepareRhs, called at compiler/simplCore/Simplify.hs:289:34 in ghc:Simplify
        simplLazyBind, called at compiler/simplCore/Simplify.hs:1429:29 in ghc:Simplify
        simplNonRecE, called at compiler/simplCore/Simplify.hs:1333:11 in ghc:Simplify
        simplLam, called at compiler/simplCore/Simplify.hs:1430:30 in ghc:Simplify
        simplNonRecE, called at compiler/simplCore/Simplify.hs:1333:11 in ghc:Simplify
        simplLam, called at compiler/simplCore/Simplify.hs:917:5 in ghc:Simplify
        simplExprF1, called at compiler/simplCore/Simplify.hs:879:5 in ghc:Simplify
        simplExprF, called at compiler/simplCore/Simplify.hs:1709:10 in ghc:Simplify
        completeCall, called at compiler/simplCore/Simplify.hs:1686:22 in ghc:Simplify
        simplIdF, called at compiler/simplCore/Simplify.hs:890:39 in ghc:Simplify
        simplExprF1, called at compiler/simplCore/Simplify.hs:879:5 in ghc:Simplify
        simplExprF, called at compiler/simplCore/Simplify.hs:912:18 in ghc:Simplify
        simplExprF1, called at compiler/simplCore/Simplify.hs:879:5 in ghc:Simplify
        simplExprF, called at compiler/simplCore/Simplify.hs:912:18 in ghc:Simplify
        simplExprF1, called at compiler/simplCore/Simplify.hs:879:5 in ghc:Simplify
        simplExprF, called at compiler/simplCore/Simplify.hs:912:18 in ghc:Simplify
        simplExprF1, called at compiler/simplCore/Simplify.hs:879:5 in ghc:Simplify
        simplExprF, called at compiler/simplCore/Simplify.hs:912:18 in ghc:Simplify
        simplExprF1, called at compiler/simplCore/Simplify.hs:879:5 in ghc:Simplify
        simplExprF, called at compiler/simplCore/Simplify.hs:1079:29 in ghc:Simplify
        simplTick, called at compiler/simplCore/Simplify.hs:892:39 in ghc:Simplify
        simplExprF1, called at compiler/simplCore/Simplify.hs:879:5 in ghc:Simplify
        simplExprF, called at compiler/simplCore/Simplify.hs:280:36 in ghc:Simplify
        simplLazyBind, called at compiler/simplCore/Simplify.hs:229:5 in ghc:Simplify
        simplRecOrTopPair, called at compiler/simplCore/Simplify.hs:198:33 in ghc:Simplify
        simplRecBind, called at compiler/simplCore/Simplify.hs:148:35 in ghc:Simplify
Last edited 4 months ago by niteria (previous) (diff)

comment:9 Changed 4 months ago by niteria

Differential Rev(s): phab:D4423

comment:10 Changed 4 months ago by simonpj

I'm afraid I'm still not happy. The offending case is this. Before, we have:

T14779.conMkFixed
  = src<T14779.hs:78:1-49>
    T14779.mkConstr
      (src<T14779.hs:78:23-29> T14779.tyFixed)
      (src<T14779.hs:78:31-39> GHC.CString.unpackCString# "MkFixed"#)
      ((src<T14779.hs:78:41-42> GHC.Types.[]) @ GHC.Base.String)
      (src<T14779.hs:78:44-49> T14779.Prefix)

The simplifier inlines mkConstr, let-binding its arguments like this

conMkFixed = let str_s2UM :: Addr#
                 str_s2UM = src<T14779.hs:78:31-39> "MkFixed"#

                 str_a2jk :: String
                 str_a2jk = src<T14779.hs:78:31-39> unpackCString# str_s2UM
             in
             ...str_a2jk...

and then the simplifier floats those two bindings outward.

Ben/Bartosz's patch messes with prepareRhs in a slightly delicate way, to remove the tick on the string, but I think the bug is earlier. I believe we should never have a tick wrapped around a literal string. That's certainly what Phab:D3925 seems to say.

If that is true (is it?) then that invariant is already broken in the above intermediate form. We should instead have

conMkFixed = let str_s2UM :: Addr#
                 str_s2UM = "MkFixed"#

                 str_a2jk :: String
                 str_a2jk = src<T14779.hs:78:31-39> unpackCString# str_s2UM
             in
             ...str_a2jk...

Let's first establish whether we ever want a tick wrapped around a literal string. If we don't, let's adopt Phab:D3925, and make the smart constructor mkTick guarantee it. (Unfortunately CoreUtils.mkTick is a jolly complicated function and I don't understand it.)

comment:11 Changed 4 months ago by simonmar

I believe we should never have a tick wrapped around a literal string.

I think it's fine for these src ticks to appear anywhere - they're supposed to have no impact on simplifier behaviour, they only result in some annotations being added to the generated code in the backend. So the simplifier should look through src ticks.

In fact it's probably useful to have src ticks wrapped around these literal strings, because then we can tell where the string came from in gdb.

comment:12 Changed 4 months ago by simonpj

I think it's fine for these src ticks to appear anywhere

As the Note in comment:2 says, our current story is

Note [CoreSyn top-level string literals]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

...

It is important to note that top-level primitive string literals cannot be
wrapped in Ticks, as is otherwise done with lifted bindings. CoreToStg expects
to see just a plain (Lit (MachStr ...)) expression on the RHS of primitive
string bindings; anything else and things break. CoreLint checks this invariant.

We could change this invariant, by allowing some (or maybe any) ticks around a top-level string literal. That's fine too! We can change the story, presumably by changing the invariant, and changing what CoreToSTG does somehow.

But what change to CoreToSTG? Presumably by uconditionally discarding any ticks wrapped around a top-level literal string. But if we do that, how will GDB ever see that info?

I'm ok with either way; but I'd like us to be clear.

comment:13 Changed 4 months ago by niteria

Sorry to be slow on this, I wanted to write a comprehensive response, but I kept going in loops.

It boils down to this: I don't know what the invariant should be and don't know how to arrive at the solution. I also don't think there's much to be gained from changing it at this point. Any extra src ticks won't be used right away and not having ticks on all the primitive strings probably also doesn't affect the final output. My personal bias here would be to not throw away information.

We already have a written invariant ("No top level ticked strings") which gets violated and a bug in a previous fix is what's causing the violation. phab:D4423 is meant to make the previous fix correct.

comment:14 Changed 4 months ago by simonpj

Yes, I think Simon M is the person most likely to have a well-informed view about comment:12.

comment:15 Changed 4 months ago by niteria

Current plan: Don't do anything special to top level primitive literals, have the same behavior for nested and for top level. Throw away top level ticks going in CoreToSTG.

comment:16 Changed 4 months ago by akio

Should this ticket be a release blocker? As shown in #14868, this issue can cause an outright miscompilation when -g is used and Core Lint is disabled.

comment:17 Changed 4 months ago by akio

Cc: akio added

comment:18 Changed 4 months ago by bgamari

Milestone: 8.4.2
Version: 8.3

I'm afraid that it's too late for 8.4.1; it will have to wait for 8.4.2.

comment:19 Changed 4 months ago by niteria

I've hit a small snag trying to follow: https://ghc.haskell.org/trac/ghc/ticket/14779#comment:15

I expected top level primitive strings (things of type Addr#) to be either Lit (MachStr _) or Lit (MachStr _)wrapped in any number of allowed Tick t _.

Running the T9583 after reverting f5b275a239d2554c4da0b7621211642bf3b10650, revealed top level bindings of the form:

$tT_sas7 :: Addr#
[LclId]
$tT_sas7
  = src<libraries/base/Data/Data.hs:(620,1)-(623,25)>
    src<libraries/base/GHC/Base.hs:1047:1-18> $tcT_sahj

$tcT_sahj :: Addr#
[LclId]
$tcT_sahj = "T"#

This appears to be a variant of a problem from the original "allow top level primitive strings" patch. Changing CSE similarly to https://phabricator.haskell.org/D2605#inline-23997 may be what I need to do.

comment:20 Changed 4 months ago by niteria

Relying on correctness with CSE seems a bit brittle though. If I -fno-cse I'm back to the original problem.

That leads me to believe that 2 things need to happen:

  1. CoreLint (and probably CoreToSTG) needs to accept Core like above.
  2. CSE should fix it.

Alternatively we could figure out how it happens and never let it happen.

comment:21 Changed 4 months ago by niteria

Fixing tryForCSE to look through ticks here:

tryForCSE :: Bool -> CSEnv -> InExpr -> OutExpr
tryForCSE toplevel env expr
  | toplevel && exprIsLiteralString expr = expr
      -- See Note [Take care with literal strings]
  | Just e <- lookupCSEnv env expr'' = mkTicks ticks e
  | otherwise                        = expr'

Appears to take care of the problem from 2 previous comments.

comment:22 Changed 4 months ago by simonpj

Fixing tryForCSE to look through ticks here:

I don't get it. You've added a case to tryForCSE that looks a the original expression expr, not the stripped one expr''. How does that differ from the existing code where the test is in cse_bind?

comment:23 Changed 4 months ago by niteria

I don't get it. You've added a case to tryForCSE that looks a the original expression expr, not the stripped one expr. How does that differ from the existing code where the test is in cse_bind?

Sorry, perhaps I was imprecise. I pointed to code before the change, the code after the change is:

tryForCSE :: Bool -> CSEnv -> InExpr -> OutExpr
tryForCSE toplevel env expr
  | toplevel && exprIsMbTickedLitString expr = expr
      -- See Note [Take care with literal strings]
  | Just e <- lookupCSEnv env expr'' = mkTicks ticks e
  | otherwise                   

Where exprIsMbTickedLitString looks through allowed ticks. I'm not married to the name by the way, I have a hard time coming up with a better one.

For completeness:

exprIsMbTickedLitString :: CoreExpr -> Bool
exprIsMbTickedLitString = isJust . exprIsMbTickedLitString_maybe

exprIsMbTickedLitString_maybe :: CoreExpr -> Maybe CoreExpr
exprIsMbTickedLitString_maybe e@(Lit (MachStr _)) = Just e
exprIsMbTickedLitString_maybe (Tick t e)
  | tickishPlace t == PlaceCostCentre = Nothing
  | otherwise = exprIsMbTickedLitString_maybe e
exprIsMbTickedLitString_maybe _ = Nothing

I should be able to put up a complete phab patch soon and discussing the details should be easier.

comment:24 Changed 4 months ago by niteria

Differential Rev(s): phab:D4423phab:D4423, phab:D4470

comment:25 Changed 4 months ago by simonpj

D4423 is marked abandoned, in favour of Phab:D4470. Is that right? If so, let's remove it from the Differential Revs field.

comment:26 Changed 4 months ago by simonpj

In Phab:D4470 I see no change in tryForCSE. Are we good without it now?

comment:27 Changed 4 months ago by niteria

Differential Rev(s): phab:D4423, phab:D4470phab:D4470

comment:28 Changed 4 months ago by niteria

In ​Phab:D4470 I see no change in tryForCSE. Are we good without it now?

Looks like I confused everyone, myself included. The code for tryForCSE that I copied comes from https://phabricator.haskell.org/D2605. It has been refactored since (9304df5230a7a29d3e992916d133e462b854e55f) and cse_bind is the equivalent place that needed to be fixed. I'm sorry for the confusion.

comment:29 Changed 4 months ago by Bartosz Nitka <niteria@…>

In 5bc195a2/ghc:

Allow top level ticked string literals

This reverts f5b275a239d2554c4da0b7621211642bf3b10650
and changes the places that looked for `Lit (MachStr _))`
to use `exprIsMbTickedLitString_maybe` to unwrap ticks as
necessary.
Also updated relevant comments.

Test Plan:
I added 3 new tests that previously reproduced.
GHC HEAD now builds with -g

Reviewers: simonpj, simonmar, bgamari, hvr, goldfire

Subscribers: rwbarton, thomie, carter

GHC Trac Issues: #14779

Differential Revision: https://phabricator.haskell.org/D4470

comment:30 Changed 4 months ago by niteria

Status: newmerge

comment:31 Changed 4 months ago by bgamari

Milestone: 8.4.28.4.1
Resolution: fixed
Status: mergeclosed

comment:32 Changed 3 months ago by bgamari

Milestone: 8.4.18.4.2

Unfortunately due to a mistake in the bindist preparation this patch didn't quite make it for 8.4.1. It will be present in 8.4.2 (which will likely happen soon).

Note: See TracTickets for help on using tickets.