Opened 13 months ago

Closed 6 weeks ago

#14391 closed task (fixed)

Make the simplifier independent of the typechecker

Reported by: nomeata Owned by: monoidal
Priority: normal Milestone: 8.8.1
Component: Compiler Version: 8.3
Keywords: newcomer Cc: ulysses4ever
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D4503, Phab:D5135, Phab:D5139
Wiki Page:

Description (last modified by ulysses4ever)

I noticed that the simplifier module depends on all of the type checker, and HsSyn stuff, and renamer stuff, which I found strange.

After a little investigation, it seems that the simplifier depends on CoreMonad, and that pulls some very few type-checker related things:

  1. import TcRnMonad        ( initTcForLookup )
    import {-# SOURCE #-} TcSplice ( lookupThName_maybe )
    

for

thNameToGhcName :: TH.Name -> CoreM (Maybe Name)
thNameToGhcName th_name = do
    hsc_env <- getHscEnv
    liftIO $ initTcForLookup hsc_env (lookupThName_maybe th_name)

which is not even used in GHC, but only in GHC Plugins, so this could probably be moved to a separate module pulled in by GhcPlugins.hs

2.

import TcEnv            ( lookupGlobal )

for

instance MonadThings CoreM where
    lookupThing name = do { hsc_env <- getHscEnv
                          ; liftIO $ lookupGlobal hsc_env name }

This might be a bit harder to disentangle. But if successful, it would probably make building GHC in parallel quite a bit faster. And it just seems strange to me that the Core-to-Core code should depend on the type checker…

Simon says:

Both of these code paths go through initTcForLookup which is massive overkill, as the comment with TcEnv.lookupGlobal says. There's clearly a ToDo here to strip off the redundant stuff and do a minimal lookup.

I am optimistically marking this as newcomer because it is a refactoring task, and a good way of learning a bit about various pieces, with a reasonably clear notion of “success”.


UPDATE (April 2018, Phab:4503): both points described above are addressed to the extent when in order to close the ticket we need to do mere code movement. Namely.

  1. lookupThName_maybe and initTcForLookup are eliminated from CoreMonad.hs completely. Its client, though, thNameToGhcName, is better to be moved in the future also, for it is not used in the CoreMonad.hs (or anywhere else) anyway. Joachim suggested “any module reexported by GhcPlugins (or maybe even that module itself)”.
  1. CoreMonad.hs still calls lookupGlobal which is no longer bound to the typechecker monad, but still resides in TcEnv.hs — it should be moved out of Tc-land at some point in the future in order to close the ticket.

Change History (44)

comment:1 Changed 12 months ago by jkiefer

Hello. Am newcomer. I'd love to take a stab at this! Will start hacking away.

comment:2 Changed 12 months ago by bgamari

Great! Do let us know if you encounter trouble.

comment:3 Changed 12 months ago by simonpj

Great.

I suggest you start with TcEnv.lookupGlobal.

  • It invokes initTcForLookup (massive overkill) in order to call tcLookupGlobal. We need a versionn of this function that operates in the IO monad, not the TcM monad.
  • What does tcLookupGlobal get from the TcM monad? It'll need to get these things as explicit arguments instead, I guess. For example: consults the tcg_type_env, which was initialised by initTcForLookup. And it uses tcg_semantic_mod likewise.
  • Then it hands off to LoadIface.tcLookupImported_maybe. That does a bit more IO-ish things before finally deciding to load a new interface decl in tcImportDecl_maybe.
  • tcImportDecl_maybe uses initIfaceTcRn to make an IfM moand in which to do the loading work. But instesad you can make an IfM from scratch, by writing a variant of initIfaceTcRn.

Nothing really hard here, but you'll need to carefully tease out what dependencies are where. (The lack of explicit dependencies is, of course, both the blessing and the curse of monadic progrmaming.)

Happy to help.

comment:4 Changed 12 months ago by jkiefer

Alrighty! First question while I investigate more into performing a minimal lookup instead of using initTcForLookup.

(Bear with me, I'm writing some of this out as I understand it myself)

We want to refactor thNameToGhcName to no longer be a part of CoreMonad and instead exist in a module to be pulled by GhcPlugins.hs.

Do correct me if I'm wrong, but creating a singleton module with just thNameToGhcName in it seems like a naive approach. Is there an existing module that is pulled in by GhcPlugins.hsthat would be an appropriate home for thNameToGhcName? Or should a new one simply be created?

comment:5 Changed 12 months ago by simonpj

I suggest you start with TcEnv.lookupGlobal.

Doing this will avoid you getting sucked into template haskell stuff right away. We can come back to your question when this is done.

comment:6 in reply to:  5 Changed 12 months ago by jkiefer

Gotcha. Thank you for the guidance!

comment:7 Changed 8 months ago by ulysses4ever

As it seems to stale for a while, I'd like to give it a try.

IIUC, the strategy laid by Simon suggests developing a Tc-less version of lookupGlobal. For this we need to identify ties to Tc inside it and then try to cut those.

First tie (also spelled by Simon above) is “tcg_type_env, which was initialised by initTcForLookup”. I looked at initTcForLookup and its dependencies, and it seems to me that tcg_type_envis initialized with emptyNameEnv there. A question: does that mean that corresponding part of the tcLookupGlobal which queries tcg_type_env is not needed at all and can be omitted in the Tc-less version of lookupGlobal?

comment:8 Changed 8 months ago by simonpj

does that mean that corresponding part of the tcLookupGlobal which queries tcg_type_env is not needed at all and can be omitted

Yes I think so.

comment:9 Changed 8 months ago by ulysses4ever

One more question. Consider a part of tcImportDecl_maybe:

initIfaceTcRn (importDecl name)

You said that it is necessary to create a variant of initIfaceTcRn. That sounds fine. But here is another thing: importDecl has IfM in its type. And IfM is defined in TcRnTypes, so leaving it doesn't bring us decoupling from the typechecker. Should we also replace importDecl? This sounds like some more work, because it depends on loadInterface which is >100 LOCs.

Last edited 8 months ago by ulysses4ever (previous) (diff)

comment:10 Changed 8 months ago by ulysses4ever

Ok, after some more thinking it seems to me that importing just TcRnTypes is not big deal. Now I'm wondering if it is legit to deal with initIfaceTcRn by just repalacing it with TcRnMonad.initIfaceLoad. If so, do we need to move it to some less Tc-heavy place?

comment:11 Changed 8 months ago by simonpj

importDecl has IfM in its type. And IfM is defined in TcRnTypes, so leaving it doesn't bring us decoupling from the typechecker. Should we also replace importDecl?

No, leave all that. The point is as follows (please document this in your patch):

  • TcEnv.lookupGlobal may look up an Id that one one has previously looked up.
  • If so, we are going to read its interface file, and add its bindings to the ExternalPackageTable, which is a persistent in-memory cache of information about other modules.
  • Loading that interface file does quite a bit of work, but we don't consider that as "part of the typechecker"; it's essentially just de-serialising interface-file data on disk. For example, any failures are not user errors; they represent messed-up files or GHC bugs, so can legitimately raise an exception.
  • The entire mechanism of importDecl and loadInterface is part of this. Don't duplicate it!
Last edited 8 months ago by simonpj (previous) (diff)

comment:12 Changed 8 months ago by simonpj

I'm wondering if it is legit to deal with initIfaceTcRn by just repalacing it with TcRnMonad.initIfaceLoad

Yes that sounds just right.

If so, do we need to move it to some less Tc-heavy place?

We might indeed want to do this. But you might want to keep the moving-code-around work in a separate patch, lest the diffs from that refactoring obscure the main payload

comment:13 Changed 8 months ago by ulysses4ever

Owner: set to ulysses4ever

I have one more question about tcLookupGlobal (hope the last one for it). Here its part:

if nameIsLocalOrFrom (tcg_semantic_mod env) name
then notFound name  -- Internal names can happen in GHCi
else

I have trouble with notFound part which essentially does careful error-reporting inside Tc monad. I'm lost on how to port this to IO with the same amount of precision. So far I come up with dumb solution (to be placed in the then branch):

pprPanic "lookupGlobal" (ppr name)

It is probably not good enough. Or is it?

comment:14 Changed 8 months ago by ulysses4ever

So far I've replaced error-reporting for TcM (notFound, failWithTc) with pprPanic.

Now I have full version of lookupGlobal in IO (link). It doesn't seem to break any test (besides already broken ones). Should I submit it to the Phabricator or proceed with the second part pointed out by Joachim, thNameToGhcName?

comment:15 Changed 8 months ago by ulysses4ever

Actually, I don't see thNameToGhcName used anywhere (on master). So it could be (relatively) freely moved. Maybe TcSplice, which it depends upon, is the right place for it? In that case the return type's monad, probably, should be changed from CoreM to plain IO.

comment:16 Changed 8 months ago by nomeata

thNameToGhcName is useful for GHC plugins, so any module reexported by GhcPlugins (or maybe even that module itself) is a good place.

comment:17 Changed 8 months ago by ulysses4ever

Differential Rev(s): D4503

comment:18 Changed 8 months ago by ulysses4ever

Status: newpatch

comment:19 Changed 8 months ago by ulysses4ever

Differential Rev(s): D4503Phab:D4503

comment:20 Changed 8 months ago by ulysses4ever

Let's turn to thNameToGhcName now. We want to remove initTcForLookup from it, and for this we need to replace lookupThName_maybe with Tc-less version.

The lookupThName_maybe, in turn, depends on lookupGlobalOccRn_maybe which seems to be very Rn-heavy inside. So I'm not sure how to proceed: should I try to rewrite lookupGlobalOccRn_maybe or leave it. In the latter case we need to find a way to run RnM in thNameToGhcName, which, I guess, not far from the dreaded initTcForLookup?

comment:21 Changed 8 months ago by simonpj

Well,

  • initTcForLookup initialises the tcg_rdr_env to emptyGlobalRdrEnv, and tcl_rdr to emptyLocalRdrEnv.
  • So in lookupThName_maybe, the lookupLocalRdrEnv is guaranteed to faile; and in lookupGlobalOccRn_maybe only the lookupExactOrOrig stuff can succeed.
  • For lookupOrig we are simply looking in the OrigNameCache, which is available in CoreM.
  • I think lookupExact is even easier, because I think the name must be an External name.

Maybe that can help you get further

comment:22 Changed 8 months ago by ulysses4ever

This makes a lot sense, thanks! I've started doing this.

For the other, simpler part of the refactoring, namely, the module canocalization thing, I've submitted update to the Phab. I'm not sure where to put that refactored function though.

comment:23 Changed 8 months ago by ulysses4ever

Done with TH part closely following your directions. See Phab.

comment:24 Changed 7 months ago by ulysses4ever

Description: modified (diff)

comment:25 Changed 7 months ago by ulysses4ever

Description: modified (diff)

comment:26 Changed 6 months ago by Ben Gamari <ben@…>

In bb3fa2d1/ghc:

Less Tc inside simplCore (Phase 1 for #14391)

Simplifier depends on typechecker in two points: `thNameToGhcName`
(`lookupThName_maybe`, in particular)  and `lookupGlobal`. We want to
cut the ties in two steps.

1. (Presented in this commit), reimplement both functions in a way that
doesn't use typechecker.

2. (Should follow), do code moving: a) `lookupGlobal` should go in some
typechecker-free place; b) `thNameToGhcName` should leave simplifier,
because it is not used there at all (probably, it should be placed
somewhere where `GhcPlugins` can see it -- this is suggested by Joachim
on Trac).

Details
=======

We redesigned lookup interface a bit so that it exposes some
`IO`-equivalents of `Tc`-features in use.

First, `CoreMonad.hs` still calls `lookupGlobal` which is no longer
bound to the typechecker monad, but still resides in `TcEnv.hs` — it
should be moved out of Tc-land at some point (“Phase 2”) in the
future in order to achieve its part of the #14391's goal.

Second, `lookupThName_maybe` is eliminated from `CoreMonad.hs`
completely; this already achieves its part of the goal of #14391. Its
client, though, `thNameToGhcName`, is better to be moved in the future
also, for it is not used in the `CoreMonad.hs` (or anywhere else)
anyway. Joachim suggested “any module reexported by GhcPlugins (or
maybe even that module itself)”.

As a side goal, we removed `initTcForLookup` which was instrumental for
the past version of `lookupGlobal`. This, in turn, called for pushing
some more parts of the lookup interface from the `Tc`-monad to `IO`,
most notably, adding `IO`-version of `lookupOrig` and pushing
`dataConInfoPtrToName` to `IO`. The `lookupOrig` part, in turn,
triggered a slight redesign of name cache updating interface: we now
have both, `updNameCacheIO` and `updNameCacheTc`, both accepting `mod`
and `occ` to force them inside, instead of more error-prone outside
before. But all these hardly have to do anything with #14391, mere
refactoring.

Reviewers: simonpj, nomeata, bgamari, hvr

Reviewed By: simonpj, bgamari

Subscribers: rwbarton, thomie, carter

GHC Trac Issues: #14391

Differential Revision: https://phabricator.haskell.org/D4503

comment:27 Changed 2 months ago by monoidal

Owner: ulysses4ever deleted
Status: patchnew

Moving out of 'patch'.

ulysses4ever, do you plan to do phase 2?

comment:28 Changed 2 months ago by ulysses4ever

I'd love to. Simon told that it is better to be controlled by someone with more experience though. If nothing, I can shuffle a bunch of definitions myself to achieve the goal formally (no Tc-related imports inside CoreMonad.hs) and then submit the Diff. Does that sound good?

Last edited 2 months ago by ulysses4ever (previous) (diff)

comment:29 Changed 2 months ago by simonpj

Simon told that it is better to be controlled by someone with more experience though.

I hope I didn't say that. Perhaps "advised by" or "supported by" someone with more experience, but "controlled by" is so dis-empowering! You are doing us a favour -- thank you -- keep going! I and others will try to support you.

Before investing a lot of effort in coding, it's best to lay out a plan (on a wiki page perhaps) with your goals, and the steps you propose to take to achieve those goals.

comment:30 Changed 2 months ago by monoidal

Regarding update (April 2018) from the ticket description:

Point 1: I agree that we can move thNameToGhcName to GhcPlugins; it's not used anywhere. Optionally, we could also move there ioLookupDataCon and ioLookupDataCon_maybe (they are also unused). Either way, it's cheap.

Point 2: I understand the goal is to move lookupGlobal away from typecheck/ directory. Currently lookupGlobal calls lookupGlobal_maybe, which calls lookupImported_maybe, which calls importDecl_maybe, which calls initIfaceLoad, which is in TcRnMonad. So I see that we can weaken the dependency of CoreMonad from TcEnv to TcRnMonad, but I don't see yet how to get rid of it completely. Is my understanding correct?

comment:31 Changed 2 months ago by ulysses4ever

Simon, my bad, the wording isn't accurate and isn't yours, indeed. I think, we all understand what was meant though.

Monoidal, from the top of my head, you are quite right in both points. The second point is a bit dissapointing, and I have no idea (again, ftoh) how to improve it. Actually, if you are eager to do what is clear, please, go ahead: I will be stuck with mortal things at least till the end of the week.

comment:32 Changed 2 months ago by ulysses4ever

Cc: ulysses4ever added

comment:33 Changed 2 months ago by simonpj

Re lookupGlobal, there is a reason for this! If you look up Complex, say, then GHC might need to read in Complex.hi, and deserialise and typecheck it (TcIface does this). The "typechecking" will never fail, unless it's a stale .hi file or something, but it's the process of turning a bare syntax tree into TyCons, Ids etc.

Now that does not need the full glory of the Tc monad; but it does need quite a bit. Teasing out exactly what it uses, and perhaps making a stripped-down Tc monad just for that, might be worthwhile. But it would be work, perhaps more than is justified until we have a stronger cause.

I'm sure there are lots of other ways in which things could be better structured too.

comment:34 Changed 2 months ago by monoidal

I created a graph with all module dependencies in compiler/.

If we don't count SOURCE imports, this graph is acyclic.

If we do, there are cyclic dependencies, and we can analyze strongly connected components. Currently, there are 440 modules, but 357 of them are in the same component. In other words, we have 357 modules that import directly or indirectly each other. The full structure is 1*76 + 7 + 357. This means there's a bunch of one-element components, one 7-element component (FastString, Pretty, Panic, Outputable etc.) and everything else is in a big chunk.

In total, there are 78 components. What if we could magically remove only one edge from this graph and try to obtain the largest amount of components? Here are the winners, the larger the better:

# of components / import from / import to / new structure
282 typecheck/TcRnMonad.hs -> typecheck/TcSplice.hs 1*273 + 2*2 + 4 + 6 + 7*2 + 11 + 15 + 113
259 typecheck/TcSplice.hs -> main/HscMain.hs        1*252 + 2*4 + 4 + 7 + 169
242 simplCore/CoreMonad.hs -> typecheck/TcEnv.hs    1*236 + 2*2 + 4 + 7 + 88 + 101
235 main/DynFlags.hs -> main/Plugins.hs             1*229 + 2*2 + 4 + 7 + 96 + 100
234 main/Plugins.hs -> simplCore/CoreMonad.hs       1*228 + 2*2 + 4 + 7 + 98 + 99
137 main/HscMain.hs -> main/CodeOutput.hs           1*134 + 4 + 7 + 295
128 main/CodeOutput.hs -> nativeGen/AsmCodeGen.hs   1*125 + 4 + 7 + 304
97  main/HscMain.hs -> simplCore/SimplCore.hs       1*95 + 7 + 338
78  [current situation]                             1*76 + 7 + 357

This ticket shows up at the third place (CoreMonad -> TcEnv). If we could remove CoreMonad -> TcEnv import, the structure would improve to 1*236 + 2*2 + 4 + 7 + 88 + 101. Instead of a 357-sized cyclic chunk, we would have half of the modules not participating in any cycle, and the big chunk reduced to two smaller ones.

Of course, my script didn't consider whether any item on this list is realistic.

Looking at this list, there is a specific chain: DynFlags -> Plugins -> CoreMonad -> TcEnv/TcRnMonad -> TcSplice -> HscMain. Since basically everything depends on DynFlags, and HscMain depends on basically everything, this is why everything is in a big chunk. Breaking this chain at any one of the "->" arrows would improve the module split significantly, and to me it's one of the better things we can do to simplify dependencies in GHC.

Last edited 2 months ago by monoidal (previous) (diff)

comment:35 Changed 2 months ago by simonpj

Interesting!

One way to avoid the need for .hs-boot files would be to rename-and-typecheck each SCC as a whole. (Implementing this idea would make an excellent project, BTW.)

If we did that, then yes, reducing SCC sizes would become highly relevant. (Until we implement it, I'm not sure that reducing SCC sizes is that important.)

I had a look at the top candidate: the import of TcSplice in TcRnMonad. I think it'd be very simple to untangle. It's only needed to allow the call to runRemoteModFinalizers. But I think you could easily have

  th_modfinalizers_var :: IORef [(TcLclEnv, ThModFinalizers)]

and then TcRnMonad would not need to mention runRemoteModFinalizers. I think that'd be a straight improvement, and we should do it regardless. Would you like to try that? It's a rather simple change.

comment:36 Changed 2 months ago by monoidal

Owner: set to monoidal

Yes, I'll check.

comment:37 Changed 2 months ago by monoidal

Differential Rev(s): Phab:D4503Phab:D4503, Phab:D5135

Thank you, this worked. I submitted D5135.

Going back to point 2 the original ticket, can we just remove this instance?

 instance MonadThings CoreM where
     lookupThing name = do { hsc_env <- getHscEnv
                           ; liftIO $ lookupGlobal hsc_env name }

Nothing seems to use it, the testsuite passes, and this way we won't need lookupGlobal in CoreMonad.

comment:38 Changed 2 months ago by nomeata

CoreM is used by plugins, so to find out if anyone uses it, you’d have to go through various plugins. (But because it is a type class instance, you cannot reliably tell if an occurence of lookupThing in a plugin is actually using this instance … so I guess you just have to try building them)

comment:39 Changed 2 months ago by monoidal

What if we moved the instance to GhcPlugins? I know it would be an orphan instance, but it would at least accomplish the goal.

comment:40 Changed 2 months ago by nomeata

I think the plugin authors would be fine with that.

comment:41 Changed 2 months ago by monoidal

Differential Rev(s): Phab:D4503, Phab:D5135Phab:D4503, Phab:D5135, Phab:D5139

I've done part 1 and 2 in D5139.

This removes the last direct import from simplCore/ to typechecker/.

There are several indirect imports remaining and I'll investigate what can be done about them after D5139.

Last edited 2 months ago by monoidal (previous) (diff)

comment:42 Changed 2 months ago by Krzysztof Gogolewski <krz.gogolewski@…>

In 03b779f/ghc:

Make CoreMonad independent of TcEnv (#14391)

Summary:
This removes the last direct import from simplCore/
to typechecker/.

Test Plan: validate

Reviewers: nomeata, simonpj, bgamari

Reviewed By: simonpj

Subscribers: rwbarton, carter

GHC Trac Issues: #14391

Differential Revision: https://phabricator.haskell.org/D5139

comment:43 Changed 6 weeks ago by Ben Gamari <ben@…>

In e5013a5/ghc:

Make TcRnMonad independent of TcSplice (#14391)

Test Plan: validate

Reviewers: simonpj, bgamari

Reviewed By: simonpj

Subscribers: rwbarton, carter

GHC Trac Issues: #14391

Differential Revision: https://phabricator.haskell.org/D5135

comment:44 Changed 6 weeks ago by bgamari

Milestone: 8.8.1
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.