Opened 17 months ago

Closed 9 months ago

Last modified 9 months ago

#14052 closed bug (fixed)

Significant GHCi speed regression with :module and `let` in GHC 8.2.1

Reported by: RyanGlScott Owned by: osa1
Priority: high Milestone: 8.4.2
Component: GHCi Version: 8.2.1-rc2
Keywords: Cc: mniip, simonmar
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #11547 Differential Rev(s): Phab:D4478
Wiki Page:

Description (last modified by RyanGlScott)

I recently noticed that the performance of doctest in GHC 8.2.1 (see the corresponding doctest issue) can be much, much worse than in previous versions of 8.0.2. So bad, in fact, that a project with 865 doctest examples takes about three hours to complete in 8.2.1, whereas it would only take 8 seconds in 8.0.2.

To reproduce this issue in a fairly minimal way, you can use the following script to generate a file which simulates what doctest is doing:

-- GenExample.hs
module Main where

import Control.Monad

import System.Environment
import System.Exit
import System.IO

main :: IO ()
main = do
  args <- getArgs
  case args of
    n:_ -> genExamples (read n)
    _ -> do hPutStrLn stderr "usage: runghc GenExamples.hs <num-examples>"
            exitWith $ ExitFailure 1

genExamples :: Int -> IO ()
genExamples nExamples = do
  putStrLn ":l Foo"
  ireplicateA_ nExamples genExample

genExample :: Int -> IO ()
genExample i = putStr $ unlines
  [ ":m *Foo"
  , "example : \"expr" ++ show i ++ "\""
  , "let foo = it"
  , "\"marker\""
  , "let it = foo"
  ]

ireplicateA_ :: Applicative m => Int -> (Int -> m a) -> m ()
ireplicateA_ cnt0 f =
    loop cnt0 0
  where
    loop cnt n
        | cnt <= 0  = pure ()
        | otherwise = f n *> (loop (cnt - 1) $! (n + 1))

You'll also need this file:

-- Foo.hs
module Foo where

example :: Char
example = 'a'

First, use GenExample to generate a GHCi script:

$ runghc GenExample.hs 500 > Example.script

Now you can run the script like so (using GHCi 8.0.2 as an example):

$ /opt/ghc/8.0.2/bin/ghci -ghci-script Example.script Foo.hs

With GHCi 8.0.2, this takes about three seconds. But with GHCi 8.2.1, this takes about 35 seconds!

Attachments (2)

with-m.prof (198.5 KB) - added by phadej 17 months ago.
without-m.prof (198.3 KB) - added by phadej 17 months ago.

Download all attachments as: .zip

Change History (27)

comment:1 Changed 17 months ago by RyanGlScott

Note that in GenExample, the crucial bit is here:

genExample :: Int -> IO ()
genExample i = putStr $ unlines
  [ ":m *Foo"
  , "example : \"expr" ++ show i ++ "\""
  , "let foo = it"
  , "\"marker\""
  , "let it = foo"
  ]

Both the :m *Foo part, as well as let foo = it/let it = foo, are crucial to triggering the slowdown. If one of them is commented out, then GHCi 8.2.1 achieves parity in speed with 8.0.2.

comment:2 Changed 17 months ago by RyanGlScott

Description: modified (diff)

comment:3 Changed 17 months ago by RyanGlScott

Cc: mniip added
Milestone: 8.4.1

Commit 59d7ee53906b9cee7f279c1f9567af7b930f8636 (GHCi: Don't remove shadowed bindings from typechecker scope.) caused this regression.

comment:4 Changed 17 months ago by RyanGlScott

To make things worse, this script takes even longer between GHC 8.2.1 and HEAD. As noted above, on 8.2.1 it takes about 35 seconds, but on GHC HEAD (as of commit d08b9ccdf2812e8f8fa34d0c89275deee574524c), it takes about 2 minutes 32 seconds. I still need to investigate to which commit(s) were responsible for this further regression.

comment:5 Changed 17 months ago by simonpj

There must be a quadratic algorithm in here! Might someone dig a big deeper? Probably profiling would nail it.

comment:6 Changed 17 months ago by phadej

In GHC-8.2.1 and GHC-8.0.2 maximum residency is the same, but 8.2.1 makes GC work hard. I don't have a profiled build at my hand right now to see what's is so much created there

GHC 8.2.1

*Foo> :q
Leaving GHCi.
 117,933,393,408 bytes allocated in the heap
   8,146,416,328 bytes copied during GC
      19,708,480 bytes maximum residency (254 sample(s))
         281,520 bytes maximum slop
              50 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     30416 colls,     0 par   14.994s  14.949s     0.0005s    0.0330s
  Gen  1       254 colls,     0 par    0.134s   0.137s     0.0005s    0.0033s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.002s  (  0.001s elapsed)
  MUT     time   36.655s  ( 39.458s elapsed)
  GC      time   15.128s  ( 15.085s elapsed)
  EXIT    time    0.036s  (  0.046s elapsed)
  Total   time   51.820s  ( 54.591s elapsed)

  Alloc rate    3,217,432,136 bytes per MUT second

  Productivity  70.8% of total user, 72.4% of total elapsed

GHC 8.0.2

*Foo> :q
Leaving GHCi.
   2,025,372,528 bytes allocated in the heap
      67,188,040 bytes copied during GC
      19,710,096 bytes maximum residency (8 sample(s))
         480,376 bytes maximum slop
              44 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       612 colls,     0 par    0.161s   0.160s     0.0003s    0.0023s
  Gen  1         8 colls,     0 par    0.149s   0.150s     0.0187s    0.0415s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    3.144s  (  8.657s elapsed)
  GC      time    0.310s  (  0.310s elapsed)
  EXIT    time    0.042s  (  0.042s elapsed)
  Total   time    3.546s  (  9.010s elapsed)
Last edited 17 months ago by phadej (previous) (diff)

Changed 17 months ago by phadej

Attachment: with-m.prof added

Changed 17 months ago by phadej

Attachment: without-m.prof added

comment:7 Changed 17 months ago by phadej

I added two time profiles, running the variants of ryan's test-case, with and without ":m *Foo".

They look similar, but in bad behaviour withCleanupSession does something, which takes half of the time.

                                                                   individual      inherited
COST CENTRE                        MODULE      no.      entries  %time %alloc  


WITH :m *Foo, BAD behaviour

   withCleanupSession              GHC         1501          1   50.2   90.2    99.9   99.9
    hscStmtWithLocation            HscMain     2406       1201    0.0    0.0    16.5    9.4
     hscParseStmtWithLocation      HscMain     2407       1201    0.0    0.0     0.1    0.1
      hscParseThingWithLocation    HscMain     2408       1201    0.0    0.0     0.1    0.1
       Parser                      HscMain     2409       1201    0.1    0.1     0.1    0.1
        logWarningsReportErrors    HscMain     2410       1201    0.0    0.0     0.0    0.0
         logWarnings               HscMain     2411       1201    0.0    0.0     0.0    0.0
     hscParsedStmt                 HscMain     2412       1201    1.4    1.7    16.4    9.3
      ioMsgMaybe                   HscMain     2413       2402    8.7    4.3    10.9    5.2
       Digraph.topSort             Digraph     2414       7508    0.0    0.0     0.0    0.0
       solveSimples                TcInteract  2424       6307    0.1    0.0     2.0    0.9
        solve_loop                 TcInteract  2426          0    1.5    0.7     1.9    0.9
         canEvVar                  TcCanonical 2427       8108    0.4    0.2     0.4    0.2
         canClass                  TcCanonical 2428        901    0.0    0.0     0.0    0.0

WITHOUT :m *Foo, GOOD behaviour

   withCleanupSession              GHC         1501          1    6.4   16.8    99.8   99.3
    hscStmtWithLocation            HscMain     2406       1201    0.0    0.0    30.9   81.2
     hscParseStmtWithLocation      HscMain     2407       1201    0.0    0.0     0.3    1.0
      hscParseThingWithLocation    HscMain     2408       1201    0.0    0.0     0.3    1.0
       Parser                      HscMain     2409       1201    0.3    0.9     0.3    0.9
        logWarningsReportErrors    HscMain     2410       1201    0.0    0.0     0.0    0.0
         logWarnings               HscMain     2411       1201    0.0    0.0     0.0    0.0
     hscParsedStmt                 HscMain     2412       1201    2.8   14.4    30.5   80.2
      ioMsgMaybe                   HscMain     2413       2402   16.3   37.2    20.2   45.2
       Digraph.topSort             Digraph     2414       7508    0.0    0.1     0.0    0.1
       solveSimples                TcInteract  2424       6307    0.2    0.1     3.6    7.6
        solve_loop                 TcInteract  2426          0    2.7    5.7     3.4    7.5
         canEvVar                  TcCanonical 2427       8108    0.8    1.7     0.8    1.7
         canClass                  TcCanonical 2428        901    0.0    0.1     0.0    0.1

comment:8 Changed 17 months ago by phadej

when module is loaded into context with :m we InteractiveEval.setContext which builds ic_rn_gbl_env with

!final_rdr_env = all_env `icExtendGblRdrEnv` ic_tythings old_ic

Because there aren't shadowing cleanup anymore, ic_tythings grows, icExtendGblRdrEnv is linear in ic_tythings size, which causes the quadratic behaviour when we repeadetly :m modules.

So it isn't related to this bug. E.g. the GenExample.hs version below has similar slowdown:

module Main where

import Control.Monad

import System.Environment
import System.Exit
import System.IO

main :: IO ()
main = do
  args <- getArgs
  case args of
    n:_ -> genExamples (read n)
    _ -> do hPutStrLn stderr "usage: runghc GenExamples.hs <num-examples>"
            exitWith $ ExitFailure 1

genExamples :: Int -> IO ()
genExamples nExamples = do
  putStrLn ":set +s"
  putStrLn ":l Foo"
  putStrLn "let bar = ()" -- first one
  sequence_ [genExample i | i <- [1..nExamples] ]

genExample :: Int -> IO ()
genExample i = putStr $ unlines
  [ ""
  , ":m *Foo"
  , "let foo = bar" -- ask old
  , "let bar = ()"  -- reset, if `let bar = foo` is even slower
  ]

comment:9 Changed 17 months ago by phadej

--- a/compiler/main/InteractiveEval.hs
+++ b/compiler/main/InteractiveEval.hs
@@ -655,7 +655,7 @@ setContext imports
                liftIO $ throwGhcExceptionIO (formatError dflags mod err)
            Right all_env -> do {
        ; let old_ic         = hsc_IC hsc_env
-             !final_rdr_env = all_env `icExtendGblRdrEnv` ic_tythings old_ic
+             !final_rdr_env = {-# SCC "final_rdr_env" #-} all_env `icExtendGblRdrEnv` ic_tythings old_ic
        ; setSession
          hsc_env{ hsc_IC = old_ic { ic_imports    = imports
                                   , ic_rn_gbl_env = final_rdr_env }}}}

makes

COST CENTRE         MODULE          SRC                                          %time %alloc

final_rdr_env       InteractiveEval compiler/main/InteractiveEval.hs:660:59-104   48.0   91.3

comment:10 Changed 17 months ago by simonpj

Thanks for characterising this so well. Here are some thoughts.

  • The GlobalRdrEnv maps an OccName to a [GlobalRdrElt]. This was designed to handle the case where there are a handful of things A.f, B.f, etc. in scope, all with unqualified name f. The list is not expected to be long.
  • But in this case, I believe that that the list is getting long; hence the problem. Adding a DEBUG warning for this situation would be good.
  • Why is the list getting long? Becuase we have
      ghci> let x = True    -- Binds Ghci1.x
      ghci> let x = False   -- Binds Ghci2.x
      ghci> let x = True    -- Binds GHci3.x
      ...etc...
    
    All those Ids are (rightly) kept in the ic_tythings. But they are also all kept in the ic_rn_gbl_env. (Note [The interactive package] in HscTypes and the following notes are of some help.)
  • I can't work out why we keep the shadowed x's in the ic_rn_gbl_env. If we simply deleted them, all would be well. After all, we do not expect the user to be able to refer to an old x with a qualified name Ghci1.x.
  • But consider this:
      ghci> :load M    -- Brings `x` and `M.x` into scope
      ghci> x
      ghci> "Hello"
      ghci> M.x
      ghci> "hello"
      ghci> let x = True  -- Shadows `x`
      ghci> x             -- The locally bound `x`
                          -- NOT an ambiguous reference
      ghci> True
      ghci> M.x           -- M.x is still in scope!
      ghci> "Hello"
    
    So when we add x = True we must not delete the M.x from the GlobalRdrEnv; rather we just want to make it "qualified only"; hence the mk_fake-imp_spec in shadowName.
  • Side note: this is similar to the Template Haskell case, described in Note [GlobalRdrEnv shadowing] in RdrName.
          module M where
            f x = h [d| f = ....f....M.f... |]
    
    In the declaration quote, the unqualified f should refer to the f bound by the quote, but the qualified M.f should refer to the top-level f. So we don't want to delete that top-level binding from the GlobalRdrEnv; we just want to make it "qualified only"; hence the mk_fake-imp_spec in shadowName.

My conclusion: we want a way to delete from the GlobalRdrEnv things bound by GHCi (like Ghci1.x), which we do not want to refer to in a qualified way. How can we distinguish Ghci1.x from M.x? Two possibilities

  • Easy but a bit hacky: we can look at the package in the Name.
  • In some ways nicer: use the is_qual field of the ImpDeclSpec. Currently it's a Bool:
    • True => M.x in in scope, but x is not
    • False => Both M.x and x are in scope

We could provide a third possibility, to say that x is in scope but M.x is not. We could use that for GHCi-bound Ids.

comment:11 Changed 16 months ago by simonmar

Cc: simonmar added

comment:12 Changed 16 months ago by Simon Peyton Jones <simonpj@…>

In 6257fb5/ghc:

Comments about GlobalRdrEnv shadowing

Provoked by Trac #14052

comment:13 Changed 10 months ago by bgamari

Milestone: 8.4.18.4.2

comment:14 Changed 9 months ago by osa1

Owner: set to osa1

I think regardless of the performance problems, #11547 (Phab:D2447) should just be reverted. I think there were some misunderstanding in the ticket, and some questions are left unanswered in Phab:D2447, and I think there's really no utility of this patch. While I agree that having ~1000 shadowed names take ~40s to load is a problem, I also think that keeping shadowed variables is unnecessary.

Here are some facts:

  • Simon says in comment:1 in #11547 that we should be consistent in shadowing. I think we were already consistent previously. Values are shadowed, types are also shadowed, but shadowed types are still accessible in the promopt. His example:
> data A = A
> let f A = Int
> data A = XX
> :type f

already worked on GHC 8.0.1:

$ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/omer/rcbackup/.ghci
λ:1> data A = A
λ:2> let f A = 123
λ:3> data A = XX
λ:4> :t f
f :: Num t => Ghci1.A -> t
λ:5> :info Ghci1.A
data Ghci1.A = A        -- Defined at <interactive>:1:1

The question

But what is the user-facing specification? We need user-manual stuff explaining what all this Ghci4.foo stuff means. How would you know whether to say :t Ghci2.foo or :t Ghci3.foo? Can you list all the foo functions? Etc.

is left unanswered. Ben also asks about the specification in Phab:D2447, and that also goes unanswered, but somehow the patch gets merged later on.

We should at least have a motivating example, otherwise #11547 can also be fixed with a better error message and that'd cost us nothing in terms of performance and implementation simplicity and gives us the same benefits.

After this ideas in comment:10 can still be implemented as an improvement (I'm still digesting that comment).

Simon, in this sentence:

I can't work out why we keep the shadowed x's in the ic_rn_gbl_env. If we simply deleted them, all would be well. After all, we do not expect the user to be able to refer to an old x with a qualified name Ghci1.x.

Why do you think we don't expect user to be able to refer shadowed x with qualified name? That was the motivation for #11547 and Phab:D2447.

Secondly, the example in comment:10 worked fine with GHC 8.0.1:

$ ghci M.hs
GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/omer/rcbackup/.ghci
[1 of 1] Compiling M                ( M.hs, interpreted )
Ok, modules loaded: M.
λ:1> import M
λ:2> M.x
0
λ:3> x
0
λ:4> let x = 1
λ:5> x
1
λ:6> M.x
0

so there were really no problems that Phab:D2447 solved. I'll again claim that there's no utility of that patch and it should be reverted.

I'll add a perf test for this ticket, and then try to digest comment:10.

comment:15 Changed 9 months ago by Ömer Sinan Ağacan <omeragacan@…>

In 40c4313/ghc:

Add perf test for #14052

comment:16 Changed 9 months ago by osa1

Here's an even simpler reproducer: just repeat this line a few thousand times in a .script file:

let x = 1 :: Int

Then run

$ time (echo ":quit" | ghci -ghci-script Example.script Foo.hs >/dev/null 2>&1)

Results:

GHC 8.0.2 2000 repetitions:  1,04s
GHC 8.0.2 4000 repetitions:  2,06s
GHC 8.0.2 8000 repetitions:  4,02s

GHC 8.2.2 2000 repetitions:  2,17s
GHC 8.2.2 4000 repetitions:  6,31s
GHC 8.2.2 8000 repetitions: 21,39s

Demonstrates that adding a new shadowing binding is a constant time operation in GHC 8.0.2, but it's not constant time in GHC 8.2.2.

comment:17 Changed 9 months ago by simonpj

I think regardless of the performance problems, #11547 (​Phab:D2447) should just be reverted.

Reviewing this, I agree with you. I see that in my original review of Phab:D2447 I wrote "I don't think I fully understand all the consequences, but I don't want to stand in the way". This ticket shows some bad consequences. And I don't think anyone is actively arguing for access to previously in-scope versions of x.

It'd be polite to tell the original author of the patch (mniip I believe), in case he has reasons for wanting (some revised version of) his patch.

comment:18 Changed 9 months ago by osa1

Differential Rev(s): Phab:D4478
Status: newpatch

comment:19 Changed 9 months ago by osa1

I told mniip about this ticket on IRC.

comment:20 Changed 9 months ago by mniip

I don't hold any strong opinion on how the shadowed identifiers should behave, but whichever way it is, I don't think an error similar to one in #11547 is appropriate. Perhaps it is best to hide the Ghci# modules in renamer too?

comment:21 Changed 9 months ago by Ömer Sinan Ağacan <omeragacan@…>

In 98c7749/ghc:

Revert "GHCi: Don't remove shadowed bindings from typechecker scope."

This reverts commit 59d7ee53906b9cee7f279c1f9567af7b930f8636 and enables
the test for #14052.

(See #14052 for the discussion)

Reviewers: bgamari

Reviewed By: bgamari

Subscribers: rwbarton, thomie, carter

GHC Trac Issues: #14052

Differential Revision: https://phabricator.haskell.org/D4478

comment:22 Changed 9 months ago by osa1

Status: patchmerge

Agreed that the error message is bad. We should probably reopen #11547 or open a new ticket to improve it.

comment:23 Changed 9 months ago by bgamari

Resolution: fixed
Status: mergeclosed

Revert merged to ghc-8.4 with fbc4ebafb052d0905543ca821801e77b86384db1.

Last edited 9 months ago by bgamari (previous) (diff)

comment:24 Changed 9 months ago by sighingnow

Revert Phab:D2447 made the ticket:11547 existed in HEAD again. See also ticket:14996#comment:2.

Perhaps we should implement the conclusion posted in comment:10 ? If so, I'm volunteer to give that a try.

comment:25 Changed 9 months ago by osa1

The conclusion was to not keep shadowed variables at all (because there's no reason for it), whether it's fast or slow. So really the fix for #11547 is just a better error message (as mentioned in comment:22).

sighingnow, would you be willing to open a new ticket for that and give it a try?

Note: See TracTickets for help on using tickets.