#8835 closed bug (duplicate)

7.6.3 vs 7.8-RC performance regression

Reported by: cbm80 Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.1-rc2
Keywords: Cc:
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #8814 Differential Revisions:

Description

NOTE: code for this is on github https://github.com/cbm80/Minimal

I have a program that uses xml-conduit to parse nzb files, a file format commonly found on Usenet.

Running the program

./mini demo1.nzb +RTS -sstderr

compiled with GHC 7.6.3 yields:

    37,827,736 bytes allocated in the heap
       703,392 bytes copied during GC
       172,040 bytes maximum residency (2 sample(s))
        35,024 bytes maximum slop
             2 MB total memory in use (0 MB lost due to fragmentation)

                                  Tot time (elapsed)  Avg pause  Max pause
    Gen  0        71 colls,     0 par    0.00s    0.00s     0.0000s    0.0001s
    Gen  1         2 colls,     0 par    0.00s    0.00s     0.0003s    0.0005s

    TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

    SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

    INIT    time    0.00s  (  0.00s elapsed)
    MUT     time    0.01s  (  0.01s elapsed)
    GC      time    0.00s  (  0.00s elapsed)
    EXIT    time    0.00s  (  0.00s elapsed)
    Total   time    0.02s  (  0.02s elapsed)

    Alloc rate    2,603,976,578 bytes per MUT second

    Productivity  84.1% of total user, 86.9% of total elapsed

The same program, same libraries but compiled with GHC 7.8.0.20140228 gives:

    11,217,541,448 bytes allocated in the heap
        31,184,656 bytes copied during GC
         6,540,288 bytes maximum residency (178 sample(s))
           597,248 bytes maximum slop
                19 MB total memory in use (5 MB lost due to fragmentation)

                                      Tot time (elapsed)  Avg pause  Max pause
    Gen  0     20623 colls,     0 par    0.50s    0.41s     0.0000s    0.0008s
    Gen  1       178 colls,     0 par    0.04s    0.05s     0.0003s    0.0009s'''

    TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

    SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

    INIT    time    0.00s  (  0.00s elapsed)
    MUT     time    3.41s  (  3.50s elapsed)
    GC      time    0.55s  (  0.45s elapsed)
    EXIT    time    0.00s  (  0.00s elapsed)
    Total   time    3.96s  (  3.96s elapsed)

    Alloc rate    3,293,236,224 bytes per MUT second

    Productivity  86.1% of total user, 86.1% of total elapsed

While total memory usage always stays around 2 to 3 mb when compiled with GHC 7.6.3, even if using bigger nzb files (try demo2.nzb), the 7.8-RC compiled program eats huge amounts of memory and becomes very slow. Using +RTS -S shows even more worrying GC differences.

Unfortunately I can't tell whether the culprit is in one of the libraries or in the compiler.

GHC 7.8-RC used was ghc-7.8.0.20140228-x86_64-unknown-linux-deb7.tar.xz from http://www.haskell.org/ghc/dist/7.8.1-rc2

Change History (9)

comment:1 Changed 15 months ago by carter

could you compile the program with 7.8 using -fno-full-laziness and give us the perf info then? (theres 1-2 example programs we've seen where adding -fno-full-laziness will resolve the change in performance characteristics)

comment:2 Changed 15 months ago by thoughtpolice

-fno-full-laziness makes exactly 0 difference on this program. I can see some of the differences in Core, but I haven't made out exactly what the problem is yet.

comment:3 Changed 15 months ago by cbm80

I have simplified and smallified the code on github a bit, and compiled with -fno-full-laziness. While memory usage is roughly down to the 7.6.3 version the GC problems remained.

Last edited 15 months ago by cbm80 (previous) (diff)

comment:4 Changed 15 months ago by cbm80

  • Resolution set to fixed
  • Status changed from new to closed

The bug was in attoparsec and has been fixed in the latest release 0.11.2.1

For reference see:

https://github.com/bos/attoparsec/issues/56

comment:5 Changed 15 months ago by tibbe

  • Resolution fixed deleted
  • Status changed from closed to new

I don't think it's a bug in attoparsec. bos did add a workaround (removing an INLINE), but I think it's a bug in the GHC. I don't expect a order-of-magnitude performance regression in a release.

comment:6 Changed 15 months ago by cbm80

I agree that it is something in the GHC. This bug and #8814 are the same however, and should be merged.

comment:7 Changed 15 months ago by simonpj

Are you sure that this bug is the same as #8814? When I look at the git repo in the Description at the top, the cabal file doesn't mention attoparsec.

Can anyone take the time to find out why the performance differs so much, and to boil out a test case?

Simon

comment:8 Changed 15 months ago by cbm80

In my code I'm using xml-conduit for parsing, which has a dependency on attoparsec. Also the workaround bos made fixed not only the GC behavior, but also memory consumption is again what is was in 7.6.3. So yes, I think it is indeed the same bug.

comment:9 Changed 15 months ago by simonpj

  • Resolution set to duplicate
  • Status changed from new to closed

Fine. I'll close this as a dup of #8814. Sadly, the latter still needs someone to dig into it.

Simon

Note: See TracTickets for help on using tickets.