Opened 3 years ago

Last modified 7 weeks ago

#9157 new bug

cmm common block not eliminated

Reported by: wojteknar Owned by:
Priority: normal Milestone:
Component: Compiler Version: 7.8.2
Keywords: CodeGen Cc: jstolarek, simonmar
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:


For the toTuple# function (in the attached file) GHC generates a cmm switch statement, with all the alternatives, unsurprisingly, the same. Yet, cmm common block elimination does not kick in.

In this particular example, the whole case statement could be annihilated, because all alternatives lead to the same code.

BTW, this is how they match in France.

type alt = A of int | B of int | C of int | D of int
let g x = match x with A v | B v  | C v | D v -> v

Attachments (1)

ChunkNotElim.hs (721 bytes) - added by wojteknar 3 years ago.

Download all attachments as: .zip

Change History (12)

Changed 3 years ago by wojteknar

Attachment: ChunkNotElim.hs added

comment:1 Changed 3 years ago by jstolarek

Cc: jan.stolarek@… added

comment:2 Changed 3 years ago by jstolarek

Owner: set to jstolarek

comment:3 Changed 3 years ago by jstolarek

Cc: simonmar added

I confirmed that this bug exists in HEAD and found the reason. I suspect that Common Block Elimination as implemented now might not work as intended. Here's what's happening. Code attached in the bug report generates 8 identical blocks (showing only 2 here for brevity):

    _s1CG::I64 = I64[_s1Cy::P64 + 7];
    _g1CN::I64 = _s1CG::I64;
    goto c1Dd;
    _s1CF::I64 = I64[_s1Cy::P64 + 7];
    _g1CN::I64 = _s1CF::I64;
    goto c1Dd;

We hash these block ignoring the labels and we get identical hashes as expected. Then we get to the point when we attempt to actually eliminate common blocks. We check these two block for equality, except that this time we don't ignore the labels and other unique identifiers (eg. here). We conclude that these two block are different because local registers _s1CG are _s1CF are different. Later the sinking pass makes these two blocks identical. The result that we see in the Cmm output are 8 identical blocks:

    _g1CN::I64 = I64[_s1Cy::P64 + 7];
    goto c1Dd;
    _g1CN::I64 = I64[_s1Cy::P64 + 7];
    goto c1Dd;

I suspect that this is not how CBE was intended to work. I mean if we ignore the labels during hashing then we should probably be smarter when actually comparing the blocks. Except that this is not trivial. I wonder if we could simply move CBE to the end of Cmm pipeline, so that it is run after sinking and other optimizations? Or is there A Good Reason to have CBE at the beginning of the pipeline and not at the end? CCing Simon Marlow - perhaps he can tell.

Last edited 2 years ago by jstolarek (previous) (diff)

comment:4 Changed 3 years ago by simonpj

I suspect that we ignore the labels the first time so that this works:

L1: x = x+1; goto L2
L2: y = y+1; goto Lend

L3: x = x+1; goto L3
L4: y = y+1; goto Lend

Now L1 and L3 would be different if we took account of labels, but once L2 and L4 are commoned up, L1 and L3 can be as well.

Waiting for the sinking pass may well help, and I don't know why we don't do CBE at the end. But it isn't enough in general. Consider

L1: x = r+1; Sp[4] = x; Sp[8] = x*2; goto Lend
L2: y = r+1; Sp[4] = y; Sp[8] = y*2; goto Lend

Here x and y look different but they are simply local temporaries.

If we did a liveness analysis, the equality-comparing function could make-equivalent assignments to dead variables. So when comparing L1 and L2, since x and y are dead, the first statements will compare equal, and make x and y equivalent; hence the second statements will compare equal too.

Anyway, I'm no expert here; I have not looked at the code.

comment:5 Changed 3 years ago by simonmar

Yes, I think Simon PJ's reason above is exactly why hashes don't include labels: so that the hashes remain stable under elimination of common blocks. After eliminating some blocks, more opportunities for elimination emerge, so the algorithm keeps iterating until there are no more blocks to eliminate.

Originally CBE was there to catch a common case of common blocks that occurred in the code we generated for case expressions. But that pattern is handled directly by the code generator (we don't generate the common blocks anymore), and hence CBE is only enabled with -O.

As with most optimisations, CBE interacts with other passes, so there may be more (or fewer) opportunities for CBE at different stages during the pipeline. Whether it's worthwhile (a) having another CBE pass (maybe with -O2 only) or (b) moving the existing CBE pass later in the pipeline, is a matter for measurement.

comment:6 Changed 3 years ago by jstolarek

Whether it's worthwhile (a) having another CBE pass (maybe with -O2 only) or (b) moving the existing CBE pass later in the pipeline, is a matter for measurement.

I see. It looks like a low-priority thing so I'm leaving investigating that to someone who has access to more computing power than me.

comment:7 Changed 3 years ago by jstolarek

Owner: jstolarek deleted

comment:8 Changed 3 years ago by wojteknar

Not generating the identical blocks at all sounds like a good idea to an outsider.

comment:9 Changed 21 months ago by thomie

Type of failure: OtherRuntime performance bug

comment:10 Changed 21 months ago by jstolarek

Cc: jstolarek added; jan.stolarek@… removed

comment:11 Changed 7 weeks ago by simonpj

Keywords: CodeGen added
Note: See TracTickets for help on using tickets.