Register spilling produces ineffecient/highly contending code

The native codegen and llvm both produce ineffecient code for functions using structures with many strict fields or unboxed values.

Consider the following program:

{-# LANGUAGE BangPatterns #-}
module Spill where

import GHC.Exts

data S = S !Int !Int !Int !Int !Int !Int !Int !Int !Int

spill :: S -> S -> S -> S
spill (S !a !b !c !d !e !f !g !h !i) (S !j !k !l !m !n !o !p !q !r) (S !s !t !u !v !w !x !y !z _)
 = S (a + j + s) (b + c) (k + r) (a + b + c + d + e + f + g + h + i) (j + k + l + m + n + o + p + q + r) (s + t + u + v + w + x + y + z) (a + b + c) (j + k + l) (s + t + u)

Parts of the code produced for this (which is identical regardless of -funbox-strict-fields) looks like:

_cnc:
    addq $80,%r12
    cmpq 144(%r13),%r12
    ja _cni
    movq $Spill.S_con_info,-72(%r12)
    movq 32(%rbp),%rax
    movq %rax,-64(%r12)
    movq 24(%rbp),%rax
    movq %rax,-56(%r12)
    movq 16(%rbp),%rax
    movq %rax,-48(%r12)
    movq 8(%rbp),%rax
    movq %rax,-40(%r12)
    movq 40(%rbp),%rax
    movq %rax,-32(%r12)
    movq 48(%rbp),%rax
    movq %rax,-24(%r12)
    movq 56(%rbp),%rax
    movq %rax,-16(%r12)
    movq 64(%rbp),%rax
    movq %rax,-8(%r12)
    movq 7(%rbx),%rax
    movq %rax,0(%r12)
    leaq -71(%r12),%rbx
    addq $72,%rbp
    jmp *0(%rbp)

_csv:
    movq 63(%rbx),%rax
    movq %rax,-56(%rbp)
    movq 55(%rbx),%rax
    movq %rax,-48(%rbp)
    movq 47(%rbx),%rax
    movq %rax,-40(%rbp)
    movq 39(%rbx),%rax
    movq %rax,-32(%rbp)
    movq 31(%rbx),%rax
    movq %rax,-24(%rbp)
    movq 23(%rbx),%rax
    movq %rax,-16(%rbp)
    movq 71(%rbx),%rax
    movq %rax,-8(%rbp)
    movq 15(%rbx),%rax
    movq %rax,0(%rbp)

And likewise for LLVM:

.LBB10_1: # %coZ
        movq    7(%rbx), %rcx
        movq    $Spill_S_con_info, 8(%rax)
        movq    8(%rbp), %rdx
        movq    %rdx, 16(%rax)
        movq    16(%rbp), %rdx
        movq    %rdx, 24(%rax)
        movq    24(%rbp), %rdx
        movq    %rdx, 32(%rax)
        movq    32(%rbp), %rdx
        movq    %rdx, 40(%rax)
        movq    40(%rbp), %rdx
        movq    %rdx, 48(%rax)
        movq    48(%rbp), %rdx
        movq    %rdx, 56(%rax)
        movq    56(%rbp), %rdx
        movq    %rdx, 64(%rax)
        movq    64(%rbp), %rdx
        movq    %rdx, 72(%rax)
        movq    %rcx, (%r12)
        movq    72(%rbp), %rax
        leaq    72(%rbp), %rbp
        leaq    -71(%r12), %rbx
        jmpq    *%rax  # TAILCALL

Quoting from #ghc "the [register allocator] core algo is '96 vintage". Improvements are needed;

Take into consideration pipelining and handle spills less dramatically, attempting to reduce register contention
Sink memory reads in order to reduce register pressure

Trac metadata

Trac field	Value
Version	7.6.3
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information