Changes between Version 16 and Version 17 of PrimBool


Ignore:
Timestamp:
Jun 12, 2013 10:11:06 AM (2 years ago)
Author:
jstolarek
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • PrimBool

    v16 v17  
    155155}}}
    156156
    157 Using the LLVM backend this compiles to:
    158 
    159 {{{
    160 # BB#0:                                 # %c1oe
    161   movq   %rsi, %rax
    162   orq    %r14, %rax
    163   shrq   $63, %rax
    164   cmpq   %rdi, %r14
    165   setge  %cl
    166   movzbl %cl, %ecx
    167   orq    %rax, %rcx
    168   cmpq   %r8, %rsi
    169   setge  %al
    170   movzbl %al, %eax
    171   orq    %rcx, %rax
    172   jne    .LBB2_1
    173 # BB#3:                                 # %c1oP
    174   movq   (%rbp), %rax
    175   movl   $r1mu_closure+1, %ebx
    176   jmpq   *%rax  # TAILCALL
    177 .LBB2_1:                                # %c1oe
    178   cmpq   $1, %rax
    179   jne    .LBB2_2
    180 # BB#4:                                 # %c1oZ
    181   movq   (%rbp), %rax
    182   movl   $r1mv_closure+1, %ebx
    183   jmpq   *%rax  # TAILCALL
    184 .LBB2_2:                                # %c1oF
    185   movq   r1mt_closure(%rip), %rax
    186   movl   $r1mt_closure, %ebx
    187   jmpq   *%rax  # TAILCALL
    188 
    189 }}}
    190 
    191 The assembly does not contain comparisons and branches in the scrutinee of the case expression, but still uses jumps to select an appropriate branch of the case expression.
     157Let's analyze how that code gets compiled by the LLVM backend. For purposes of this analysis I will convert the above case expression to the following function:
     158 
     159{{{
     160f :: Int# -> Int# -> Int# -> Int# -> String
     161f x y width height =
     162    case (x <$# 0#) `orI#` (x >=$# width) `orI#` (y <$# 0#) `orI#` (y >=$# height) of
     163      1# -> "one"
     164      0# -> "zero"
     165}}}
     166
     167By dumping intermediate Cmm representation we can determine how variables are mapped to CPU registers. Arguments are passed to the function using a stack:
     168
     169{{{
     170Main.f_slow() //  [R1]
     171         { info_tbl: []
     172           stack_info: arg_space: 0 updfr_space: Nothing
     173         }
     174     {offset
     175       cWY:
     176           R5 = I64[Sp + 24];
     177           R4 = I64[Sp + 16];
     178           R3 = I64[Sp + 8];
     179           R2 = I64[Sp];
     180           R1 = R1;
     181           Sp = Sp + 32;
     182           call Main.f_info(R5, R4, R3, R2, R1) args: 8, res: 0, upd: 8;
     183     }
     184 }
     185}}}
     186`R2` contains the first argument `x`, `R3` contains `y`, `R4` contains `width` and `R5` contains `height`. We can verify that by looking at the body of Main.f_info:
     187
     188{{{
     189_sV9::I64 = R3;
     190_sV3::I64 = R2;
     191_sWx::I64 = %MO_S_Lt_W64(_sV3::I64, 0) | %MO_S_Ge_W64(_sV3::I64, R4) |
     192            %MO_S_Lt_W64(_sV9::I64, 0) | %MO_S_Ge_W64(_sV9::I64, R5);
     193}}}
     194
     195
     196Mappings between Cmm's R(2/3/4/5) registers and machine registers are defined in includes/stg/MachRegs.h:
     197
     198{{{
     199#define REG_R2 r14
     200#define REG_R3 rsi
     201#define REG_R4 rdi
     202#define REG_R5 r8
     203}}}
     204
     205Knowing that we can dump the assembly generated  by LLVM backend:
     206
     207{{{
     208Main_f_info:
     209# BB#0:
     210   movq    %rsi, %rax
     211   orq     %r14, %rax
     212   shrq    $63, %rax
     213   cmpq    %rdi, %r14
     214   setge   %cl
     215   movzbl  %cl, %ecx
     216   orq     %rax, %rcx
     217   cmpq    %r8, %rsi
     218   setge   %al
     219   movzbl  %al, %eax
     220   orq     %rcx, %rax
     221   jne     .LBB4_1
     222# BB#3:
     223   movq    Main_f2_closure(%rip), %rax
     224   movl    $Main_f2_closure, %ebx
     225   jmpq    *%rax  # TAILCALL
     226.LBB4_1:
     227   cmpq    $1, %rax
     228   jne     .LBB4_2
     229# BB#4:
     230   movq    Main_f1_closure(%rip), %rax
     231   movl    $Main_f1_closure, %ebx
     232   jmpq    *%rax  # TAILCALL
     233.LBB4_2:
     234   movq    Main_f3_closure(%rip), %rax
     235   movl    $Main_f3_closure, %ebx
     236   jmpq    *%rax  # TAILCALL
     237}}}
     238
     239Let's analyze line by line the part responsible for evaluating the scrutinee:
     240
     241{{{
     242  movq   %rsi, %rax   # load y (stored in %rsi) into %rax register
     243  orq    %r14, %rax   # perform bitwise OR between y (now in %rax) and x (in %r14)
     244                      # and store result in %rax
     245  shrq   $63, %rax    # shift %rax 63 bits to the right. If both x and y were
     246                      # positive numbers, then their sign bits (MSB) were set to
     247                      # 0 and so %rax is now 0. If at least one of them was
     248                      # negative then its sign bit must have been 1 and so %rax
     249                      # is now 1
     250  cmpq   %rdi, %r14   # subtract width (in %rdi) from x (in %r14), discard the
     251                      # result and set the flags. If width was greater than x
     252                      # then SF is set to 1 and OF is set to 0. This is tricky
     253                      # because combination of SF and OF is needed to decide
     254                      # which argument is actually equal
     255  setge  %cl          # if OF=SF it means that x was greater or equal to width
     256                      # and we note that fact by setting LSB in %cl to 1
     257  movzbl %cl, %ecx    # zero the bits of %ecx register except the lowest 8 bits
     258                      # containg the result of previous operation
     259  orq    %rax, %rcx   # perform logical OR on results of the previous test and
     260                      # store the result in %rcx. At this point if %rcx is 1
     261                      # then either x was negative, or y was negative or
     262                      # x was greater or equal to width
     263  cmpq   %r8, %rsi    # now we are checking whether y is greter or equal to
     264                      # height. This is the same as previosly for x and width.
     265  setge  %al          # This time we set LSB of %al to 1 if y >= height
     266  movzbl %al, %eax    # as previously, clear bits of %eax except lowest 8 bits
     267  orq    %rcx, %rax   # perform logical OR which combines result of previous
     268                      # three comparisons with the last one   
     269  jne    .LBB2_1      # if ZF is not set it this means that either %rcx or %rax
     270                      # was not zero, which means that at least one condition
     271                      # in the scrutinee was true
     272}}}
     273
     274The assembly does not contain comparisons and branches in the scrutinee of the case expression, but still uses jumps to select an appropriate branch of the case expression.
     275
    192276
    193277=== Benchmarks ===