Changes between Version 16 and Version 17 of PrimBool


Ignore:
Timestamp:
Jun 12, 2013 10:11:06 AM (23 months ago)
Author:
jstolarek
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • PrimBool

    v16 v17  
    155155}}} 
    156156 
    157 Using the LLVM backend this compiles to: 
    158  
    159 {{{ 
    160 # BB#0:                                 # %c1oe 
    161   movq   %rsi, %rax 
    162   orq    %r14, %rax 
    163   shrq   $63, %rax 
    164   cmpq   %rdi, %r14 
    165   setge  %cl 
    166   movzbl %cl, %ecx 
    167   orq    %rax, %rcx 
    168   cmpq   %r8, %rsi 
    169   setge  %al 
    170   movzbl %al, %eax 
    171   orq    %rcx, %rax 
    172   jne    .LBB2_1 
    173 # BB#3:                                 # %c1oP 
    174   movq   (%rbp), %rax 
    175   movl   $r1mu_closure+1, %ebx 
    176   jmpq   *%rax  # TAILCALL 
    177 .LBB2_1:                                # %c1oe 
    178   cmpq   $1, %rax 
    179   jne    .LBB2_2 
    180 # BB#4:                                 # %c1oZ 
    181   movq   (%rbp), %rax 
    182   movl   $r1mv_closure+1, %ebx 
    183   jmpq   *%rax  # TAILCALL 
    184 .LBB2_2:                                # %c1oF 
    185   movq   r1mt_closure(%rip), %rax 
    186   movl   $r1mt_closure, %ebx 
    187   jmpq   *%rax  # TAILCALL 
    188  
    189 }}} 
    190  
    191 The assembly does not contain comparisons and branches in the scrutinee of the case expression, but still uses jumps to select an appropriate branch of the case expression. 
     157Let's analyze how that code gets compiled by the LLVM backend. For purposes of this analysis I will convert the above case expression to the following function: 
     158  
     159{{{ 
     160f :: Int# -> Int# -> Int# -> Int# -> String 
     161f x y width height = 
     162    case (x <$# 0#) `orI#` (x >=$# width) `orI#` (y <$# 0#) `orI#` (y >=$# height) of 
     163      1# -> "one" 
     164      0# -> "zero" 
     165}}} 
     166 
     167By dumping intermediate Cmm representation we can determine how variables are mapped to CPU registers. Arguments are passed to the function using a stack: 
     168 
     169{{{ 
     170Main.f_slow() //  [R1] 
     171         { info_tbl: [] 
     172           stack_info: arg_space: 0 updfr_space: Nothing 
     173         } 
     174     {offset 
     175       cWY: 
     176           R5 = I64[Sp + 24]; 
     177           R4 = I64[Sp + 16]; 
     178           R3 = I64[Sp + 8]; 
     179           R2 = I64[Sp]; 
     180           R1 = R1; 
     181           Sp = Sp + 32; 
     182           call Main.f_info(R5, R4, R3, R2, R1) args: 8, res: 0, upd: 8; 
     183     } 
     184 } 
     185}}} 
     186`R2` contains the first argument `x`, `R3` contains `y`, `R4` contains `width` and `R5` contains `height`. We can verify that by looking at the body of Main.f_info: 
     187 
     188{{{ 
     189_sV9::I64 = R3; 
     190_sV3::I64 = R2; 
     191_sWx::I64 = %MO_S_Lt_W64(_sV3::I64, 0) | %MO_S_Ge_W64(_sV3::I64, R4) | 
     192            %MO_S_Lt_W64(_sV9::I64, 0) | %MO_S_Ge_W64(_sV9::I64, R5); 
     193}}} 
     194 
     195 
     196Mappings between Cmm's R(2/3/4/5) registers and machine registers are defined in includes/stg/MachRegs.h: 
     197 
     198{{{ 
     199#define REG_R2 r14 
     200#define REG_R3 rsi 
     201#define REG_R4 rdi 
     202#define REG_R5 r8 
     203}}} 
     204 
     205Knowing that we can dump the assembly generated  by LLVM backend: 
     206 
     207{{{ 
     208Main_f_info: 
     209# BB#0: 
     210   movq    %rsi, %rax 
     211   orq     %r14, %rax 
     212   shrq    $63, %rax 
     213   cmpq    %rdi, %r14 
     214   setge   %cl 
     215   movzbl  %cl, %ecx 
     216   orq     %rax, %rcx 
     217   cmpq    %r8, %rsi 
     218   setge   %al 
     219   movzbl  %al, %eax 
     220   orq     %rcx, %rax 
     221   jne     .LBB4_1 
     222# BB#3: 
     223   movq    Main_f2_closure(%rip), %rax 
     224   movl    $Main_f2_closure, %ebx 
     225   jmpq    *%rax  # TAILCALL 
     226.LBB4_1: 
     227   cmpq    $1, %rax 
     228   jne     .LBB4_2 
     229# BB#4: 
     230   movq    Main_f1_closure(%rip), %rax 
     231   movl    $Main_f1_closure, %ebx 
     232   jmpq    *%rax  # TAILCALL 
     233.LBB4_2: 
     234   movq    Main_f3_closure(%rip), %rax 
     235   movl    $Main_f3_closure, %ebx 
     236   jmpq    *%rax  # TAILCALL 
     237}}} 
     238 
     239Let's analyze line by line the part responsible for evaluating the scrutinee: 
     240 
     241{{{ 
     242  movq   %rsi, %rax   # load y (stored in %rsi) into %rax register 
     243  orq    %r14, %rax   # perform bitwise OR between y (now in %rax) and x (in %r14) 
     244                      # and store result in %rax 
     245  shrq   $63, %rax    # shift %rax 63 bits to the right. If both x and y were 
     246                      # positive numbers, then their sign bits (MSB) were set to 
     247                      # 0 and so %rax is now 0. If at least one of them was 
     248                      # negative then its sign bit must have been 1 and so %rax 
     249                      # is now 1 
     250  cmpq   %rdi, %r14   # subtract width (in %rdi) from x (in %r14), discard the 
     251                      # result and set the flags. If width was greater than x 
     252                      # then SF is set to 1 and OF is set to 0. This is tricky 
     253                      # because combination of SF and OF is needed to decide 
     254                      # which argument is actually equal  
     255  setge  %cl          # if OF=SF it means that x was greater or equal to width 
     256                      # and we note that fact by setting LSB in %cl to 1 
     257  movzbl %cl, %ecx    # zero the bits of %ecx register except the lowest 8 bits 
     258                      # containg the result of previous operation 
     259  orq    %rax, %rcx   # perform logical OR on results of the previous test and 
     260                      # store the result in %rcx. At this point if %rcx is 1 
     261                      # then either x was negative, or y was negative or 
     262                      # x was greater or equal to width 
     263  cmpq   %r8, %rsi    # now we are checking whether y is greter or equal to 
     264                      # height. This is the same as previosly for x and width. 
     265  setge  %al          # This time we set LSB of %al to 1 if y >= height 
     266  movzbl %al, %eax    # as previously, clear bits of %eax except lowest 8 bits 
     267  orq    %rcx, %rax   # perform logical OR which combines result of previous 
     268                      # three comparisons with the last one     
     269  jne    .LBB2_1      # if ZF is not set it this means that either %rcx or %rax 
     270                      # was not zero, which means that at least one condition 
     271                      # in the scrutinee was true 
     272}}} 
     273 
     274The assembly does not contain comparisons and branches in the scrutinee of the case expression, but still uses jumps to select an appropriate branch of the case expression.  
     275 
    192276 
    193277=== Benchmarks ===