139 | | == Proof of concept == |
140 | | |
141 | | The prototype patch posted on the trac on 13th of March implemented six new prototype comparison primops: |
142 | | |
143 | | {{{ |
144 | | .># :: Int# -> Int# -> Int# |
145 | | .<# :: Int# -> Int# -> Int# |
146 | | .>=# :: Int# -> Int# -> Int# |
147 | | .<=# :: Int# -> Int# -> Int# |
148 | | .==# :: Int# -> Int# -> Int# |
149 | | ./=# :: Int# -> Int# -> Int# |
150 | | }}} |
151 | | |
152 | | Each of these new primops takes two `Int#`s that are to be compared. The result is also an `Int#`: `0#` if the relation between the operands does not hold and `1#` when it does hold. For example `5# .># 3#` returns `1#` and `3# .># 3#` returns `0#`. With the new primops we can rewrite the original expression that motivated the problem: |
| 139 | == Eliminating branches using new primops == |
| 140 | |
| 141 | With the new primops we can rewrite the original expression that motivated the problem: |
168 | | (Note: `orI#` is a bitwise OR operation on operands of type `Int#`. It was introduced together with `andI#`, `notI#` and `xor#` in #7689). Using the LLVM backend this compiles to: |
169 | | |
170 | | {{{ |
171 | | # BB#0: # %c1nK |
172 | | movq %rsi, %rax |
173 | | orq %r14, %rax |
174 | | shrq $63, %rax |
175 | | cmpq %rdi, %r14 |
176 | | setge %cl |
177 | | movzbl %cl, %ecx |
178 | | orq %rax, %rcx |
179 | | cmpq %r8, %rsi |
180 | | setge %al |
181 | | movzbl %al, %eax |
182 | | orq %rcx, %rax |
183 | | jne .LBB2_1 |
184 | | # BB#3: # %c1ol |
185 | | movq (%rbp), %rax |
186 | | movl $r1m6_closure+1, %ebx |
187 | | jmpq *%rax # TAILCALL |
188 | | .LBB2_1: # %c1nK |
189 | | cmpq $1, %rax |
190 | | jne .LBB2_2 |
191 | | # BB#4: # %c1ov |
192 | | movq (%rbp), %rax |
193 | | movl $r1m7_closure+1, %ebx |
194 | | jmpq *%rax # TAILCALL |
195 | | .LBB2_2: # %c1ob |
196 | | movq r1m5_closure(%rip), %rax |
197 | | movl $r1m5_closure, %ebx |
198 | | jmpq *%rax # TAILCALL |
199 | | }}} |
200 | | |
201 | | The assembly does not contain comparisons and jumps in the scrutinee of the case expression, but still does jumps for selecting an appropriate branch of the case expression. |
202 | | |
203 | | === Benchmarks for the proposed patch === |
204 | | |
205 | | Below is a benchmark for the proof-of-concept filter function that demonstrates performance gains possible with the new primops: |
| 157 | Using the LLVM backend this compiles to: |
| 158 | |
| 159 | {{{ |
| 160 | # BB#0: # %c1oe |
| 161 | movq %rsi, %rax |
| 162 | orq %r14, %rax |
| 163 | shrq $63, %rax |
| 164 | cmpq %rdi, %r14 |
| 165 | setge %cl |
| 166 | movzbl %cl, %ecx |
| 167 | orq %rax, %rcx |
| 168 | cmpq %r8, %rsi |
| 169 | setge %al |
| 170 | movzbl %al, %eax |
| 171 | orq %rcx, %rax |
| 172 | jne .LBB2_1 |
| 173 | # BB#3: # %c1oP |
| 174 | movq (%rbp), %rax |
| 175 | movl $r1mu_closure+1, %ebx |
| 176 | jmpq *%rax # TAILCALL |
| 177 | .LBB2_1: # %c1oe |
| 178 | cmpq $1, %rax |
| 179 | jne .LBB2_2 |
| 180 | # BB#4: # %c1oZ |
| 181 | movq (%rbp), %rax |
| 182 | movl $r1mv_closure+1, %ebx |
| 183 | jmpq *%rax # TAILCALL |
| 184 | .LBB2_2: # %c1oF |
| 185 | movq r1mt_closure(%rip), %rax |
| 186 | movl $r1mt_closure, %ebx |
| 187 | jmpq *%rax # TAILCALL |
| 188 | |
| 189 | }}} |
| 190 | |
| 191 | The assembly does not contain comparisons and branches in the scrutinee of the case expression, but still uses jumps to select an appropriate branch of the case expression. |
| 192 | |
| 193 | === Benchmarks === |
| 194 | |
| 195 | Below is a benchmark for the proof-of-concept branchless filter function that demonstrates performance gains possible with the new primops: |