Changes between Version 1 and Version 2 of SIMDVectorExampleInLLVM


Timestamp:
Oct 25, 2011 5:28:23 PM (2 years ago)
Author:
pmonday
Comment:

--

  • SIMDVectorExampleInLLVM

    v1 v2  
    77   - Run the bitcode on the JIT compiler:  lli add_floats.bc should execute the program and print the result
    88 
    9 To demonstrate the vector instructions, let's start with a basic LLVM program that adds 4 floats (Single Instruction, Multiple Data (SIMD) will let us add up to 4 floats together in a single instruction).  Doing this without vectorization, the program looks like this: 
     9 To demonstrate the vector instructions, we can start with a basic C program. This is just for illustration: LLVM IR is not a functional language, so starting in an imperative language makes a lot of sense:
{{{
#include <stdio.h>
#include <stdlib.h>

int main() {
    float x1, x2, x3, x4, result;
    x1 = 1.0;
    x2 = 2.0;
    x3 = 3.0;
    x4 = 4.0;
    result = x1 + x2 + x3 + x4;
    printf("result = %f\n", result);
}
}}}
     24 
     25 Compiling and running this C program is easy and left to the reader.
     26 
     27 This converts easily to LLVM's human-readable assembly format (a .ll file); use the [http://llvm.org/demo/index.cgi online generator] if you'd like:
{{{
; ModuleID = '/tmp/webcompile/_14760_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [13 x i8] c"result = %f\0A\00"

define i32 @main() nounwind {
  %1 = alloca i32, align 4
  %x1 = alloca float, align 4
  %x2 = alloca float, align 4
  %x3 = alloca float, align 4
  %x4 = alloca float, align 4
  %result = alloca float, align 4
  store i32 0, i32* %1
  store float 1.000000e+00, float* %x1, align 4
  store float 2.000000e+00, float* %x2, align 4
  store float 3.000000e+00, float* %x3, align 4
  store float 4.000000e+00, float* %x4, align 4
  %2 = load float* %x1, align 4
  %3 = load float* %x2, align 4
  %4 = fadd float %2, %3
  %5 = load float* %x3, align 4
  %6 = fadd float %4, %5
  %7 = load float* %x4, align 4
  %8 = fadd float %6, %7
  store float %8, float* %result, align 4
  %9 = load float* %result, align 4
  %10 = fpext float %9 to double
  %11 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0), double %10)
  %12 = load i32* %1
  ret i32 %12
}

declare i32 @printf(i8*, ...)
}}}
     64 
     65This is easy enough to run using the JIT compiler:  lli add_floats.ll 
     66 
     67 The core of the instructions can be replaced with vector operations. (Obviously, optimizing this program would leave very little code and make vectorization unnecessary, but this is an exercise.)
     68 
     69Here is the .ll code rewritten with vectorization: 
{{{

}}}
    1474 
    15 Now, we can rearrange the code to make use of the SIMD instructions.  To do this, each of the floats must be packed into the data register, then the instruction is executed, and the results are extracted and displayed.  Really, only the core of the above program is altered.
    16  
    17 {{{ 
    18  
    19 }}} 
    20  
    21