#8033 closed task (invalid)

add AVX register support to llvm calling convention

Reported by: carter Owned by: carter
Priority: normal Milestone:
Component: Compiler Version: 7.7
Keywords: Cc: bgamari@…, anton.nik@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

GHC HEAD currently has support for 128-bit SSE registers (XMM), and it would be very little additional work to add 256-bit AVX operations (when available), PROVIDED that the LLVM calling convention for GHC is modified to support them when available.

The current definition in LLVM can be seen here: https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86CallingConv.td#L279-L291

The current definition is:

def CC_X86_64_GHC : CallingConv<[
  // Promote i8/i16/i32 arguments to i64.
  CCIfType<[i8, i16, i32], CCPromoteToType<i64>>,

  // Pass in STG registers: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim
  CCIfType<[i64],
            CCAssignToReg<[R13, RBP, R12, RBX, R14, RSI, RDI, R8, R9, R15]>>,

  // Pass in STG registers: F1, F2, F3, F4, D1, D2
  CCIfType<[f32, f64, v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
            CCIfSubtarget<"hasSSE1()",
            CCAssignToReg<[XMM1, XMM2, XMM3, XMM4, XMM5, XMM6]>>>,


]>;


I believe the updated version should read:

def CC_X86_64_GHC : CallingConv<[
  // Promote i8/i16/i32 arguments to i64.
  CCIfType<[i8, i16, i32], CCPromoteToType<i64>>,

  // Pass in STG registers: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim
  CCIfType<[i64],
            CCAssignToReg<[R13, RBP, R12, RBX, R14, RSI, RDI, R8, R9, R15]>>,

  // Pass in STG registers for floats, doubles and 128bit simd vectors
  CCIfType<[f32, f64, v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
            CCIfSubtarget<"hasSSE1()",
            CCAssignToReg<[XMM1, XMM2, XMM3, XMM4, XMM5, XMM6]>>>,

// Pass in STG registers for the first 7 256bit simd vectors
CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
                          CCIfSubtarget<"hasAVX()",
                          CCAssignToReg<[YMM0, YMM1, YMM2, YMM3,
                                         YMM4, YMM5, YMM6]>>>

]>;


Note that this is FULLY backwards compatible with current GHC; it merely means that we can have 256-bit SIMD values if we so choose.
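For intuition, here is a minimal Python sketch (hypothetical and illustrative only; the function and constant names are mine, not LLVM's) of how these CCIfType / CCAssignToReg rules would hand out registers to a sequence of argument types:

```python
# Hypothetical model of the proposed CC_X86_64_GHC rules, NOT real LLVM
# code: i8/i16/i32 promote to i64 and take GP registers in order;
# f32/f64 and 128-bit vectors take XMM registers; 256-bit vectors take
# YMM registers when AVX is available.
INT_TYPES = {"i8", "i16", "i32", "i64"}
XMM_TYPES = {"f32", "f64", "v16i8", "v8i16", "v4i32", "v2i64", "v4f32", "v2f64"}
YMM_TYPES = {"v32i8", "v16i16", "v8i32", "v4i64", "v8f32", "v4f64"}

GP_REGS  = ["R13", "RBP", "R12", "RBX", "R14", "RSI", "RDI", "R8", "R9", "R15"]
XMM_REGS = ["XMM1", "XMM2", "XMM3", "XMM4", "XMM5", "XMM6"]
YMM_REGS = ["YMM0", "YMM1", "YMM2", "YMM3", "YMM4", "YMM5", "YMM6"]

def assign_args(arg_types, has_sse1=True, has_avx=True):
    """Return (type, register-or-'stack') pairs, first rule that fires wins."""
    gp, xmm, ymm = list(GP_REGS), list(XMM_REGS), list(YMM_REGS)
    out = []
    for t in arg_types:
        if t in INT_TYPES:
            pool = gp
        elif t in XMM_TYPES and has_sse1:
            pool = xmm
        elif t in YMM_TYPES and has_avx:
            pool = ymm
        else:
            pool = []  # no rule fires: fall through to the stack
        out.append((t, pool.pop(0) if pool else "stack"))
    return out

print(assign_args(["i64", "i64", "v4f32", "v8f32"]))
# [('i64', 'R13'), ('i64', 'RBP'), ('v4f32', 'XMM1'), ('v8f32', 'YMM0')]
```

In the real TableGen convention an unmatched argument falls through to stack-assignment rules; GHC sidesteps that case by doing its own spilling before LLVM ever sees the call.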

If I get the go-ahead from y'all, I'm happy to see about getting that patch into LLVM.

Attachments (4)

ghc_x86_64_patch.diff (1.0 KB) - added by carter 12 months ago.
diff / patch for changing calling convention
ghc-x86-64+avx.diff (4.3 KB) - added by carter 12 months ago.
ghc-x86-64-avx-Check.diff (4.2 KB) - added by carter 12 months ago.
ghc-x86-64-avx-volatile (1).patch (4.4 KB) - added by carter 12 months ago.

Download all attachments as: .zip

Change History (33)

comment:1 Changed 12 months ago by gmainland

David Terei is the best GHC contact point with the LLVM team. I would recommend contacting him---he will know best how to get the patches we need into LLVM.

comment:2 Changed 12 months ago by carter

Ok, I'll take that as a YES from GHC HQ for this change. I'll contact him plus 1-2 other folks.

I'm attaching a diff / patch file that makes this change.

I've also changed the wording in the comments to the following:

def CC_X86_64_GHC : CallingConv<[
  // Promote i8/i16/i32 arguments to i64.
  CCIfType<[i8, i16, i32], CCPromoteToType<i64>>,

  // Pass in STG registers: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim
  CCIfType<[i64],
            CCAssignToReg<[R13, RBP, R12, RBX, R14, RSI, RDI, R8, R9, R15]>>,

  // Pass in STG registers for  floats, doubles and 128bit simd vectors
  CCIfType<[f32, f64, v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
            CCIfSubtarget<"hasSSE1()",
            CCAssignToReg<[XMM1, XMM2, XMM3, XMM4, XMM5, XMM6]>>>,

  // Pass in STG registers for 256bit simd vectors
  CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
                          CCIfSubtarget<"hasAVX()",
                          CCAssignToReg<[YMM0, YMM1, YMM2, YMM3,
                                         YMM4, YMM5, YMM6]>>>

]>;

Changed 12 months ago by carter

diff / patch for changing calling convention

comment:3 Changed 12 months ago by gmainland

Great, thanks! You may also want to change the comment in the patch to note that it is for 256-bit vectors as well as 128-bit vectors :)

comment:4 Changed 12 months ago by dterei

Sounds fine. I'd simply email the LLVM mailing list or open a ticket to get the patch merged. Feel free to name drop me if it helps but I don't expect any issues. If that doesn't work I can email one or two people directly which in the past has been faster.

comment:5 Changed 12 months ago by carter

  • Owner set to carter
  • Type changed from feature request to task

I've a friend who's active on LLVM and is giving me some pointers; it sounds like I should provide 1-2 test examples to validate the support and make it easy to test on their side.

I'll fiddle around with a toy test suite entry and pester you both if I need any more help on this.

I'll change this to a task and set myself as owner for now.

comment:6 Changed 12 months ago by carter

@gmainland I don't understand your remark about the registers comment... XMM_n is the lower half of the corresponding YMM_n, the additional rule only fires for "256 bit" vectors, otherwise the lower half is used instead. Could you show me which comment you want to change and how?

comment:7 Changed 12 months ago by gmainland

It is my error. I misunderstood the scope of your comment.

comment:8 Changed 12 months ago by carter

Relatedly, I'm looking at the LLVM calling convention spec, and I'm just now realizing that 32-bit x86 mode doesn't preclude using SIMD!

https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86CallingConv.td#L428
(the ghc calling convention and the associated others)

So we perhaps could replicate the simd stanzas and add them to 32bit calling convention on the llvm side?

I believe that would have zero impact on any GHC that uses LLVM in 32-bit mode with the current calling convention, because GHC manages its own stack spilling. It would also harmonize SIMD support to be LLVM + x86 backend specific rather than 64-bit only. (It merely leaves the question of distinguishing the various levels of SIMD support the target has, which is needed for using AVX / AVX2 correctly anyway, so it *shouldn't* change any complexity on our side.)

I'm not going to add that for now, but I'm throwing the idea out there: as is, to support interesting SIMD we need a finer-grained notion of target going forward *anyway*, beyond "is it x86_64" or not, and in some sense those other notions are *orthogonal* to 32-bit vs 64-bit.

If you two favor such a change, I'll add it to the proposed patch. (We'll still have to hash out better subtarget info in GHC land, but that's separate from making sure the backend already has the machinery to support it.)

comment:9 Changed 12 months ago by bgamari

  • Cc bgamari@… added

Changed 12 months ago by carter

comment:10 Changed 12 months ago by carter

  • Difficulty set to Unknown

Just uploaded the updated patch plus the new LLVM test suite element; I'll email the llvm list about the 64-bit calling convention now.

comment:11 Changed 12 months ago by carter

Updating the patch based on feedback.

Changed 12 months ago by carter

comment:12 Changed 12 months ago by carter

Here's an updated patch per the suggestions on the llvm list:

Changed 12 months ago by carter

comment:13 Changed 12 months ago by carter

  // Pass in STG registers for 128bit simd vectors
  CCIfType<[ v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
            CCIfSubtarget<"hasSSE1()",
            CCAssignToReg<[XMM1, XMM2, XMM3, XMM4, XMM5, XMM6]>>>,

// Pass in STG registers for the first 6 256bit simd vectors
CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
                          CCIfSubtarget<"hasAVX()",
                          CCAssignToReg<[ YMM1, YMM2, YMM3,
                                         YMM4, YMM5, YMM6]>>>

I propose adding this to the x86_32 convention as per @gmainland's suggestion on the list. Please indicate on this ticket if that modification is ok :)

comment:14 Changed 12 months ago by carter

If other GHC devs are in favor of the proposed change to x86_32, I'll amend the patch I've proposed to the LLVM devs accordingly, plus the needed test suite examples, but I need to know in the next day or so.

comment:15 Changed 12 months ago by dterei

Sounds fine.

comment:16 Changed 12 months ago by gmainland

I think the optimal solution would be to have Float and Double arguments passed in registers on X86_32. Per my comment on ghc-devs, we want code compiled with the LLVM back-end to be able to inter-operate with code compiled by the native back-end. Since the native codegen passes all Float and Double parameters on the stack, we would need to change the behavior of the native codegen too---it's not enough to change the LLVM calling convention.

I don't think this would be all that much work. If we really care about 32-bit performance, it's the right thing to do.

comment:17 Changed 12 months ago by carter

Ok, I'll amend the patch so that the XMM and YMM stanzas are the same for 32-bit and 64-bit x86.

Current 32-bit x86 GHCs will still work correctly because they spill all floating point values to the stack, and I guess we'll figure out, manpower-wise, doing the work to sync up the native codegen to support floating point args too.

comment:18 Changed 12 months ago by carter

  // Pass in STG registers for  floats, doubles and 128bit simd vectors
  CCIfType<[f32, f64, v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
            CCIfSubtarget<"hasSSE1()",
            CCAssignToReg<[XMM1, XMM2, XMM3, XMM4, XMM5, XMM6]>>>,

  // Pass in STG registers for 256bit simd vectors
  CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
                          CCIfSubtarget<"hasAVX()",
                          CCAssignToReg<[YMM0, YMM1, YMM2, YMM3,
                                         YMM4, YMM5, YMM6]>>>

will be added to both stanzas then

Version 0, edited 12 months ago by carter (next)

comment:19 Changed 12 months ago by gmainland

Shouldn't that be hasSSE2, not hasSSE1?

comment:20 Changed 12 months ago by carter

So does this mean that the version LLVM has is currently buggy? It currently uses the "hasSSE1()" check:
https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86CallingConv.td#L289
(Note that a number of other conventions seem to use hasSSE1() as well.)

That said, most of the useful SSE features / SIMD capabilities didn't arrive until SSE2, so SSE1 alone is not terribly useful for us.

Strictly speaking, it is correct: SSE1 provides XMM0-7, though the only SIMD operations available at that level are for Floats. SSE2 adds the Word and Double SIMD primops.

Likewise, XMM8-15 are only available in 64-bit mode.

SSE2 started appearing in Intel and AMD chips around 2001-2003, and SSE3 landed in 2004.

comment:21 Changed 12 months ago by gmainland

SSE (the original) only allowed short vectors of 4 single-precision floating point values. So yes, I think the current version is buggy, and strictly speaking incorrect! With SSE1 (only), you cannot put Double arguments into XMM*.

comment:22 Changed 12 months ago by carter

  1. Changing to "hasSSE2()" will make the engineering for the various levels of SIMD, and their impact on the calling convention, a teeny bit simpler.
    1. If we keep "hasSSE1()", we'll want to spill Doubles to the stack at -msse1 and pass the first few in registers at -msse2 or higher, which is easy for us to support because we do our stack spills and loads *BEFORE* llvm. This is because SSE1 only supports operations on Floats and not Doubles. (@gmainland, I see you replied already.)
    2. Yes, as GHC's SIMD is currently implemented, this is a bug in LLVM (or an oversight on our side?!). @gmainland, could you create a ticket with a toy example using GHC HEAD that illustrates it as a bug? (LLVM patches that are bug fixes for downstream tools are things we can get considered for the 3.3 POINT release.) Having a ticket and an explicit example that hits it would be handy!
  2. The same sort of instruction selection / register problem happens once we consider supporting both AVX1 and AVX2!
    1. AVX1 only supports 256-bit Float / Double short vector operations; its Word / Int operations are 128-bit only.
    2. AVX2 is required for Word / Int operations to be 256-bit (except when you can encode them as the corresponding floating point SIMD operations). So we really need to sort out a good story here anyway, which is much the same as the sse1 vs sse2 problem: many machines have AVX1 (all Sandy Bridge and Ivy Bridge Intel chips), but only some CPUs have AVX2 (the recent Haswell generation).
  3. (This perhaps should be done as a different patch for LLVM.) Would it be worth considering augmenting the XMM / YMM registers used to XMM0-7 / YMM0-7 (from the current XMM1-6)? This would be the same scope of change as the x86_32-specific change, but would mean that roughly LLVM 3.4 onwards would only work with newer GHCs. (That is true anyway, right? 7.6 and earlier are a bit sloppier in their generated bitcode, so they don't play nice with the newest LLVMs, right?)
    1. This wouldn't change any caller-side code except for stack spilling, and any callee-side code aside from a few reads from the stack, right?

Should I re-email the list to make sure other folks are in the loop? It sounds like we're already considering changing the x86_32 calling convention (albeit in a way compatible with both old and new GHCs), and that's as good a time as any to start seriously considering any other calling convention changes, even if we test them out using a patched GHC and LLVM first. (I don't have the right hardware to run NOFIB for testing such a change myself...)
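The SSE / AVX capability split discussed above can be summarized as a small lookup table. This is a hedged sketch in Python, with names of my own invention; the width rules are taken from the points in this thread:

```python
# Hypothetical summary of which SIMD element kinds are usable at which
# vector width (in bits) per ISA level, per the discussion above:
# SSE1 is Float-only; SSE2 adds Double and integer vectors; AVX1 widens
# only Float/Double to 256 bits; AVX2 widens integer vectors to 256 bits.
WIDTHS = {
    "sse1": {"float": 128, "double": 0,   "int": 0},
    "sse2": {"float": 128, "double": 128, "int": 128},
    "avx":  {"float": 256, "double": 256, "int": 128},
    "avx2": {"float": 256, "double": 256, "int": 256},
}

def max_vector_bits(isa, elem):
    """Widest legal SIMD vector for element kind `elem` at ISA level `isa`."""
    return WIDTHS[isa][elem]

assert max_vector_bits("avx", "int") == 128   # AVX1 alone: 256-bit int ops unavailable
assert max_vector_bits("avx2", "int") == 256  # AVX2 unlocks 256-bit int ops
```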

comment:23 Changed 12 months ago by gmainland

I have way too much on my plate at the moment to do anything other than answer questions, although I am happy to do that.

The hasSSE1 test is vacuously true on x86_64. So it doesn't affect code correctness, but it is confusing.

I was wrong about needing the test at all on x86_32. We could make GHC pass Floats and Doubles on the stack when SSE is off, pass Floats in registers when -msse is set, and pass both Floats and Doubles (and vectors) in registers when -msse2 is set. The test wouldn't be necessary, because if -msse2 isn't set, GHC will simply pass Doubles on the stack and LLVM will never see a function call with a double-precision argument.
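That scheme can be sketched as a tiny decision function (hypothetical names; a sketch of the proposal, not GHC code):

```python
# Hypothetical sketch of the proposed x86_32 scheme: where GHC would
# place a scalar floating-point argument, given the -msse level.
def fp_arg_location(ty, sse_level):
    """ty is 'f32' or 'f64'; sse_level is 0 (no SSE), 1 (-msse), or 2 (-msse2)."""
    if sse_level >= 2:
        return "xmm"    # Floats and Doubles both go in registers
    if sse_level == 1 and ty == "f32":
        return "xmm"    # SSE1 handles single precision only
    return "stack"      # otherwise spill to the stack

assert fp_arg_location("f64", 1) == "stack"  # no SSE1 test needed in LLVM:
assert fp_arg_location("f64", 2) == "xmm"    # LLVM never sees an f64 arg at -msse
```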

I'm not concerned with 32-bit performance at this point. Do you have an application that requires these changes for performance reasons? If you do, I still don't think it's worth making changes to the 32-bit LLVM calling conventions until we can utilize them in GHC. Simply changing the GHC calling convention in LLVM isn't enough, even when using the LLVM back-end.

comment:24 Changed 12 months ago by carter

@gmainland, is the SSE2 test always true on x86_64? That's my concern: that -msse on x86_64 will be "wrong" with GHC HEAD now that we have SIMD support, unless LLVM does the right instruction lowering / register transfers when asked to use SSE2 features at -msse.

comment:25 Changed 12 months ago by gmainland

SSE2 instructions are always available on the 64-bit Intel architecture.

comment:26 Changed 12 months ago by carter

Ok, so we should change hasSSE1 to hasSSE2 to simplify adding short vector / SIMD support uniformly when it's available (i.e., for now NOT have 32-bit GHC put Floats and Doubles in registers; that's something we can consider at some future point).

So for now (a)

  // Pass in STG registers for floats, doubles and 128bit simd vectors
  CCIfType<[f32, f64, v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
            CCIfSubtarget<"hasSSE2()",
            CCAssignToReg<[XMM1, XMM2, XMM3, XMM4, XMM5, XMM6]>>>,

  // Pass in STG registers for 256bit simd vectors
  CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
                          CCIfSubtarget<"hasAVX()",
                          CCAssignToReg<[ YMM1, YMM2, YMM3,
                                         YMM4, YMM5, YMM6]>>>

for x86_64

and

  // Pass in STG registers for 128bit simd vectors
  CCIfType<[ v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
            CCIfSubtarget<"hasSSE2()",
            CCAssignToReg<[XMM1, XMM2, XMM3, XMM4, XMM5, XMM6]>>>,

  // Pass in STG registers for 256bit simd vectors
  CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
                          CCIfSubtarget<"hasAVX()",
                          CCAssignToReg<[ YMM1, YMM2, YMM3,
                                         YMM4, YMM5, YMM6]>>> 

for x86_32? (though we could use the same stanza )

or

(b)

  // Pass in STG registers for floats, doubles and 128bit simd vectors
  CCIfType<[f32, f64, v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
            CCIfSubtarget<"hasSSE2()",
            CCAssignToReg<[XMM1, XMM2, XMM3, XMM4, XMM5, XMM6]>>>,

  // Pass in STG registers for 256bit simd vectors
  CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
                          CCIfSubtarget<"hasAVX()",
                          CCAssignToReg<[ YMM1, YMM2, YMM3,
                                         YMM4, YMM5, YMM6]>>>

for BOTH? (but not use the registers for floats and doubles in x86_32 for now)

Either change will be compatible with past and current GHCs.

comment:27 Changed 12 months ago by gmainland

I support a patch for x86_64 that passes 256-bit vectors in YMM registers when AVX is available. I am ambivalent about changing the x86_32 calling conventions, but it seems to me it would be a mistake to continue passing Float and Double arguments on the stack when SSE2 instructions are available.

comment:28 Changed 12 months ago by lelf

  • Cc anton.nik@… added

comment:29 Changed 10 months ago by carter

  • Resolution set to invalid
  • Status changed from new to closed

Closing for now, because this will be subsumed by a somewhat more ambitious exploration.

Note: See TracTickets for help on using tickets.