wiki:Commentary/Compiler/Backends/LLVM/Alias

Version 17 (modified by dterei, 3 years ago) (diff)

--

Improving LLVM Alias Analysis

This page tracks the information and progress relevant to improving the alias analysis pass for the LLVM backend of GHC.

This correspond to bug #5567.

LLVM Alias Analysis Infrastructure

Some links to the various documentation on LLVM's AA support:

Max's Work

Max had a crack at writing a custom alias analysis pass for LLVM, relevant links are:

TBAA

LLVM as of version 2.9 includes Type Based Alias Analysis. This mean using metadata you can specify a type hierarchy (with alias properties between types) and annotate your code with these types to improve the alias information. This should allow us to improve the alias analysis without any changes to LLVM itself like Max made.

STG / Cmm Alias Properties

Question (David Terei): What alias properties does the codegen obey? Sp and Hp never alias? R<n> registers never alias? ....

Answer (Simon Marlow): Sp[] and Hp[] never alias, R[] never aliases with Sp[], and that's about it.

I64[ Sp + n ] = "stack"

I64[ Base + n ] = "base"

I64[ Hp + n ] = "heap"
I64[ R1 + n ] = "heap"

I64[ I64[Sp + n]  ] = "heap"
I64[ I64[Sp + n] + m  ] = "heap"

I64[ I64[R1 + n] ] = "heap"
I64[ I64[R1 + n] + m ] = "heap"
I64[ I64[Sp + n] +  I64 [R1 + n] ] = "heap"

Simon: As long as it propagates properly, such that every F(Sp) is a stack pointer, where F() is any expression context except a dereference. That is, we better be sure that

I64[Sp + R1[n]]

is "stack", not "heap".

How to Track TBAA information

Really to be sound and support Cmm in full we would need to track and propagate TBAA information. It's Types after all! At the moment we don't. We simply rely on the fact that the Cmm code generated for loads and stores is nearly always in the form of:

I64[ Sp ... ] = ...

That is to say, it has the values it depends on for the pointer derivation in-lined in the load or store expression. It is very rarely of the form:

x = Sp + 8
I64[x] = ...

And when it is, 'it is' (unconfirmed) always deriving a "heap" pointer, "stack" pointers are always of the in-line variety. This assumption if true allows us to look at just a store or load in isolation to properly Type it.

LLVM type system

The above aliasing information can be encoded as follows:

!0 = metadata !{ metadata !"top" }
!1 = metadata !{ metadata !"heap", metadata !0 }
!2 = metadata !{ metadata !"stack", metadata !0 }
!3 = metadata !{ metadata !"rx", metadata !1 }
!4 = metadata !{ metadata !"base", metadata !0 }
!5 = metadata !{ metadata !"other", metadata !0 }

The fact that R[] never aliases with Sp[] is never used as the one way relation isn't expressible in LLVM.

Stores/loads needs to be annotated with !tbaa and one of the above four types e.g.

%ln1NH1 = load i64* %Sp_Arg, align 8, !tbaa !2

Progress

Some patches have been pushed that implement TBAA and significantly improve the Alias Analysis story. See ticket #5567 for more information.