Opened 2 years ago

Last modified 8 hours ago

#10074 new task

Implement the 'Improved LLVM Backend' proposal

Reported by: thoughtpolice Owned by: angerman
Priority: high Milestone: 8.4.1
Component: Compiler (LLVM) Version:
Keywords: llvm, codegen Cc: dterei, scpmw, simonmar, bgamari, angerman, michalt, gueux, George
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #11295, #12470 Differential Rev(s): Phab:D530
Wiki Page: wiki:ImprovedLLVMBackend

Description

This is a meta ticket designed to reflect the current implementation status of the 'Improved LLVM Backend' proposal, documented here:

https://ghc.haskell.org/trac/ghc/wiki/ImprovedLLVMBackend

Change History (23)

comment:1 Changed 2 years ago by Austin Seipp <austin@…>

In 5d5abdca31cdb4db5303999778fa25c4a1371084/ghc:

llvmGen: move to LLVM 3.6 exclusively

Summary:
Rework llvmGen to use LLVM 3.6 exclusively. The plans for the 7.12 release are to ship LLVM alongside GHC in the interests of user (and developer) sanity.

Along the way, refactor TNTC support to take advantage of the new `prefix` data support in LLVM 3.6. This allows us to drop the section-reordering component of the LLVM mangler.

Test Plan: Validate, look at emitted code

Reviewers: dterei, austin, scpmw

Reviewed By: austin

Subscribers: erikd, awson, spacekitteh, thomie, carter

Differential Revision: https://phabricator.haskell.org/D530

GHC Trac Issues: #10074

comment:2 Changed 2 years ago by Austin Seipp <austin@…>

In 578d2bad19b3e03fac4da1e5be4b22b73cef0a44/ghc:

Remove unneeded compatibility with LLVM < 3.6

Since GHC requires at least LLVM 3.6, some of the special cases (for,
e.g., LLVM 2.8 or 2.9) in the LLVM CodeGen can be simply removed.

Reviewed By: rwbarton, austin

Differential Revision: https://phabricator.haskell.org/D884

GHC Trac Issues: #10074

comment:3 Changed 20 months ago by thoughtpolice

Milestone: 7.12.1 → 8.0.1

Milestone renamed

comment:4 Changed 15 months ago by bgamari

Milestone: 8.0.1 → 8.2.1

It seems that this likely won't happen for 8.0.

Last edited 15 months ago by bgamari

comment:5 Changed 9 months ago by angerman

Cc: angerman added

comment:6 Changed 5 months ago by michalt

Cc: michalt added

comment:7 Changed 5 months ago by bgamari

Milestone: 8.2.1 → 8.4.1

This won't be happening for 8.2 either.

comment:8 Changed 5 months ago by bgamari

Wiki Page: wiki:ImprovedLLVMBackend

Currently this plan is in need of an implementor. At this point I'm not convinced that we want or need to ship our own LLVM builds. Rather, I think it would be sufficient to simply try to understand what LLVM passes are fruitful for GHC's code (#11295) and be specific about which LLVM version a particular GHC release targets (which we already do).

There are related opportunities here that are a bit farther off,

comment:9 Changed 3 months ago by dobenour

I think that we should run loop unswitching early in the pipeline, to remove redundant heap/stack checks.

How can we track aliasing information better?

LLVM supports dereferenceable annotations. Those might be able to help.

comment:10 in reply to:  9 Changed 7 weeks ago by angerman

Replying to dobenour:

I think that we should run loop unswitching early in the pipeline, to remove redundant heap/stack checks.

How can we track aliasing information better?

LLVM supports dereferenceable annotations. Those might be able to help.

The Data.Bitcode stuff I wrote doesn't need the aliasing anymore. However, another option would be to teach cmm label types, or to take off not from cmm but from stg.

comment:11 Changed 7 weeks ago by angerman

Owner: changed from thoughtpolice to angerman

Ok. Let's do this. I will deviate a bit from the plan in the proposal though. The rough idea is:

  • replace opt+llc with clang. This does imply that we lose the mangler, and probably won't be able to do -split-obj at all.
  • build a release llvm-clang with necessary ghc changes, and call this ghc-clang, until all ghc-relevant patches are upstream in llvm.
  • provide binary distributions for said ghc-clang for at least all tier1 platforms. Other platforms will have to build clang from source.

This should hopefully allow us to drop quite a bit of code from the llvm backend. It might re-introduce some new bugs. We do have quite a few hacks here and there to work around bugs in the llvm toolchain, for which we do not necessarily know if they are still present in the llvm toolchain we currently support.

This should allow us to pin the llvm backend to a certain (potentially customized) clang version. This should be an interim solution only though. Hopefully we'll have all the necessary changes in llvm upstreamed by the time llvm5 (~6mo from now) or llvm6 (~12mo from now) is released.

comment:12 in reply to:  11 ; Changed 7 weeks ago by bgamari

Replying to angerman:

Ok. Let's do this. I will deviate a bit from the plan in the proposal though. The rough idea is:

  • replace opt+llc with clang. This does imply that we lose the mangler, and probably won't be able to do -split-obj at all.

I won't lose much sleep over losing split objects. Frankly, I look forward to the day when we can drop it entirely. However, it seems like the mangler/AVX situation may be a bit trickier.

  • build a release llvm-clang with necessary ghc changes, and call this ghc-clang, until all ghc-relevant patches are upstream in llvm.
  • provide binary distributions for said ghc-clang for at least all tier1 platforms. Other platforms will have to build clang from source.

As we discussed on IRC, I really would like to avoid coming to rely on our own LLVM builds if possible. Let's instead try to just get the patches we need upstream if at all possible. Then we can just piggy-back on the upstream LLVM binary distributions.

This should hopefully allow us to drop quite a bit of code from the llvm backend. It might re-introduce some new bugs. We do have quite a few hacks here and there to work around bugs in the llvm toolchain, for which we do not necessarily know if they are still present in the llvm toolchain we currently support.

Can you list these? I tried to think of what this refers to but I can't think of anything off the top of my head.

This should allow us to pin the llvm backend to a certain (potentially customized) clang version. This should be an interim solution only though. Hopefully we'll have all the necessary changes in llvm upstreamed by the time llvm5 (~6mo from now) or llvm6 (~12mo from now) is released.

Right. I see no real reason why it should take longer than six months to get our changes upstream.

Thanks for picking this up, angerman!

comment:13 in reply to:  12 ; Changed 7 weeks ago by angerman

Replying to bgamari:

Replying to angerman:

Ok. Let's do this. I will deviate a bit from the plan in the proposal though. The rough idea is:

  • replace opt+llc with clang. This does imply that we lose the mangler, and probably won't be able to do -split-obj at all.

I won't lose much sleep over losing split objects. Frankly, I look forward to the day when we can drop it entirely. However, it seems like the mangler/AVX situation may be a bit trickier.

As I've just said on irc, I wonder, assuming we did the obj-splitting at the cmm level, wouldn't we get split-obj for free in ncg and llvm? Yet, as [dobenour] mentioned, this would likely prevent inlining in the llvm backend.

  • build a release llvm-clang with necessary ghc changes, and call this ghc-clang, until all ghc-relevant patches are upstream in llvm.
  • provide binary distributions for said ghc-clang for at least all tier1 platforms. Other platforms will have to build clang from source.

As we discussed on IRC, I really would like to avoid coming to rely on our own LLVM builds if possible. Let's instead try to just get the patches we need upstream if at all possible. Then we can just piggy-back on the upstream LLVM binary distributions.

Yes, this would be ideal. I'm just not convinced (given our track record) that we won't find some llvm fix we need just late enough that it doesn't make it into llvm5.

This should hopefully allow us to drop quite a bit of code from the llvm backend. It might re-introduce some new bugs. We do have quite a few hacks here and there to work around bugs in the llvm toolchain, for which we do not necessarily know if they are still present in the llvm toolchain we currently support.

Can you list these? I tried to think of what this refers to but I can't think of anything off the top of my head.

There are some comments in the opt and llc phases referring to bugs (e.g. macOS doesn't properly do -O3). Now, dropping opt and llc and going just via clang, we do lose some control over the specific optimization flags we can pass, but in return get a stable unified interface.

This should allow us to pin the llvm backend to a certain (potentially customized) clang version. This should be an interim solution only though. Hopefully we'll have all the necessary changes in llvm upstreamed by the time llvm5 (~6mo from now) or llvm6 (~12mo from now) is released.

Right. I see no real reason why it should take longer than six months to get our changes upstream.

On a final note: actually building a custom (static) clang to distribute seems rather simple. I've a makefile of ~10 lines that I believe would also work on linux and bsds; windows would need to be figured out.

comment:14 Changed 7 weeks ago by angerman

Regarding -split-obj, #11445 makes me believe we can drop that altogether.

comment:15 Changed 6 weeks ago by gueux

Cc: gueux added

comment:16 Changed 6 weeks ago by awson

Clang driver is not particularly good on Windows.

I believe using clang will buy us very little (if anything), and would, perhaps, make things even worse.

comment:17 in reply to:  16 Changed 6 weeks ago by angerman

Replying to awson:

Clang driver is not particularly good on Windows.

I believe using clang will buy us very little (if anything), and would, perhaps, make things even worse.

What you are saying is that clang is worse than opt and llc on windows? I should really get myself some windows box somewhere :(

comment:18 in reply to:  13 ; Changed 6 weeks ago by awson

Well, perhaps I was not quite correct.

I mostly had in mind things like (for example) clang on Windows not supporting -flto, but using separate utilities, e.g. doing llvm-link between opt and llc, we can accomplish the thing.

OTOH, we still can call clang twice, first instead of opt then instead of llc with llvm-link in-between.

Still, I'm not at all sure we need to bother with a beast like clang only to get rid of literally a couple of lines of haskell code. Moreover, I'm not sure the OS X -O3 example is quite relevant here. Do you mean that using -O3 with the opt and/or llc driver yields different results than using the same -O3 with the clang driver on the same version of llvm/clang?

comment:19 in reply to:  18 Changed 6 weeks ago by angerman

Replying to awson:

Well, perhaps I was not quite correct.

I mostly had in mind things like (for example) clang on Windows not supporting -flto, but using separate utilities, e.g. doing llvm-link between opt and llc, we can accomplish the thing.

OTOH, we still can call clang twice, first instead of opt then instead of llc with llvm-link in-between.

Still, I'm not at all sure we need to bother with a beast like clang only to get rid of literally a couple of lines of haskell code. Moreover, I'm not sure the OS X -O3 example is quite relevant here. Do you mean that using -O3 with the opt and/or llc driver yields different results than using the same -O3 with the clang driver on the same version of llvm/clang?

The actual diff is here: https://phabricator.haskell.org/D3352, which you might or might not have seen.

Maybe -flto on windows has changed with llvm4 already? We could, I guess, do two clang runs; my intention, though, is to replace

ghc -> llvm ir -> opt -> llc -> mangler -> as -> object

to

ghc -> llvm ir -> clang -> object.
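To make the difference between the two pipelines concrete, here is a minimal sketch that models each one as a list of steps and computes which steps disappear and which are new. The `Stage` type and all names below are hypothetical and exist only for this illustration; they are not part of GHC.

```haskell
-- Hypothetical model of the pipeline steps named in the text above.
data Stage = Ghc | LlvmIr | Opt | Llc | Mangler | As | Clang | Object
  deriving (Eq, Show)

-- ghc -> llvm ir -> opt -> llc -> mangler -> as -> object
currentPipeline :: [Stage]
currentPipeline = [Ghc, LlvmIr, Opt, Llc, Mangler, As, Object]

-- ghc -> llvm ir -> clang -> object
proposedPipeline :: [Stage]
proposedPipeline = [Ghc, LlvmIr, Clang, Object]

-- Steps present in the current pipeline but absent from the proposed one.
droppedStages :: [Stage]
droppedStages = filter (`notElem` proposedPipeline) currentPipeline

-- Steps that only the proposed pipeline needs.
addedStages :: [Stage]
addedStages = filter (`notElem` currentPipeline) proposedPipeline
```

In this model the proposal trades four separate steps (opt, llc, the mangler and the assembler) for a single clang invocation.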

The mentioned macOS -O3 bug referred to the following lines, which sadly do not say which llvm version exhibited the issue.

-- Bug in LLVM at O3 on OSX.
llvmOpts = if platformOS (targetPlatform dflags) == OSDarwin
           then ["-O1", "-O2", "-O2"]
           else ["-O1", "-O2", "-O3"]
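For readers without the GHC tree at hand, the following is a self-contained sketch of what that snippet amounts to. It assumes the list is indexed by GHC's optimization level (0, 1 or 2); the `OS` type here is a hypothetical stand-in for GHC's platform type, and the clamping of the level is likewise an assumption of this sketch rather than something the quoted lines show.

```haskell
-- Hypothetical stand-in for GHC's platform OS type; only the
-- constructors needed for this illustration.
data OS = OSDarwin | OSLinux
  deriving (Eq, Show)

-- Flags per optimization level 0, 1, 2, mirroring the quoted lines:
-- on Darwin the highest level is capped at -O2 instead of -O3.
llvmOptsFor :: OS -> [String]
llvmOptsFor os
  | os == OSDarwin = ["-O1", "-O2", "-O2"]
  | otherwise      = ["-O1", "-O2", "-O3"]

-- Select the flag for a given level, clamping out-of-range levels
-- into 0..2 (assumption of this sketch).
llvmOptFlag :: OS -> Int -> String
llvmOptFlag os lvl = llvmOptsFor os !! max 0 (min 2 lvl)
```

With these definitions, `llvmOptFlag OSDarwin 2` yields `"-O2"` while `llvmOptFlag OSLinux 2` yields `"-O3"`, which is the whole extent of the workaround.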

I'm proposing to take this opportunity to start from a blank slate and drop any "maybe it's still broken, maybe not" parts from the pipeline.

comment:20 Changed 6 weeks ago by awson

Ah, it looks like so much has happened under the hood that I wasn't aware of!

A couple of comments and answers then:

  1. -flto doesn't work on windows even on the current llvm5 and won't in the foreseeable future, because it requires GOLD linker plugin to work on unices, and we have neither on windows.
  2. AFAIUI, Matthias Braun's early advice to use the clang driver was mostly inspired by his ignorance of how different the STG execution model is from that of C. Later he understood this and stated that, since we need -fllvm anyway, i.e. need to bypass clang's "high-level" predefined -OX sets of options, using either the clang or the opt/llc drivers is "equally good/bad" in our use case.
  3. Btw, why can't we simply do ghc -> llvm ir -> clang -> mangler -> as -> object if we still need the mangler? Or can we, but *don't want to*?
  4. Even if not using clang, a part of your patches in https://phabricator.haskell.org/D3352 still looks relevant, e.g. we can drop pprLlvmHeader/moduleLayout thingy since it is inferred by LLVM tools from module target triple anyway.

comment:21 Changed 6 weeks ago by angerman

  • if lto depends on gold, then this will clearly only work on ELF based systems. I'm not sure if lld would solve this though; it's supposed to be somewhat stable already.
  • I don't see the STG/C difference in the emails. I might not be reading something right, though. Yes, if we want exact control over opt and llc, which we can't get through clang, we will need to revert back to those tools. I however would prefer not to. That clang or opt/llc are equally suboptimal is certainly correct. I would argue that one tool is preferable over two tools, unless we find actual usecases we can't achieve with that single tool (and can not subsequently convince the llvm people that our usecase is legit).
  • Yes, we could ask clang to output assembly (or bitcode if we wanted to use llvm's bitcode linker), and use clang as the assembler as well. I simply want to get rid of the mangler if possible (see also #11138); right now we use the mangler for three things: a) avx mangling, which I'm not certain we still need, and if we need it we should figure out why and fix it in llvm upstream, b) function/object rewrites, which I'm suspicious of as well (see https://reviews.llvm.org/D30812), and c) the -dead_strip fix, which we do not need with llvm5 or a patched llvm4 anymore and which only affects mach-o based systems (iOS, macOS, ...) anyway.
  • Dropping the dreaded module layout / header logic was a long time goal of mine, as it is not only painful to keep those up to date, but I'm also not even sure we have the proper values.

As the Improved LLVM Proposal was about bundling llvm with ghc, to have better control over the llvm backend, bundling clang (or if we really must, opt+llc) looks to me like the way to go. Ideally though, I'd prefer to find all necessary fixes we need in llvm and have them upstreamed in llvm5, such that ghc8.4 can simply require llvm5. However I'm not opposed to laying the foundation to bundle clang with ghc, should the need arise.

comment:22 Changed 8 hours ago by George

One interesting option clang would give us is to specify -Os, which according to the clang man page is like the clang -O2 option but with extra optimizations to reduce code size. opt and llc don't seem to have that option. Of course this wouldn't help the ghc code gen, but it might help with clang, as the -O3 option along with -Os would allow users to experiment with size/speed tradeoffs.

I have seen discussions about not doing certain optimizations as the code size would be increased and we were unsure if the result would be faster or not.

OTOH I don't know very much and maybe all the important code size / speed tradeoffs are made before we get to llvm.

comment:23 Changed 8 hours ago by George

Cc: George added