Opened 3 years ago

Last modified 26 hours ago

#10074 new task

Implement the 'Improved LLVM Backend' proposal

Reported by: thoughtpolice
Owned by: angerman
Priority: high
Milestone: 8.4.1
Component: Compiler (LLVM)
Version:
Keywords: llvm, codegen
Cc: dterei, scpmw, simonmar, bgamari, angerman, michalt, gueux, George, kavon, alpmestan
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets: #11295, #12470
Differential Rev(s): Phab:D530
Wiki Page: wiki:ImprovedLLVMBackend

Description

This is a meta ticket designed to reflect the current implementation status of the 'Improved LLVM Backend' proposal, documented here:

https://ghc.haskell.org/trac/ghc/wiki/ImprovedLLVMBackend

Change History (34)

comment:1 Changed 3 years ago by Austin Seipp <austin@…>

In 5d5abdca31cdb4db5303999778fa25c4a1371084/ghc:

llvmGen: move to LLVM 3.6 exclusively

Summary:
Rework llvmGen to use LLVM 3.6 exclusively. The plans for the 7.12 release are to ship LLVM alongside GHC in the interests of user (and developer) sanity.

Along the way, refactor TNTC support to take advantage of the new `prefix` data support in LLVM 3.6. This allows us to drop the section-reordering component of the LLVM mangler.

Test Plan: Validate, look at emitted code

Reviewers: dterei, austin, scpmw

Reviewed By: austin

Subscribers: erikd, awson, spacekitteh, thomie, carter

Differential Revision: https://phabricator.haskell.org/D530

GHC Trac Issues: #10074

comment:2 Changed 3 years ago by Austin Seipp <austin@…>

In 578d2bad19b3e03fac4da1e5be4b22b73cef0a44/ghc:

Remove unneeded compatibility with LLVM < 3.6

Since GHC requires at least LLVM 3.6, some of the special cases (for,
e.g., LLVM 2.8 or 2.9) in the LLVM CodeGen can be simply removed.

Reviewed By: rwbarton, austin

Differential Revision: https://phabricator.haskell.org/D884

GHC Trac Issues: #10074

comment:3 Changed 2 years ago by thoughtpolice

Milestone: changed from 7.12.1 to 8.0.1

Milestone renamed

comment:4 Changed 23 months ago by bgamari

Milestone: changed from 8.0.1 to 8.2.1

It seems that this likely won't happen for 8.0.

Last edited 23 months ago by bgamari

comment:5 Changed 16 months ago by angerman

Cc: angerman added

comment:6 Changed 13 months ago by michalt

Cc: michalt added

comment:7 Changed 12 months ago by bgamari

Milestone: changed from 8.2.1 to 8.4.1

This won't be happening for 8.2 either.

comment:8 Changed 12 months ago by bgamari

Wiki Page: wiki:ImprovedLLVMBackend

Currently this plan is in need of an implementor. At this point I'm not convinced that we want or need to ship our own LLVM builds. Rather, I think it would be sufficient to simply try to understand which LLVM passes are fruitful for GHC's code (#11295) and be specific about which LLVM version a particular GHC release targets (which we already do).

There are related opportunities here that are a bit farther off,

comment:9 Changed 11 months ago by dobenour

I think that we should run loop unswitching early in the pipeline, to remove redundant heap/stack checks.

How can we track aliasing information better?

LLVM supports dereferenceable annotations. Those might be able to help.

comment:10 in reply to:  9 Changed 9 months ago by angerman

Replying to dobenour:

> I think that we should run loop unswitching early in the pipeline, to remove redundant heap/stack checks.
>
> How can we track aliasing information better?
>
> LLVM supports dereferenceable annotations. Those might be able to help.

The Data.Bitcode code I wrote no longer needs the aliasing. However, another option would be to teach Cmm about label types, or to start code generation not from Cmm but from STG.

comment:11 Changed 9 months ago by angerman

Owner: changed from thoughtpolice to angerman

Ok. Let's do this. I will deviate a bit from the plan in the proposal though. The rough idea is:

  • replace opt+llc with clang. This does imply that we lose the mangler, and probably won't be able to do -split-obj at all.
  • build a release llvm-clang with necessary ghc changes, and call this ghc-clang, until all ghc-relevant patches are upstream in llvm.
  • provide binary distributions for said ghc-clang for at least all tier-1 platforms. Other platforms will have to build clang from source.

This should hopefully allow us to drop quite a bit of code from the llvm backend. It might re-introduce some new bugs. We do have quite a few hacks here and there to work around bugs in the llvm toolchain, for which we do not necessarily know if they are still present in the llvm toolchain we currently support.

This should allow us to pin the llvm backend to a certain (potentially customized) clang version. This should only be an interim solution, though. Hopefully we'll have all the necessary changes upstreamed in llvm by the time llvm5 (~6 months from now) or llvm6 (~12 months from now) is released.

comment:12 in reply to:  11 ; Changed 9 months ago by bgamari

Replying to angerman:

> Ok. Let's do this. I will deviate a bit from the plan in the proposal though. The rough idea is:
>
>   • replace opt+llc with clang. This does imply that we lose the mangler, and probably won't be able to do -split-obj at all.

I won't lose much sleep over losing split objects. Frankly, I look forward to the day when we can drop it entirely. However, it seems like the mangler/AVX situation may be a bit trickier.

>   • build a release llvm-clang with necessary ghc changes, and call this ghc-clang, until all ghc-relevant patches are upstream in llvm.
>   • provide binary distributions for said ghc-clang for at least all tier-1 platforms. Other platforms will have to build clang from source.

As we discussed on IRC, I really would like to avoid coming to rely on our own LLVM builds if possible. Let's instead try to just get the patches we need upstream if at all possible. Then we can just piggy-back on the upstream LLVM binary distributions.

> This should hopefully allow us to drop quite a bit of code from the llvm backend. It might re-introduce some new bugs. We do have quite a few hacks here and there to work around bugs in the llvm toolchain, for which we do not necessarily know if they are still present in the llvm toolchain we currently support.

Can you list these? I tried to think of what this refers to but I can't think of anything off the top of my head.

> This should allow us to pin the llvm backend to a certain (potentially customized) clang version. This should only be an interim solution, though. Hopefully we'll have all the necessary changes upstreamed in llvm by the time llvm5 (~6 months from now) or llvm6 (~12 months from now) is released.

Right. I see no real reason why it should take longer than six months to get our changes upstream.

Thanks for picking this up, angerman!

comment:13 in reply to:  12 ; Changed 9 months ago by angerman

Replying to bgamari:

> Replying to angerman:
>
>> Ok. Let's do this. I will deviate a bit from the plan in the proposal though. The rough idea is:
>>
>>   • replace opt+llc with clang. This does imply that we lose the mangler, and probably won't be able to do -split-obj at all.
>
> I won't lose much sleep over losing split objects. Frankly, I look forward to the day when we can drop it entirely. However, it seems like the mangler/AVX situation may be a bit trickier.

As I've just said on IRC, I wonder: assuming we did the obj-splitting at the Cmm level, wouldn't we get split-obj for free in both the NCG and the LLVM backend? Yet, as dobenour mentioned, this would likely prevent inlining in the LLVM backend.

>>   • build a release llvm-clang with necessary ghc changes, and call this ghc-clang, until all ghc-relevant patches are upstream in llvm.
>>   • provide binary distributions for said ghc-clang for at least all tier-1 platforms. Other platforms will have to build clang from source.
>
> As we discussed on IRC, I really would like to avoid coming to rely on our own LLVM builds if possible. Let's instead try to just get the patches we need upstream if at all possible. Then we can just piggy-back on the upstream LLVM binary distributions.

Yes, this would be ideal. I'm just not convinced, given our track record, that we won't discover some llvm fix we need too late for it to make it into llvm5.

>> This should hopefully allow us to drop quite a bit of code from the llvm backend. It might re-introduce some new bugs. We do have quite a few hacks here and there to work around bugs in the llvm toolchain, for which we do not necessarily know if they are still present in the llvm toolchain we currently support.
>
> Can you list these? I tried to think of what this refers to but I can't think of anything off the top of my head.

There are some comments in the opt and llc phases referring to bugs (e.g. macOS doesn't properly do -O3). By dropping opt and llc and going just via clang, we do lose some control over the specific optimization flags we can pass, but in return we get a stable, unified interface.

>> This should allow us to pin the llvm backend to a certain (potentially customized) clang version. This should only be an interim solution, though. Hopefully we'll have all the necessary changes upstreamed in llvm by the time llvm5 (~6 months from now) or llvm6 (~12 months from now) is released.
>
> Right. I see no real reason why it should take longer than six months to get our changes upstream.

On a final note: actually building a custom (static) clang to distribute seems rather simple. I have a makefile of ~10 lines that I believe would also work on Linux and the BSDs; Windows would need to be figured out.

comment:14 Changed 9 months ago by angerman

Regarding -split-obj, #11445 makes me believe we can drop that altogether.

comment:15 Changed 9 months ago by gueux

Cc: gueux added

comment:16 Changed 9 months ago by awson

Clang driver is not particularly good on Windows.

I believe using clang will buy us very little (if anything), and would, perhaps, make things even worse.

comment:17 in reply to:  16 Changed 9 months ago by angerman

Replying to awson:

> Clang driver is not particularly good on Windows.
>
> I believe using clang will buy us very little (if anything), and would, perhaps, make things even worse.

Are you saying that clang is worse than opt and llc on Windows? I should really get myself a Windows box somewhere :(

comment:18 in reply to:  13 ; Changed 9 months ago by awson

Well, perhaps I was not quite correct.

I mostly had in mind things like (for example) clang on Windows not supporting -flto, whereas by using separate utilities, e.g. doing llvm-link between opt and llc, we can accomplish the thing.

OTOH, we can still call clang twice, first instead of opt and then instead of llc, with llvm-link in between.

Still, I'm very much not sure we need to bother with a beast like clang only to be able to get rid of literally a couple of lines of Haskell code. Moreover, I'm not sure the OS X -O3 example is quite relevant here. Do you mean that using -O3 with the opt and/or llc drivers yields different results than using the same -O3 with the clang driver on the same version of llvm/clang?

comment:19 in reply to:  18 Changed 9 months ago by angerman

Replying to awson:

> Well, perhaps I was not quite correct.
>
> I mostly had in mind things like (for example) clang on Windows not supporting -flto, whereas by using separate utilities, e.g. doing llvm-link between opt and llc, we can accomplish the thing.
>
> OTOH, we can still call clang twice, first instead of opt and then instead of llc, with llvm-link in between.
>
> Still, I'm very much not sure we need to bother with a beast like clang only to be able to get rid of literally a couple of lines of Haskell code. Moreover, I'm not sure the OS X -O3 example is quite relevant here. Do you mean that using -O3 with the opt and/or llc drivers yields different results than using the same -O3 with the clang driver on the same version of llvm/clang?

The actual diff is here: https://phabricator.haskell.org/D3352, which you might or might not have seen.

Maybe -flto on Windows has changed with llvm4 already? We could, I guess, do two clang runs; my intention, though, is to replace

ghc -> llvm ir -> opt -> llc -> mangler -> as -> object

to

ghc -> llvm ir -> clang -> object.
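To make the difference between the two pipelines concrete, here is a minimal Haskell sketch that models each one as a list of stages and checks that every stage consumes what the previous one produced. The Stage type, its fields, and the file-type labels are hypothetical illustrations, not GHC's actual Phase machinery; the file types follow the thread (GHC emits textual IR, opt writes bitcode for llc, and the mangler works on assembly).

```haskell
-- Hypothetical model of the two pipelines: each stage is described only by
-- the tool name and the file types it consumes and produces. This is an
-- illustration, not GHC's actual Phase type.
data Stage = Stage
  { stageName :: String
  , inExt     :: String
  , outExt    :: String
  } deriving Show

-- Current pipeline: opt writes bitcode, llc emits assembly, the mangler
-- rewrites that assembly, and the assembler produces the object file.
currentPipeline :: [Stage]
currentPipeline =
  [ Stage "opt"     "ll" "bc"
  , Stage "llc"     "bc" "s"
  , Stage "mangler" "s"  "s"
  , Stage "as"      "s"  "o"
  ]

-- Proposed pipeline: clang takes the textual IR straight to an object file.
proposedPipeline :: [Stage]
proposedPipeline = [Stage "clang" "ll" "o"]

-- Well-formed if every stage consumes what the previous stage produced.
wellFormed :: [Stage] -> Bool
wellFormed stages = and (zipWith (\a b -> outExt a == inExt b) stages (drop 1 stages))

-- Overall input and output file types (Nothing for an empty pipeline).
endpoints :: [Stage] -> Maybe (String, String)
endpoints []                    = Nothing
endpoints stages@(firstStage:_) = Just (inExt firstStage, outExt (last stages))

main :: IO ()
main = do
  print (map stageName currentPipeline)
  print (wellFormed currentPipeline, wellFormed proposedPipeline)
  print (endpoints currentPipeline == endpoints proposedPipeline)
```

Both pipelines have the same endpoints (textual IR in, object file out); the proposal only collapses the intermediate stages, which is also why the assembly-level mangler has no place in the single-step version.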

The macOS -O3 bug I mentioned refers to the following lines, which sadly do not say which llvm version exhibited the issue:

-- Bug in LLVM at O3 on OSX.
llvmOpts = if platformOS (targetPlatform dflags) == OSDarwin
           then ["-O1", "-O2", "-O2"]
           else ["-O1", "-O2", "-O3"]
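The quoted lines depend on GHC's DynFlags and platform types, so they cannot run on their own. The following self-contained sketch reproduces the same table with a stub OS type, assuming the three entries correspond to GHC optimisation levels 1, 2 and 3. OSLinux, OSOther and the llvmOptFor selector are hypothetical and not taken from GHC.

```haskell
-- Self-contained version of the quoted snippet. OS is a stub for GHC's
-- platform OS type; OSLinux and OSOther are hypothetical constructors.
data OS = OSDarwin | OSLinux | OSOther
  deriving (Eq, Show)

-- One flag per GHC optimisation level 1, 2 and 3 (assumed ordering).
-- Bug in LLVM at O3 on OSX, so Darwin is capped at -O2.
llvmOpts :: OS -> [String]
llvmOpts os
  | os == OSDarwin = ["-O1", "-O2", "-O2"]
  | otherwise      = ["-O1", "-O2", "-O3"]

-- Hypothetical selector: look the level up instead of indexing with (!!),
-- so an out-of-range level yields Nothing rather than a runtime error.
llvmOptFor :: OS -> Int -> Maybe String
llvmOptFor os level = lookup level (zip [1 ..] (llvmOpts os))

main :: IO ()
main = mapM_ (\os -> print (os, map (llvmOptFor os) [1, 2, 3, 4])) [OSDarwin, OSLinux]
```

Written this way, the workaround is visible as data: only the third entry differs between Darwin and other platforms, which is exactly the part whose continued necessity nobody could confirm.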

I'm proposing to take this opportunity to start from a blank slate and drop any "maybe it's still broken, maybe not" parts from the pipeline.

comment:20 Changed 9 months ago by awson

Ah, it looks so much happened under the hood which I wasn't aware of!

A couple of comments and answers then:

  1. -flto doesn't work on Windows even with the current llvm5, and won't in the foreseeable future, because on Unix-like systems it requires the gold linker plugin, and we have neither on Windows.
  2. AFAIUI, Matthias Braun's early advice to use the clang driver was mostly inspired by his ignorance of how different the STG execution model is from that of C. He later understood this and stated that, since we need -fllvm anyway, i.e. need to bypass clang's "high-level" predefined -OX sets of options, using either clang or the opt/llc drivers is "equally good/bad" in our use case.
  3. Btw, why can't we simply do ghc -> llvm ir -> clang -> mangler -> as -> object if we still need the mangler? Or can we, but *don't want to*?
  4. Even without clang, part of your patches in https://phabricator.haskell.org/D3352 still looks relevant, e.g. we can drop the pprLlvmHeader/moduleLayout thingy, since LLVM tools infer it from the module's target triple anyway.

comment:21 Changed 9 months ago by angerman

  • If LTO depends on gold, then this will clearly only work on ELF-based systems. I'm not sure whether lld would solve this, though; it's supposed to be somewhat stable already.
  • I don't see the STG/C difference in the emails. I might not be reading something right, though. Yes, if we want exact control over opt and llc, which we can't get through clang, we will need to go back to those tools. I would, however, prefer not to. That clang and opt/llc are equally suboptimal is certainly correct. I would argue that one tool is preferable over two, unless we find actual use cases we can't achieve with that single tool (and cannot subsequently convince the llvm people that our use case is legit).
  • Yes, we could ask clang to output assembly (or bitcode, if we wanted to use llvm's bitcode linker), and use clang as the assembler as well. I simply want to get rid of the mangler if possible (see also #11138). Right now we use the mangler for three things: a) AVX mangling, which I'm not certain we still need, and if we do, we should figure out why and fix it in llvm upstream; b) function/object rewrites, which I'm suspicious of as well (see https://reviews.llvm.org/D30812); and c) the -dead_strip fix, which we no longer need with llvm5 or a patched llvm4, and which only affects Mach-O based systems (iOS, macOS, ...) anyway.
  • Dropping the dreaded module layout / header logic was a long-time goal of mine: those values are not only painful to keep up to date, but I'm not even sure we have the proper ones.

As the Improved LLVM Proposal was about bundling llvm with ghc, to have better control over the llvm backend, bundling clang (or if we really must opt+llc) looks to me like the way to go. Ideally, though, I'd prefer to find all the fixes we need in llvm and have them upstreamed in llvm5, such that ghc8.4 can simply require llvm5. However, I'm not opposed to laying the foundation for bundling clang with ghc, should the need arise.

comment:22 Changed 8 months ago by George

One interesting option clang would give us is -Os, which according to the clang man page is like -O2 but with extra optimizations to reduce code size. opt and llc don't seem to have that option. Of course this wouldn't help GHC's own code generation, but with clang, offering -Os alongside -O3 would allow users to experiment with size/speed tradeoffs.

I have seen discussions about not doing certain optimizations as the code size would be increased and we were unsure if the result would be faster or not.

OTOH I don't know very much and maybe all the important code size / speed tradeoffs are made before we get to llvm.

comment:23 Changed 8 months ago by George

Cc: George added

comment:24 Changed 8 months ago by George

Cc: george added; George removed

comment:25 Changed 8 months ago by kavon

I personally don't see much of a benefit in moving from llc/opt to clang just to merge the interface to LLVM.

The only benefit I can imagine is that we can skip the step of having opt generate a .bc and feeding that into llc, though perhaps we can just pipe opt's output directly to llc? This might save some compile time.

Otherwise, we can build our own opt & llc and include those if we need a custom version of LLVM... which raises one of my questions: which particular LLVM patches are needed right now, or is this a preparation step?

Off the top of my head, I think the downsides of switching to clang as our interface to LLVM are:

  • clang limits us to using only the -Ox passes, which are tuned for C/C++. There's likely some benefit to crafting/tuning our own sequence of optimization passes (which is on my todo list). There are also IR passes in LLVM that are not included in the default -Ox sets that could prove useful when tuning.
  • Special llc flags such as specifying a special stack alignment might be out of reach (not sure how well clang handles options for opt/llc)
  • AFAIK unoptimized GHC builds prioritize compilation time, running essentially just mem2reg to introduce phi-nodes before handing off to llc. I'm not sure we can get such customizability with clang. Furthermore, I'm worried that clang's -O0 option will choose llc's -O0 option, which uses a _very_ naive register allocator that I believe was meant to aid debugging: from what I've seen it produce, it barely tries to keep values in registers. Decoupling LLVM IR optimization from machine code generation lets us still pick a decent register allocator, for example.

As an FYI, I'm currently working on the LLVM backend too, namely, I'm working on removing proc-point splitting when using LLVM.

comment:26 Changed 8 months ago by kavon

Cc: kavon added

comment:27 Changed 8 months ago by angerman

My current plan is to do [opt +] clang, with opt disabled by default until someone has the time to investigate different opt configurations. Piping opt into llc would still incur serialization and deserialization overhead. My current standpoint is that unless we explicitly gain something by using opt + clang or opt + llc right now, I'd rather have a dumb and simple solution.

Regarding LTO, I believe we can do LTO at the bitcode level with llvm-link and opt. However, this would need to be explored, and until that is done, I'd rather have a simple solution.

The opt+llc -> clang diff I posted to Phabricator clamps the -O level to [1,3], so that we always get mem2reg; the current design doesn't allow trivially passing -Os, I'm afraid, as ghc expects -O to be numeric.
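A minimal Haskell sketch of the clamping described here, and of why -Os does not fit: a numeric level can be clamped into [1,3], but a size-oriented level has no Int encoding. The names clampOptLevel, clangOptFlag and LlvmOptLevel are hypothetical and are not taken from the D3352 diff.

```haskell
-- Hypothetical sketch of the clamping: GHC's -O level is an Int and is
-- forced into [1,3], so clang always gets at least -O1 (the comment says
-- this is what guarantees mem2reg).
clampOptLevel :: Int -> Int
clampOptLevel = max 1 . min 3

clangOptFlag :: Int -> String
clangOptFlag n = "-O" ++ show (clampOptLevel n)

-- An Int cannot express -Os; a sum type such as this (hypothetical) one
-- would be needed to pass a size-oriented level through.
data LlvmOptLevel = OptLevel Int | OptSize
  deriving Show

renderOptLevel :: LlvmOptLevel -> String
renderOptLevel (OptLevel n) = clangOptFlag n
renderOptLevel OptSize      = "-Os"

main :: IO ()
main = do
  print (map clangOptFlag [0 .. 4])
  print (map renderOptLevel [OptLevel 0, OptLevel 3, OptSize])
```

The first line of output is ["-O1","-O1","-O2","-O3","-O3"]: GHC's -O0 still reaches clang as -O1, and anything above 3 is capped.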

In general, clang doesn't really handle opt or llc flags, or pass them down properly. There are supposed to be some escape hatches, but AFAICT they do not cover all cases. Thus we'd be left with what clang offers. Yet again I'd like to stress the point that someone would need to seriously take ownership of the opt/llc code, ensure that all the hacks are still necessary, that opt and llc flags match up, and ensure that they work with new llvm versions.

While bundling llvm would lessen the need to care for the latter, I imagine we'd still want to upgrade llvm from time to time?

What I hope to be able to complete this year is:

  • simplify the llvm logic: replacing opt+llc with opt+clang or clang for the time being
  • integrating the Data.BitCode modules to directly generate BitCode IR (without so much aliasing) from GHC
  • try to see if we can teach GHC that BitCode is a valid object like format.

The last point is essential for using llvm-link.

comment:28 Changed 8 months ago by angerman

I forgot to mention that getting rid of the mangler is also essential in a pure BitCode pipeline, as the mangler operates at the assembly level.

comment:29 Changed 8 months ago by kavon

> Yet again I'd like to stress the point that someone would need to seriously take ownership of the opt/llc code, ensure that all the hacks are still necessary, that opt and llc flags match up, and ensure that they work with new llvm versions.

I'll take ownership of all of these things!

> As the Improved LLVM Proposal was about bundling llvm with ghc, to have better control over the llvm backend, bundling clang (or if we really must opt+llc) looks to me like the way to go.

Bundling a customized clang is not the way to go. Clang is just the C/C++ frontend for LLVM, and we lose control over LLVM by trying to access it through clang... how will we use any of our customizations? If we want to write our own IR optimization/analysis passes, we would end up exposing the flag to run it via opt... hacking up clang to access the pass is much harder!

I also would like to stress that we cannot rely on the system's version of clang. For example, on OS X the default clang is built against Apple's LLVM, whose source code is unknown. The opt/llc obtained by package managers are always the open-source version.

> Regarding LTO, I believe we can do LTO at the bitcode level with llvm-link and opt.

Yes, I'm quite certain that's all you do, and I'd be willing to look into this. It really shouldn't be too difficult.
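The bitcode-level LTO route discussed here can be sketched as three tool invocations: llvm-link merges the per-module bitcode, opt optimises the merged module, and llc emits one object file. The Haskell helper below only builds the command lines and does not run them; ltoCommands, render and the intermediate file names are hypothetical, the flags used (-o, -O<n>, -filetype=obj) are standard options of these tools, and whether this matches what GHC would actually need has not been checked.

```haskell
-- Hypothetical helper: builds (but does not run) the command lines for a
-- bitcode-level LTO pipeline. llvm-link merges the per-module bitcode,
-- opt optimises the merged module, and llc emits a single object file.
type Command = (String, [String])

ltoCommands :: Int -> [FilePath] -> String -> [Command]
ltoCommands level inputs base =
  [ ("llvm-link", inputs ++ ["-o", linked])
  , ("opt",       [optFlag, linked, "-o", optimised])
  , ("llc",       [optFlag, "-filetype=obj", optimised, "-o", object])
  ]
  where
    optFlag   = "-O" ++ show level
    linked    = base ++ ".linked.bc"  -- merged, unoptimised bitcode
    optimised = base ++ ".opt.bc"     -- merged, optimised bitcode
    object    = base ++ ".o"          -- final object file

-- Render a command as a single shell-style line (no quoting of spaces).
render :: Command -> String
render (exe, args) = unwords (exe : args)

main :: IO ()
main = mapM_ (putStrLn . render) (ltoCommands 2 ["A.bc", "B.bc"] "prog")
```

Because everything happens on bitcode before llc runs, this route would not rely on a linker plugin, which is the Windows concern raised earlier in the thread; it does, however, presuppose that GHC can treat bitcode as an object-like format.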

---

Overall, I still don't see good motivation for moving to opt/clang instead of opt/llc, other than an attempt to reduce compilation time. If I missed something in the prior discussion please forgive me.

If compilation time with LLVM is very important, I think there are more profitable ways of reducing it than using clang:

Here are the timings I'm seeing on a GHC produced 2.6MB LLVM IR file (the Move module from the mate benchmark) with a Debug build of LLVM 5 with assertions on, so these times are inflated in unknown ways:

opt -time-passes -O1 Move.ll | llc -O1 -time-passes -o blah.s

1.08 seconds were spent by opt parsing the textual LLVM IR we generated. opt spent 0.19 seconds emitting bitcode, and llc spent 0.16 seconds parsing it, so 0.35 seconds between opt and llc.

The total time spent to complete that pipeline was 18.15 seconds, so 2% of this time is owed to bitcode serialization between opt and llc, whereas 6% is owed to parsing textual LLVM IR from GHC. This doesn't include the time spent by GHC emitting the textual IR too!
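For reference, the two percentages follow directly from the timings quoted above (all taken from the same Debug build with assertions enabled):

```latex
\begin{aligned}
t_{\text{bc}} &= 0.19\,\text{s} + 0.16\,\text{s} = 0.35\,\text{s} \\
\frac{t_{\text{bc}}}{t_{\text{total}}} &= \frac{0.35}{18.15} \approx 0.019 \approx 2\% \\
\frac{t_{\text{parse}}}{t_{\text{total}}} &= \frac{1.08}{18.15} \approx 0.060 \approx 6\%
\end{aligned}
```

Parsing the textual IR therefore costs roughly three times as much as the bitcode round trip between opt and llc.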

Thus, to reduce compile times, I think it would make more sense to either generate LLVM bitcode, or switch to using Haskell bindings for LLVM to access the API directly. Neither of these are small tasks, but I think they're better for us in the long-run.

comment:30 Changed 8 months ago by angerman

Well, I'm all for someone ensuring that opt and llc work. That's fantastic!

I never intended to actually customize much of clang, opt or llc. However, if we again run into cases where we need upstream patches that have even been merged, but no llvm release has been cut yet, we will need some form of customized interim binary.

As you said, it is unknown whether Apple's clang *is* identical or not, and which customizations have been applied. I would still hope that we will ideally be able to use the system-provided clang at some point. That of course won't work if we need opt and it is not provided. If we end up flat out refusing to use Apple's clang on ideological grounds, and Apple ends up customizing their clang to the point where you are required to use it or won't be able to take part in their walled garden, do we want to make that sacrifice? This is, of course, hypothetical.

I would, though, stay away from ever generating any assembly, and have llvm produce object code directly, either via opt+llc, opt+clang, or clang. Ideally it should be

.bc -> [llvm blackbox which allows us to pass flags where and how we need it] -> .o

Switching to Haskell bindings for LLVM has not been an option, for various reasons. I don't know whether this stance has ever changed. Because of this, though, I do have a pure Haskell bitcode generator (https://github.com/angerman/data-bitcode, https://github.com/angerman/data-bitcode-llvm, ...). These are by no means complete; the plugin I have can compile and link trivial Haskell programs. I do plan on integrating them into ghc without the plugin approach, as that would require a significant amount of rewriting of the plugin interface in ghc.

comment:31 Changed 2 months ago by alpmestan

Cc: alpmestan added

comment:32 Changed 6 days ago by George

Is the plan to remove proc-point splitting for 8.4.1?

comment:33 in reply to:  32 Changed 4 days ago by kavon

Replying to George:

> Is the plan to remove proc-point splitting for 8.4.1?

It's more likely for 8.6 or later. We have to align with the release cycle for LLVM as the removal depends on which version of LLVM the patch lands in. I'll put together a Trac ticket to track progress.

comment:34 Changed 26 hours ago by George

Cc: George added; george removed