Currently the Windows CI builds typically last over 3 hours. The primary cause appears to be poor performance in ld.bfd, especially when linking testsuite tests.
One option to fix this is to try using LLD for linking. Unfortunately the msys2 gcc does not support -fuse-ld=lld.
Trac metadata

| Trac field | Value |
|---|---|
| Version | 8.6.3 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | high |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |
> One option to fix this is to try using LLD for linking. Unfortunately the msys2 gcc does not support -fuse-ld=lld.
LLD won't magically help here. Latest improvements by Martin Storsjö (mstorsjo) have made LLD able to link some mingw **gcc**-generated code, but, alas, when assembling **GHC**-generated assembly, mingw **gas** produces (I believe this is a sort of "optimisation") non-standard relocations, which LLD is unable to deal with (honestly, I haven't checked if this is still the case, but it was definitely so a year or two ago).
For quite some time I have had a GHC port which works flawlessly against the native Windows SDK and uses clang in MSVC mode with the MS or LLD linker. After the recent LLD improvements by mstorsjo I decided to try a less intrusive approach: use clang in mingw mode with the LLD linker. That required quite a bit of work, and I finally managed to produce a stage2 GHC executable, which turned out to be severely broken: it is unable to do anything.
TL;DR the current state of affairs is that lld **can't** serve as a drop-in replacement for ld, not only when the GNU binutils toolchain is used, but even when clang is used as the assembler.
> LLD won't magically help here. Latest improvements by Martin Storsjö (mstorsjo) have made LLD able to link some mingw gcc-generated code, but, alas, when assembling GHC-generated assembly, mingw gas produces (I believe this is a sort of "optimisation") non-standard relocations.
Frankly I wonder how difficult it would be to add support for these relocations. Given that LLVM code is generally pretty approachable I suspect that would be the easiest path forward.
In the meantime I have been pursuing testing Tamar's binutils patch.
I think it's easy, but (as I've already mentioned) other problems exist. Mingw binutils introduce a lot of non-standard things, which LLD didn't support at all until recently. Some support has now appeared, but AFAIUI things go best when the mingw sdk is built by the LLVM toolsuite, not by mingw gcc/binutils.
A general problem is that wrong-linkage bugs are *very* hard to debug. This is why I never tried to continue my mingw clang/lld experiment: it might consume an unpredictable amount of time to understand what is wrong with the generated ghc executable. All the symptoms suggest that the image file is invalid, since the OS can't even load it properly.
> Frankly I wonder how difficult it would be to add support for these relocations. Given that LLVM code is generally pretty approachable I suspect that would be the easiest path forward.
> In the meantime I have been pursuing testing Tamar's binutils patch.
Speed linking
As you are well aware, linking speed is an issue. In principle BFD shouldn't be so much slower on Windows than on Linux, because most of the code is generic mid-end code and is therefore shared. The platform ABI differences don't account for this slowdown. My working hypothesis is that the slowdown is at the two ends of the linker: the file read and the file write. My suspicion is that the Windows version has to do much more work than the Linux one. A fundamental difference between Linux and Windows file I/O is that Linux I/O is optimized for path-based APIs while Windows is handle-based. CreateFile is a relatively expensive API to call, so every time these POSIX functions are called on Windows the file is opened and closed, and you pay this expensive overhead. One thing I will try is changing the fread/fopen calls to mmap calls, bypassing the buffer managers etc. I'm hoping this makes up the difference.
A case in point that this is not an inherent platform issue: LLD and link.exe are both much faster than ld.
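The file I/O hypothesis above can be probed outside the linker with a small timing experiment. The following is only a sketch in Python, not binutils code: it compares reopening a file for every chunk against mapping it once, and every function name in it is made up for illustration. It says nothing about what ld.bfd actually does; it only shows how the cost of repeated open/close calls could be measured on a given platform.

```python
import mmap
import os
import tempfile
import time


def read_reopening(path, chunk, count):
    """Open, seek, read, and close the file once per chunk."""
    parts = []
    for i in range(count):
        with open(path, "rb") as f:
            f.seek(i * chunk)
            parts.append(f.read(chunk))
    return b"".join(parts)


def read_mapped(path, chunk, count):
    """Open the file once, map it read-only, and slice chunks out of the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return b"".join(m[i * chunk:(i + 1) * chunk] for i in range(count))


def make_sample(size):
    """Write `size` random bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    path = make_sample(1 << 20)  # 1 MiB sample file
    try:
        a, t_reopen = timed(read_reopening, path, 4096, 256)
        b, t_mapped = timed(read_mapped, path, 4096, 256)
        assert a == b  # both strategies must read identical bytes
        print(f"reopen per chunk: {t_reopen:.4f}s, single mmap: {t_mapped:.4f}s")
    finally:
        os.remove(path)
```

Running the script on both Windows and Linux with the same chunk size and count would give a rough idea of how much the per-open overhead differs between the two.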
In an act of desperation (one can only take so many 4-hour builds before going mad) I tried plugging ld.lld (from the official LLVM 9 distribution) into GHC. Specifically, I took a working tree and modified the settings file with:
("ld is GNU ld","NO")
("ld command", "ld.lld")
("C compiler link flags", "-fuse-ld=lld")
and ripping out the hard-coded ld.bfd flags in `compiler/main/SysTools/Info.hs`:
```diff
diff --git a/compiler/main/SysTools/Info.hs b/compiler/main/SysTools/Info.hs
index e61846d4e6..9c56d2a9cd 100644
--- a/compiler/main/SysTools/Info.hs
+++ b/compiler/main/SysTools/Info.hs
@@ -173,12 +173,9 @@ getLinkerInfo' dflags = do
                  -- Process creation is also fairly expensive on win32, so
                  -- we short-circuit here.
                  return $ GnuLD $ map Option
-                  [ -- Reduce ld memory usage
-                    "-Wl,--hash-size=31"
-                  , "-Wl,--reduce-memory-overheads"
-                    -- Emit gcc stack checks
+                  [ -- Emit gcc stack checks
                     -- Note [Windows stack usage]
-                  , "-fstack-check"
+                    "-fstack-check"
                     -- Force static linking of libGCC
                     -- Note [Windows static libGCC]
                   , "-static-libgcc" ]
```
Unfortunately, this didn't get very far. The problem can be easily demonstrated with plain gcc:
PE/COFF LLD doesn't support -r (partially linked object file) output: it can't produce COFF output and is able to create PE output only.
I believe it's possible to use GNU ld when doing -r (GHC uses it when packing some C stubs code back into object files, and also cabal uses it to produce prelinked GHCi object files for packages), and LLD when linking final executables (though I think non-standard relocations still aren't supported by LLVM suite).
Alright, so it sounds like the limitation to only produce PE output isn't something that is going to change in the near future. Given that a migration to lld would also mean fighting with relocations, it seems like we might be better off focusing on fixing bfd. Its performance is so bad that the problem must appear clear as day in a profile. @AndreasK, would you be able to try running such a profile on the linker when you get a chance? Unfortunately this may require building binutils from scratch since I suspect that the msys binaries are stripped of debug information.
For what it's worth, the linker used for the final link and the linker used for object merging are now configurable independently (see !3798 (closed)), meaning that we could easily use ld.bfd for one and lld for the other.
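The split described above (GNU ld for `-r` object merging, lld for the final link) can be pictured with a small sketch. This is Python for illustration only, not GHC's implementation and not the mechanism from !3798; the function names and the default `gcc` driver are assumptions, and the final-link command presumes a gcc that accepts `-fuse-ld=lld`.

```python
from typing import List

# Assumption: ld.bfd stays in charge of object merging, because the
# PE/COFF LLD cannot produce -r (partially linked) output.
MERGE_LINKER = "ld.bfd"


def merge_objects_cmd(output: str, objects: List[str]) -> List[str]:
    """Command line for merging several objects into one relocatable object."""
    return [MERGE_LINKER, "-r", "-o", output] + list(objects)


def final_link_cmd(output: str, objects: List[str], cc: str = "gcc") -> List[str]:
    """Command line for the final link, asking the C compiler driver to use lld."""
    return [cc, "-fuse-ld=lld", "-o", output] + list(objects)


if __name__ == "__main__":
    print(" ".join(merge_objects_cmd("merged.o", ["a.o", "b.o"])))
    print(" ".join(final_link_cmd("prog.exe", ["merged.o"])))
```

Keeping the two command builders separate mirrors the idea that the two linkers are configured independently, so either one can be swapped without touching the other.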
currently ghc-8.8.3 is able to use clang as the c compiler and the assembler (via -pgmc and -pgma), with some warnings, but not as the C pre-processor (via -pgmP).
```
In file included from C:\...\AppData\Local\Temp\ghc13684_0\ghc_3.c:1:
In file included from C:\...\AppData\Local\Programs\stack\x86_64-windows\ghc-8.8.3\lib/include\Rts.h:29:
In file included from C:\...\AppData\Local\Programs\stack\x86_64-windows\ghc-8.8.3\lib/include/Stg.h:233:
C:\...\AppData\Local\Programs\stack\x86_64-windows\ghc-8.8.3\lib/include/stg/Types.h:25:9: warning: '__USE_MINGW_ANSI_STDIO' macro redefined [-Wmacro-redefined]
#define __USE_MINGW_ANSI_STDIO 1
        ^
D:\msys64\mingw64\x86_64-w64-mingw32\include\_mingw.h:435:9: note: previous definition is here
#define __USE_MINGW_ANSI_STDIO 0 /* was not defined so it should be 0 */
        ^
In file included from C:\...\AppData\Local\Temp\ghc13684_0\ghc_3.c:1:
In file included from C:\...\AppData\Local\Programs\stack\x86_64-windows\ghc-8.8.3\lib/include\Rts.h:179:
C:\...\AppData\Local\Programs\stack\x86_64-windows\ghc-8.8.3\lib/include/rts/Messages.h:44:20: warning: 'format' attribute argument not supported: gnu_printf [-Wignored-attributes]
   GNUC3_ATTRIBUTE(format(PRINTF, 1, 2));
                   ^
C:\...\AppData\Local\Programs\stack\x86_64-windows\ghc-8.8.3\lib/include/rts/Messages.h:60:20: warning: 'format' attribute argument not supported: gnu_printf [-Wignored-attributes]
   GNUC3_ATTRIBUTE(format (PRINTF, 1, 2));
                   ^
C:\...\AppData\Local\Programs\stack\x86_64-windows\ghc-8.8.3\lib/include/rts/Messages.h:74:20: warning: 'format' attribute argument not supported: gnu_printf [-Wignored-attributes]
   GNUC3_ATTRIBUTE(format (PRINTF, 1, 2));
                   ^
C:\...\AppData\Local\Programs\stack\x86_64-windows\ghc-8.8.3\lib/include/rts/Messages.h:86:20: warning: 'format' attribute argument not supported: gnu_printf [-Wignored-attributes]
   GNUC3_ATTRIBUTE(format (PRINTF, 1, 2));
                   ^
5 warnings generated.
Warning: corrupt .drectve at end of def file
Warning: corrupt .drectve at end of def file
Warning: corrupt .drectve at end of def file
```
the warnings from the c files are documented here and here. however, all of this only works when you pass -fllvm as well. when attempting to use clang as the linker (via -pgml) i get a linker error, because ghc unconditionally passes ld-only flags to the linker on windows. by manually upgrading the version of mingw that ships with ghc to one that comes with a gcc that supports -fuse-ld=lld (gcc 9.3.0) i am able to use lld with gcc as well, but i still get the same linker error.
while i feel that improving the speed of bfd is a good idea, i've found in practice that switching to linking with lld has greatly improved my compile times (essentially for free), given mingw ld's notorious lack of speed.
on a side note, ld.lld on windows produces elf executables, not windows ones. to produce windows executables you need to use lld-link, which accepts msvc-style linker commands; if you pass -lldmingw it lets you pass arguments in a more mingw/unix-like fashion.
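The complaint above about ld-only flags being passed unconditionally suggests making those flags depend on which linker is in use. A minimal sketch of that idea follows, in Python rather than GHC's Haskell. The flag strings are the ones from the diff earlier in this thread; the function name and the boolean parameter are hypothetical and do not correspond to GHC's actual API.

```python
from typing import List

# Flags that the diff earlier in this thread strips out when trying lld.
BFD_ONLY_FLAGS = ["-Wl,--hash-size=31", "-Wl,--reduce-memory-overheads"]

# Flags that the same diff keeps in place.
COMMON_FLAGS = ["-fstack-check", "-static-libgcc"]


def windows_link_flags(linker_is_gnu_ld: bool) -> List[str]:
    """Return the extra link flags, adding the bfd-only ones only for GNU ld."""
    flags = list(COMMON_FLAGS)
    if linker_is_gnu_ld:
        flags = BFD_ONLY_FLAGS + flags
    return flags


if __name__ == "__main__":
    print(windows_link_flags(True))
    print(windows_link_flags(False))
```

Whether lld accepts the two remaining flags in every configuration is not established here; the sketch only separates the flags that the experiment above already had to remove.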
After getting tired of waiting for Windows builds over the last few days I tried this again. The hint of using -lldmingw is a very good one; unfortunately, getting GCC to call lld-link (instead of ld.lld) seems to be nontrivial. I suspect using clang instead of gcc would also help, although this too is a bit tricky. Regardless, making all of this workable in a shippable compiler seems quite challenging.
I think the code in that section is highly questionable. We should look at the windows linker invocations though to see what actually happens. For macOS it polluted the DYLD_LIBRARY_PATHS, which effectively made the system linker spend a lot of time trying to stat (and get the contents of) phony directories. The red flag was 80% time spent in the kernel during test runs. That appeared way too high.
If we pass too many useless folders to the linker on Windows, that might cause some (potentially quadratic) slowdown, but I don't know if we do.
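To see why useless search directories could add up, here is a rough model of a library search that counts file-system probes. It is a sketch in Python under stated assumptions: the candidate file name patterns are illustrative and not the exact set any particular linker tries, and the function names are invented for this example.

```python
from typing import Callable, List, Optional, Tuple

# Illustrative file name patterns only; the real set depends on the linker.
CANDIDATE_PATTERNS = ["lib{name}.dll.a", "lib{name}.a", "{name}.lib"]


def find_library(name: str, search_dirs: List[str],
                 exists: Callable[[str], bool]) -> Tuple[Optional[str], int]:
    """Search the directories in order; return (path or None, number of probes)."""
    probes = 0
    for directory in search_dirs:
        for pattern in CANDIDATE_PATTERNS:
            probes += 1
            candidate = directory + "/" + pattern.format(name=name)
            if exists(candidate):
                return candidate, probes
    return None, probes


def total_probes(libs: List[str], search_dirs: List[str],
                 exists: Callable[[str], bool]) -> int:
    """Total number of existence checks needed to resolve every library."""
    return sum(find_library(lib, search_dirs, exists)[1] for lib in libs)


if __name__ == "__main__":
    files = {"/real/libfoo.a", "/real/libbar.a"}
    dirs = ["/bogus%d" % i for i in range(50)] + ["/real"]
    print(total_probes(["foo", "bar"], dirs, files.__contains__))
```

In this model the probe count grows with the product of the number of libraries and the number of directories searched before a hit. If both numbers grow with the number of packages, the total grows roughly quadratically, which matches the kind of slowdown speculated about above.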
Relocation 0x11 is R_X86_64_32S, which isn't technically a PE relocation but is apparently produced by gas due to a bug. See #9907 (closed) and Note [ELF constant in PE file].