Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#10682 closed bug (fixed)

AArch64: dll-split: out of memory (requested 1099512676352 bytes)

Reported by: erikd
Owned by:
Priority: normal
Milestone: 8.0.1
Component: Compiler
Version: 7.11
Keywords:
Cc: kgardas, RyanGlScott
Operating System: Unknown/Multiple
Architecture: aarch64
Type of failure: Building GHC failed
Test Case:
Blocked By:
Blocking:
Related Tickets: #10877
Differential Rev(s): Phab:D1171
Wiki Page:

Description

Some time in the last week, GHC Git HEAD started to fail to build with:

dll-split: out of memory (requested 1099512676352 bytes)
compiler/ghc.mk:655: recipe for target 'compiler/stage2/dll-split.stamp' failed

Obviously, attempting to allocate a terabyte is not going to work.

Will try to git bisect.

Change History (19)

comment:1 Changed 2 years ago by erikd

The value 1099512676352 in hex is 0x10000100000.

Last edited 2 years ago by erikd

comment:2 Changed 2 years ago by rwbarton

Oh, it's probably Phab:D524 (#9706) then.

comment:3 Changed 2 years ago by erikd

Confirmed. Commit 0d1a8d09f4 broke it.

Last edited 2 years ago by erikd

comment:4 Changed 2 years ago by erikd

If I hack configure.ac and disable USE_LARGE_ADDRESS_SPACE GHC builds fine.

Need to figure out why Arm64 doesn't support this.

comment:5 Changed 2 years ago by erikd

I added debug output to rts/posix/OSMem.c to print the values being passed to mmap. I then wrote a simple standalone program to see if mmap still failed with the same parameters outside of GHC. The program is:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int
main (void)
{	void *addr = (void*) 0x200000000, *ret ;
	size_t size = 1099512676352 ;
	int prot = 0, flags = MAP_NORESERVE | MAP_ANON | MAP_PRIVATE ;

	ret = mmap(addr, size, prot, flags, -1, 0);
	if (ret == NULL) {
		perror ("mmap") ;
		exit (1) ;
	}

	puts ("Success!") ;
	return 0 ;
}

and it compiled and ran successfully on the same machine where GHC is currently failing.

comment:6 in reply to:  description Changed 2 years ago by kgardas

Cc: kgardas added

Replying to erikd:

Some time in the last week, GHC Git HEAD started to fail to build with:

dll-split: out of memory (requested 1099512676352 bytes)
compiler/ghc.mk:655: recipe for target 'compiler/stage2/dll-split.stamp' failed

Obviously, attempting to allocate a terabyte is not going to work.

Will try to git bisect.

This is very similar to what I get here on the amd64/solaris11 builder[1] now that I've been able to start it again after the holidays. Perhaps the same issue? I'll try to debug that too as time permits.

[1]: http://haskell.inf.elte.hu/builders/solaris-amd64-head/index.html

comment:7 Changed 2 years ago by ezyang

First off, even if the symptoms are the same, I would recommend opening a separate bug for 64-bit Solaris 11 not working, because the underlying cause/fix are very unlikely to be the same.

According to some source comments in the Go project (https://golang.org/src/runtime/malloc.go#L158 and https://golang.org/src/runtime/malloc.go#L260), there are restrictions on how much virtual memory you can actually get on ARM64; in particular, apparently only 39 bits of user address space are allowed (2^39 bytes, which is 512 GB). So it seems likely that if we halve the requested address size we might do better.

However, in that case, I don't understand why the mini test-program is working. Do you have the ability to strace executables on ARM64, to find out what the sequence of mmap calls is?

I don't think Giovanni or I have access to ARM64 machines which will make it a little harder for us to debug.

comment:8 Changed 2 years ago by kgardas

Edward, you are of course right. The symptoms may be the same, but the fix is completely different: this is heavily #ifdef'ed, OS-specific code, so the Solaris fix will touch a different place anyway.

comment:9 Changed 2 years ago by erikd

On aarch64-linux, applying this patch:

diff --git a/rts/sm/HeapAlloc.h b/rts/sm/HeapAlloc.h
index c914b5d..90a55d1 100644
--- a/rts/sm/HeapAlloc.h
+++ b/rts/sm/HeapAlloc.h
@@ -52,7 +52,7 @@
 #ifdef USE_LARGE_ADDRESS_SPACE
 
 extern W_ mblock_address_space_begin;
-# define MBLOCK_SPACE_SIZE      ((StgWord)1 << 40) /* 1 TB */
+# define MBLOCK_SPACE_SIZE      ((StgWord)1 << 38) /* 1/4 TB */
 # define HEAP_ALLOCED(p)        ((W_)(p) >= mblock_address_space_begin && \
                                  (W_)(p) < (mblock_address_space_begin +  \
                                             MBLOCK_SPACE_SIZE))

fixes the build. @kgardas, does this fix amd64-solaris as well?

comment:10 Changed 2 years ago by erikd

@kgardas has an amd64-solaris fix at Phab:D1169.

comment:11 Changed 2 years ago by erikd

Differential Rev(s): Phab:D1171
Status: new → patch

comment:12 Changed 2 years ago by Erik de Castro Lopo <erikd@…>

In 38c98e4/ghc:

RTS: Reduce MBLOCK_SPACE_SIZE on AArch64

Commit 0d1a8d09f4 added a two step allocator for 64 bit systems. This
allocator mmaps a huge (1 TB) chunk of memory out of which it does
smaller allocations. On AArch64/Arm64 linux, this mmap was failing
due to the Arm64 Linux kernel parameter CONFIG_ARM64_VA_BITS
defaulting to 39 bits.

Therefore reducing the AArch64 value for MBLOCK_SPACE_SIZE to make
this allocation 1/4 TB while remaining 1 TB for other archs.

Reviewers: ezyang, austin, bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1171

GHC Trac Issues: #10682

comment:13 Changed 2 years ago by RyanGlScott

Cc: RyanGlScott added

comment:14 Changed 2 years ago by thoughtpolice

Milestone: 7.12.1 → 8.0.1

Milestone renamed

comment:15 Changed 2 years ago by rwbarton

So, this is fixed right?

comment:16 Changed 2 years ago by erikd

Resolution: fixed
Status: patch → closed

Sorry, yes it is. Closing.

comment:17 Changed 2 years ago by RyanGlScott

This is still happening to me on an x86_64 machine:

inplace/bin/dll-split compiler/stage2/build/.depend-v-dyn.haskell "DynFlags" "Annotations ApiAnnotation Avail Bag BasicTypes Binary BooleanFormula BreakArray BufWrite Class CmdLineParser CmmType CoAxiom ConLike Coercion Config Constants CoreArity CoreFVs CoreSubst CoreSyn CoreTidy CoreUnfold CoreUtils CoreSeq CoreStats CostCentre Ctype DataCon Demand Digraph DriverPhases DynFlags Encoding ErrUtils Exception FamInstEnv FastFunctions FastMutInt FastString Fingerprint FiniteMap ForeignCall Hooks HsBinds HsDecls HsDoc HsExpr HsImpExp HsLit PlaceHolder HsPat HsSyn HsTypes HsUtils HscTypes IOEnv Id IdInfo IfaceSyn IfaceType InstEnv Kind Lexeme Lexer ListSetOps Literal Maybes MkCore MkId Module MonadUtils Name NameEnv NameSet OccName OccurAnal OptCoercion OrdList Outputable PackageConfig Packages Pair Panic PatSyn PipelineMonad Platform PlatformConstants PprCore PrelNames PrelRules Pretty PrimOp RdrName Rules Serialized SrcLoc StaticFlags StringBuffer TcEvidence TcRnTypes TcType TrieMap TyCon Type TypeRep TysPrim TysWiredIn Unify UniqFM UniqSet UniqSupply Unique Util Var VarEnv VarSet Bitmap BlockId ByteCodeAsm ByteCodeInstr ByteCodeItbls CLabel Cmm CmmCallConv CmmExpr CmmInfo CmmMachOp CmmNode CmmSwitch CmmUtils CodeGen.Platform CodeGen.Platform.ARM CodeGen.Platform.ARM64 CodeGen.Platform.NoRegs CodeGen.Platform.PPC CodeGen.Platform.PPC_Darwin CodeGen.Platform.SPARC CodeGen.Platform.X86 CodeGen.Platform.X86_64 Hoopl Hoopl.Dataflow InteractiveEvalTypes MkGraph PprCmm PprCmmDecl PprCmmExpr Reg RegClass SMRep StgCmmArgRep StgCmmClosure StgCmmEnv StgCmmLayout StgCmmMonad StgCmmProf StgCmmTicky StgCmmUtils StgSyn Stream"
dll-split: out of memory (requested 1099512676352 bytes)

comment:18 Changed 2 years ago by rwbarton

This ticket is about aarch64, could you file another ticket?

comment:19 Changed 2 years ago by RyanGlScott
