#11395 closed bug (fixed)

The via-C code generation backend is incompatible with gcc 5.3.1 on m68k (ELF)

Reported by: mkarcher Owned by:
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.10.2
Keywords: Cc: slyfox
Operating System: Linux Architecture: m68k
Type of failure: GHC doesn't work at all Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

ghc is generating C code like in the following simplified example:

demo1.c:

#include <assert.h>

extern void* int_returning_function();

typedef int (*int_f_ptr)();

int main(void)
{
	int result;
	{
		int_f_ptr correctly_typed_pointer;
		correctly_typed_pointer = (int_f_ptr)int_returning_function;
		result = correctly_typed_pointer();
	}
	assert(result == 42);
	return 0;
}

demo2.c:

int int_returning_function()
{
	return 42;
}

This code fails to work on m68k with gcc-5.3.1 if optimization is turned on. The real-life example from ghc is rts/Schedule.c:raiseExceptionHelper as int_returning_function and rts/Exception.cmm:stg_raisezh as main. The reason it fails is that on m68k, a function returning a pointer returns the result in %a0, while a function returning an integer returns it in %d0. If optimization is enabled, gcc elides the function pointer and calls the function as-if it were called directly. Without optimization, gcc expects the result to be in %d0, where it actually is.

In my oppinion, gcc is allowed to do that, because according to N843 (C99 draft), 6.2.7 #2 "All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined." is violated in this example. Compatibility of return types is defined in 6.7.5.3 #11 "For two function types to be compatible, both shall specify compatible return types. Moreover, [...]" which is clearly violated.

If you just look at demo1.c and assume that 6.2.7 #2 is *not* violated, int_returning_function would in fact be a function returning a pointer. In that case, compiling demo1.c would violate 6.3.2.3 #8 "If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined." and/or 6.5.2.2 "If the function is defined with a type that is not compatible with the type (of the expression) pointed to by the expression that denotes the called function, the behavior is undefined."

You might want to discuss this issue with the gcc developers, but I don't see a standard claus justifying the approach of declaring all functions returning StgFunPtr and casting them to the right type.

If you cross-compile the source tarball Debian claims to be 7.10.3-rc1 to m68k using gcc 5.3.1, ghc crashes on startup because stg_raisezh gets a wrong value as frame_type and tries to cancel an STM transaction that does not exist at all.

Attachments (2)

0001-Extend-gcc-on-m68k-to-allow-calling-incorrectly-decl.patch (4.3 KB) - added by mkarcher 20 months ago.
Patch for gcc to support the non-standard construct used by ghc on m68k.
0001-c-codegen-split-external-symbol-prototypes-EF_.patch (4.2 KB) - added by slyfox 19 months ago.
0001-c-codegen-split-external-symbol-prototypes-EF_.patch

Download all attachments as: .zip

Change History (30)

comment:1 Changed 21 months ago by slyfox

Cc: slyfox added

changeset:5a31f231eebfb8140f9b519b166094d9d4fc2d79 and changeset:31a7bb463b6a3e99ede6de994c1f449c43a9118c dealt with similar issue on ELFv2 PPC64 but for passed arguments for a function.

There is a way to detect a prototype skew by building ghc with CFLAGS/LDFLAGS=-flto. With this hack i've fixed a few SIGSEGVs on ia64 where pointers differ from function pointers (like changeset:d82f592522eb8e063276a8a8c87ab93e18353c6b)

GHC still has a lot of mismatching prototypes though.

comment:2 Changed 21 months ago by slyfox

my mk/build.mk for the LTO test is:

CPUS=9
SRC_CC_OPTS += -flto=$(CPUS) -ffat-lto-objects
SRC_LD_OPTS += -flto=$(CPUS) -ffat-lto-objects
SRC_HC_OPTS += -optc-flto=$(CPUS) -optc-ffat-lto-objects
SRC_HC_OPTS += -optl-flto=$(CPUS) -optl-ffat-lto-objects

comment:3 Changed 21 months ago by slyfox

To get generated code we can run:

"inplace/bin/ghc-stage1" \
    -static  -H64m -O1 \
    -optc-flto=9 -optc-ffat-lto-objects \
    -optl-flto=9 -optl-ffat-lto-objects \
    -optc-ggdb2 -optl-ggdb2 \
    -j4 \
    -optc-Werror=implicit-function-declaration -optc-Werror=overflow \
    -Wall  \
    -lbfd \
    -Iincludes -Iincludes/dist -Iincludes/dist-derivedconstants/header \
    -Iincludes/dist-ghcconstants/header -Irts -Irts/dist/build \
    -DCOMPILING_RTS -this-package-key rts \
    -optc-DNOSMP -dcmm-lint \
    -i -irts -irts/dist/build \
    -irts/dist/build/autogen \
    -Irts/dist/build -Irts/dist/build/autogen \
    -O2 \
    -C rts/Exception.cmm \
    -o rts/dist/build/Exception.hc

rts/dist/build/Exception.hc contains the following declarations:

/* (incomplete) symbol declaraion */
EF_(raiseExceptionHelper);
...
/* exact prototype dectaration, assignment and call: */
{
    W_ (*ghcFunPtr)(void *, void *, void *);
    ghcFunPtr = ((W_ (*)(void *, void *, void *))raiseExceptionHelper);
    _c2y = ghcFunPtr((void *)(W_)BaseReg, (void *)(W_)CurrentTSO, (void *)_c2z);;}

It comes from rts/Exception.cmm:

retry_pop_stack:
    SAVE_THREAD_STATE();
    (frame_type) = ccall raiseExceptionHelper(BaseReg "ptr", CurrentTSO "ptr", exception "ptr");
    LOAD_THREAD_STATE();

As we see unannotated return type defaults to W_ (StgWord, non-pointer type).

So the question is:

Does gcc ignore type coercion given in ghcFunPtr = assignment and uses original declaration? gcc's -fdump-*-all options might hint at where gcc loses information about argument type.

Could it be real (but similar) error happens somewhere else?

comment:4 Changed 21 months ago by slyfox

One more note:

The bug you describe should surface on amd64 if you change return type from 'int' (%rax) to 'double' (%xmm0) in your example, correct? On gcc-5.3.0/amd64 it seems to work as expected:

        call    double_returning_function
        cvttsd2si       %xmm0, %eax
        cmpl    $42, %eax

comment:5 Changed 21 months ago by slyfox

Filed a bug upstream to investigate further and/or find a workaround:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69221

comment:6 Changed 21 months ago by mkarcher

I can confirm your obvservation that on amd64, gcc 5.3.1 does indeed take the type of the function pointer into account when it decides whether the return value is in %eax (for int), %rax (for void*) or %xmm0 (for double). It's interesting that gcc doesn't do it on m68k.

comment:7 Changed 21 months ago by mkarcher

It seems gcc introduces a temporary double variable in the double/int mismatch on amd64, because the double-to-int conversion is a semantic conversion changing the representation. On the other hand, the void*-to-int conversion is reinterpreting the same value as a different type without the need of a conversion instruction (like cvttsd2si in the amd64 case).

comment:8 in reply to:  5 Changed 21 months ago by glaubitz

Replying to slyfox:

Filed a bug upstream to investigate further and/or find a workaround:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69221

Hmm, looks like gcc upstream doesn't agree it's actually a gcc problem.

comment:9 Changed 21 months ago by rwbarton

Could we, instead of

EF_(raiseExceptionHelper);
...
/* exact prototype dectaration, assignment and call: */
{
    W_ (*ghcFunPtr)(void *, void *, void *);
    ghcFunPtr = ((W_ (*)(void *, void *, void *))raiseExceptionHelper);

generate a local extern function declaration, like

{
    W_ (*ghcFunPtr)(void *, void *, void *);
    extern W_ (*raiseExceptionHelper)(void *, void *, void *);
    ghcFunPtr = ((W_ (*)(void *, void *, void *))raiseExceptionHelper);

(sorry if I got the syntax wrong)?

Version 0, edited 21 months ago by rwbarton (next)

comment:10 Changed 21 months ago by mkarcher

rwbarton: The possibility to use a local extern function declaration to avoid the function pointer hack occurred to me, too. I don't know enough about ghc internals to determine whether it is viable. You might try to go without ghcFunPtr, if possible. Like generating

{
   extern W_ raiseExceptionHelper(void *, void *, void *);
   raiseExceptionHelper((void*)((W_)BaseReg, (void*)(W_)CurrentTSO, (void*)_c2z);;}

instead of

{
    W_ (*ghcFunPtr)(void *, void *, void *);
    ghcFunPtr = ((W_ (*)(void *, void *, void *))raiseExceptionHelper);
    _c2y = ghcFunPtr((void *)(W_)BaseReg, (void *)(W_)CurrentTSO, (void *)_c2z);;}

comment:11 Changed 21 months ago by slyfox

Yeah, I think local extern should work fine (and be cleaner!).

comment:12 Changed 20 months ago by glaubitz

Another m68k porter showed me today that this bug isn't unique to ghcb but was also found in Erlang:

Wouter's analysis confirms what has being sad in this bug report. The code in question is basically violating the C standard and the fact that it doesn't result in problems is just luck.

Who knows we might have new architectures or compilers in the future which will behave differently and therefore this bug might become visible again unless it is fixed.

Last edited 20 months ago by glaubitz (previous) (diff)

Changed 20 months ago by mkarcher

Patch for gcc to support the non-standard construct used by ghc on m68k.

Changed 19 months ago by slyfox

0001-c-codegen-split-external-symbol-prototypes-EF_.patch

comment:13 Changed 19 months ago by slyfox

Please try the 0001-c-codegen-split-external-symbol-prototypes-EF_.patch patch on vanilla gcc.

I hope it will workaround a problem on m68k without breaking other platforms.

comment:14 in reply to:  13 Changed 19 months ago by glaubitz

Replying to slyfox:

Please try the 0001-c-codegen-split-external-symbol-prototypes-EF_.patch patch on vanilla gcc.

Thanks. I'll give it a shot and report back.

comment:15 Changed 19 months ago by glaubitz

Just cloned ghc from git, applied the patch and cross-compiled it.

Unfortunately, it segfaults right away:

root@pacman:~> ./ghc
Segmentation fault (core dumped)
root@pacman:~> file ./ghc
./ghc: ELF 32-bit MSB executable, Motorola m68k, 68020, version 1 (SYSV), dynamically linked, interpreter /lib/ld.so.1, for GNU/Linux 3.2.0, BuildID[sha1]=b7cc59d2d71bcb64fef73c1628602253f24bfa9f, not stripped
root@pacman:~>

Will remove the patch and test again, just to make sure it's not a general issue with ghc 8.x on m68k.

comment:16 Changed 19 months ago by slyfox

I've also tried to build m68k crosscompiler (using https://wiki.debian.org/M68k/sbuildQEMU to test produced binaries) and noticed GHC has the issue similar to:

http://bugs.python.org/issue17237

m68k aligns structs containing integers and pointers to 2 bytes (not 4 bytes). That makes StgClosure struct addresses look tagged even if they are not.

An example of GHC's startup crash. Test checks if untagging macro works:

Core was generated by `/tmp/a +RTS -Ds -Di -Dw -DG -Dg -Db -DS -Dt -Dp -Da -Dl -Dm -Dz -Dc -Dr'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x80463b0a in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=32858) at includes/rts/storage/ClosureMacros.h:248
248         return (info->type != INVALID_OBJECT && info->type < N_CLOSURE_TYPES) ? rtsTrue : rtsFalse;
(gdb) bt
#0  0x80463b0a in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=32858) at includes/rts/storage/ClosureMacros.h:248
#1  0x80463b46 in LOOKS_LIKE_INFO_PTR (p=32858) at includes/rts/storage/ClosureMacros.h:253
#2  0x80463b6c in LOOKS_LIKE_CLOSURE_PTR (p=0x805aac6e <stg_dummy_ret_closure>) at includes/rts/storage/ClosureMacros.h:258
#3  0x80463e4c in initStorage () at rts/sm/Storage.c:121
#4  0x8043ffb4 in hs_init_ghc (argc=0xf6ffebf0, argv=0xf6ffebf4, rts_config=...) at rts/RtsStartup.c:181
#5  0x80455982 in hs_main (argc=1, argv=0xf6ffed54, main_closure=0x804b8f70 <ZCMain_main_closure>, rts_config=...)
    at rts/RtsMain.c:51
#6  0x80003c1c in main ()

GHC assumes last 2 pointer bits are tags on 32-bit targets. But here 'stg_dummy_ret_closure' address violates the assumption:

LOOKS_LIKE_CLOSURE_PTR (p=0x805aac6e <stg_dummy_ret_closure>)

One more thing to tweak.

comment:17 Changed 19 months ago by slyfox

Sent the patch for review addressing comment 16: Phab:D1974

comment:18 Changed 19 months ago by slyfox

Sent slightly tweaked 0001-c-codegen-split-external-symbol-prototypes-EF_.patch for review as Phab:D1975

With both Phab:D1974 and Phab:D1975 I get hello world working on qemu-m68k in vanilla -dynamic and -prof modes.

Don't know about ghc itself as my emulator can't handle generated code yet:

$ inplace/bin/ghc-stage2 --make /tmp/a.hs /home/slyfox/dev/git/qemu-m68k/tcg/tcg.c:1774: tcg fatal error

comment:19 in reply to:  18 Changed 19 months ago by glaubitz

Replying to slyfox:

With both Phab:D1974 and Phab:D1975 I get hello world working on qemu-m68k in vanilla -dynamic and -prof modes.

Wow, I am very excited to hear this! Thanks a lot for investing your time into fixing this. Also, I was very happy to see another bug in qemu-m68k being squashed in the process.

Looking forward to test the changes once they have been merged.

comment:20 Changed 19 months ago by Sergei Trofimovich <siarheit@…>

In ade1a461/ghc:

Fix minimum alignment for StgClosure (Trac #11395)

The bug is observed on m68k-linux target as crash
in RTS:

    -- a.hs:
    main = print 43

    $ inplace/bin/ghc-stage1 --make -debug a.hs

    $ ./a
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x80463b0a in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=32858)
        at includes/rts/storage/ClosureMacros.h:248
    (gdb) bt
    #0  0x80463b0a in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=32858)
        at includes/rts/storage/ClosureMacros.h:248
    #1  0x80463b46 in LOOKS_LIKE_INFO_PTR (p=32858)
        at includes/rts/storage/ClosureMacros.h:253
    #2  0x80463b6c in LOOKS_LIKE_CLOSURE_PTR (
            p=0x805aac6e <stg_dummy_ret_closure>)
        at includes/rts/storage/ClosureMacros.h:258
    #3  0x80463e4c in initStorage ()
        at rts/sm/Storage.c:121
    #4  0x8043ffb4 in hs_init_ghc (...)
        at rts/RtsStartup.c:181
    #5  0x80455982 in hs_main (...)
        at rts/RtsMain.c:51
    #6  0x80003c1c in main ()

GHC assumes last 2 pointer bits are tags on 32-bit targets.
But here 'stg_dummy_ret_closure' address violates the assumption:

    LOOKS_LIKE_CLOSURE_PTR (p=0x805aac6e <stg_dummy_ret_closure>)

I've added compiler hint for static StgClosure objects to
align closures at least by their natural alignment (what GHC assumes).
See Note [StgWord alignment].

Signed-off-by: Sergei Trofimovich <siarheit@google.com>

Test Plan: ran basic test on m68k qemu, it got past ASSERTs

Reviewers: simonmar, austin, bgamari

Reviewed By: bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1974

GHC Trac Issues: #11395

comment:21 Changed 19 months ago by slyfox

Filed https://github.com/vivier/qemu-m68k/issues/6 for possible qemu limitation to handle complex memory loads.

comment:22 Changed 19 months ago by Sergei Trofimovich <siarheit@…>

In 90e1e160/ghc:

Split external symbol prototypes (EF_) (Trac #11395)

Before the patch both Cmm and C symbols were declared
with 'EF_' macro:

    #define EF_(f)    extern StgFunPtr f()

but for Cmm symbols we know exact prototypes.

The patch splits there prototypes in to:

    #define EFF_(f)   void f() /* See Note [External function prototypes] */
    #define EF_(f)    StgFunPtr f(void)

Cmm functions are 'EF_' (External Functions),
C functions are 'EFF_' (External Foreign Functions).

While at it changed external C function prototype
to return 'void' to workaround ghc bug on m68k.
Described in detail in Trac #11395.

This makes simple tests work on m68k-linux target!

Thanks to Michael Karcher for awesome analysis
happening in Trac #11395.

Signed-off-by: Sergei Trofimovich <siarheit@google.com>

Test Plan: ran "hello world" on m68k successfully

Reviewers: simonmar, austin, bgamari

Reviewed By: bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1975

GHC Trac Issues: #11395

comment:23 Changed 19 months ago by slyfox

The problem in comment:#21 was in ASL/LSL implementation:

https://github.com/vivier/qemu-m68k/pull/7 should fix it (lightly tested)

With Phab:D1975 applied ghc-stage2 still SIGSEGs.

Current run of regression tests fails the following ~200 tests:

$ make fasttest TEST_HC=$(pwd)/inplace/bin/ghc-stage1 THREADS=12
# testsuite/mk/ghcconfig_*.mk needs a fix in WORDSIZE, GHCI support

TEST="T7962 T9905fail3 T9905fail2 T9905fail1 ghc-e-fail2 ghc-e-fail1 annfail09 annfail03 T7859 cabal08 T1372 T5681 T8131b T7571 T9576 cabal09 cabal01 cabal03 cabal04 cabal05 cabal06 rnfail043 T3333 ghcilink001 ghcilink002 ghcilink003 ghcilink004 ghcilink005 ghcilink006 GShow1 T9045 linker_unload stack001 stack003 T5435_v_gcc bug1010 T5435_v_asm T9839_01 T10017 testmblockalloc T8124 T5250 T9329 T11103 overloadedrecfldsfail09 SafeLang16 SafeLang12 SafeLang01 T8628 T6145 T8639_api T10508_api literals parsed RAE_T32a T11195 T11303b T11303 T11276 T11374 UnsafeInfered02 UnsafeWarn02 UnsafeInfered12 T5550 T7702 T5321FD T5030 T4801 T5631 T5837 T9872a T5642 T9872b T3064 parsing001 T9872d T9872c T1969 T5321Fun T783 T9233 T9961 recomp009 ImpSafeOnly08 safePkg01 ImpSafeOnly02 ImpSafeOnly03 ImpSafeOnly01 ImpSafeOnly06 ImpSafeOnly07 ImpSafeOnly04 ImpSafeOnly05 ImpSafeOnly09 ImpSafeOnly10 T10052 prog001 T7022 T9160 AtomicPrimops tryReadMVar2 conc006 overloadedrecfldsrun04 overloadedlabelsrun04 T680 T5314 T949 prof-doc-last heapprof001 T3001-2 scc003 T10826 dynCompileExpr bug1465 ExtraConstraintsWildcardInPatternSplice NamedWildcardInTypeSplice posix004 ghci004 ghci006 landmines dynHelloWorld T9360a T9360b T5313 T7478 Conversions T3586 T5549 T7954 T5237 T3245 T4321 T7850 T9203 T10359 MethSharing integerGmpInternals dataToExpQUnit T3007 T1679 ffi014 TH_spliceViewPat T1407 load_short_name T6016 qsem001 hTell002 showDouble qsemn001 T4006 memo002 dynamic003 unicode002 stableptr003 stableptr004 T7600 cgrun058 cgrun074 cgrun026 encoding004 T3307 openTempFile001 environment001 T4855 num008 T11430 T10269 T10268 exampleTest T11321 comments T10396 listcomps T10307 T10399 bundle-export T10357 T10354 T10358 annotations T10255 T10276 T10278 T11018 T10313 T10312 T11332 parseTree T10309 T10280 boolFormula apirecomp001 numrun014 T8726 arith001 arith012 arith005 KindEqualities2 RAE_T32b Rae31 T876 T4830 T7257 lazy-bs-alloc"

Many of them are not real problems (like missing TH annotation around tests, perf failures) but some are real failures.

Investigating.

comment:24 Changed 19 months ago by Sergei Trofimovich <siarheit@…>

In c42cdb7/ghc:

fix Float/Double unreg cross-compilation

Looking at more failures on m68k (Trac #11395)
I've noticed the arith001 and arith012 test failures.
(--host=x86_64-linux --target=m68k-linux).

The following example was enough to reproduce a problem:

    v :: Float
    v = 43
    main = print v

m68k binaries printed '0.0' instead of '43.0'.

The bug here is how we encode Floats and Double
as Words with the same binary representation.

Floats:
  Before the patch we just coerced Float to Int.
  That breaks when we cross-compile from
  64-bit LE to 32-bit BE.

  The patch fixes conversion by accounting for padding.
  when we extend 32-bit value to 64-bit value (LE and BE
  do it slightly differently).

Doubles:
  Before the patch Doubles were coerced to a pair of Ints
  (not correct as x86_64 can hold Double in one Int) and
  then trucated this pair of Ints to pair of Word32.

  The patch fixes conversion by always decomposing in
  Word32 and accounting for host endianness (newly
  introduced hostBE)  and target endianness (wORDS_BIGENDIAN).

I've tested this patch on Double and Float conversion on
    --host=x86_64-linux --target=m68k-linux
crosscompiler. It fixes 10 tests related to printing Floats
and Doubles.

Thanks to Bertram Felgenhauer who poined out this probem.

Signed-off-by: Sergei Trofimovich <siarheit@google.com>

Test Plan: checked some examples manually, fixed 10 tests in test suite

Reviewers: int-e, austin, bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1990

GHC Trac Issues: #11395

comment:25 Changed 19 months ago by Sergei Trofimovich <siarheit@…>

In e46742f5/ghc:

rts: fix threadStackUnderflow type in cmm

stg_stack_underflow_frame had an incorrect
call of C function 'threadStackUnderflow':
    ("ptr" ret_off) =
      foreign "C" threadStackUnderflow(
        MyCapability(),
        CurrentTSO);

Which means it's prototype is:
    void * (*) (W_, void*);
While real prototype is:
    W_ (*) (Capability *cap, StgTSO *tso);

The fix is simple. Fix type annotations:
    (ret_off) =
      foreign "C" threadStackUnderflow(
        MyCapability() "ptr",
        CurrentTSO "ptr");

Noticed when debugged T9045 test failure
on m68k target which distincts between
pointer and non pointer return types
(uses different registers)

While at it noticed and fixed return types
for 'throwTo' and 'findSpark'.

Trac #11395

Signed-off-by: Sergei Trofimovich <siarheit@google.com>

comment:26 Changed 19 months ago by slyfox

With changeset:e46742f5c51938bc7c992ac37fecc6df8cab7647 I can get ghci working \o/

$ inplace/bin/ghc-stage2 --interactive
GHCi, version 8.1.20160305: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/slyfox/.ghci

comment:27 in reply to:  26 Changed 19 months ago by glaubitz

Replying to slyfox:

With changeset:e46742f5c51938bc7c992ac37fecc6df8cab7647 I can get ghci working \o/

Whoa, congrats. That's very impressive. Can't wait to start bootstrapping ghc on m68k in Debian :).

comment:28 Changed 19 months ago by bgamari

Milestone: 8.0.1
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.