Opened 13 months ago

Closed 4 months ago

#7794 closed bug (fixed)

GHCi "Prelude.undefined" exceptions on ARM; ByteCodeItbls.mkJumpToAddr unimplemented

Reported by: cjwatson Owned by:
Priority: normal Milestone: 7.8.1
Component: GHCi Version: 7.6.2
Keywords: Cc: mail@…, bgamari@…, hvr
Operating System: Linux Architecture: arm
Type of failure: GHCi crash Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description (last modified by igloo)

A number of non-trivial operations fail in ghci on ARM (specifically armhf on both Debian experimental and Ubuntu raring). I initially noticed this as conduit's doctests failing, but reduced this to:

  $ ghci
  GHCi, version 7.6.2: http://www.haskell.org/ghc/  :? for help
  Loading package ghc-prim ... linking ... done.
  Loading package integer-gmp ... linking ... done.
  Loading package base ... linking ... done.
  Prelude> data Type a = Nothing
  *** Exception: Prelude.undefined

This is a singularly unhelpful exception, coming as it does without context, but I eventually tracked it down to ByteCodeItbls?.mkJumpToAddr having no specific implementation for ARM and thus falling back to the default of "undefined". I've tried adding this:

+#elif arm_TARGET_ARCH
+type ItblCode = Word32
+mkJumpToAddr a
+    = [ 0xe51ff004      -- ldr pc, [pc, #-4]    # pc reads as <current insn>+8
+      , fromIntegral (ptrToInt a) ]

This definitely changes the nature of the problem, so I think I'm on the right track, and I'm reasonably confident that that's the correct branch implementation; but when I actually try it in practice I get crashes (SIGSEGV/SIGILL/SIGBUS) with trashed stack traces, so I'm clearly not done yet. Friends of mine suggest that I may need to __clear_cache or equivalent, which sounds plausible, so my current line of attack is figuring out how to glue that in; but if any actual GHC hackers can spot something obviously missing in the meantime then that would be great.

Change History (11)

comment:1 Changed 13 months ago by nomeata

  • Cc mail@… added

comment:2 Changed 13 months ago by igloo

  • Description modified (diff)
  • Difficulty set to Unknown

comment:3 Changed 12 months ago by igloo

  • Milestone set to 7.8.1

Thanks for the report. I've changed the code so you now get a more useful panic message, but obviously we still need an actual fix for arm.

comment:4 follow-ups: Changed 8 months ago by RoboTux

Inlining the call of the cacheflush system call (the call behind __clear_cache) should work but it means the cache would be flush every time mkJumpToAddr is called. I don't know enough the context of the code generation but if that function is called several time after 1 code generation then it's suboptimal. The patch then become:

+#elif arm_TARGET_ARCH
+type ItblCode = Word32
+mkJumpToAddr a
+    = [ 0xe92d0080      -- push    {r7}
+      , 0xe3a0780f      -- mov     r7, #983040     ; 0xf0000
+      , 0xe2877002      -- add     r7, r7, #2
+      , 0xe3a02000      -- mov     r2, #0
+      , 0xef000000      -- swi     0x0
+      , 0xe8bd0080      -- pop     {r7}
+      , 0xe51ff004      -- ldr pc, [pc, #-4]    # pc reads as <current insn>+8
+      , fromIntegral (ptrToInt a) ]
Last edited 8 months ago by RoboTux (previous) (diff)

comment:5 in reply to: ↑ 4 Changed 8 months ago by RoboTux

Replying to RoboTux:

Inlining the call of the cacheflush system call (the call behind __clear_cache) should work but it means the cache would be flush every time mkJumpToAddr is called. I don't know enough the context of the code generation but if that function is called several time after 1 code generation then it's suboptimal. The patch then become:

+#elif arm_TARGET_ARCH
+type ItblCode = Word32
+mkJumpToAddr a
+    = [ 0xe92d0080      -- push    {r7}
+      , 0xe3a0780f      -- mov     r7, #983040     ; 0xf0000
+      , 0xe2877002      -- add     r7, r7, #2
+      , 0xe3a02000      -- mov     r2, #0
+      , 0xef000000      -- swi     0x0
+      , 0xe8bd0080      -- pop     {r7}
+      , 0xe51ff004      -- ldr pc, [pc, #-4]    # pc reads as <current insn>+8
+      , fromIntegral (ptrToInt a) ]

I tried the patch on a Debian arm porterbox and it seems to work as no exception was raised with it while an exception was raised without it.

comment:6 in reply to: ↑ 4 ; follow-up: Changed 7 months ago by cjwatson

Replying to RoboTux:

Inlining the call of the cacheflush system call (the call behind __clear_cache) should work but it means the cache would be flush every time mkJumpToAddr is called. I don't know enough the context of the code generation but if that function is called several time after 1 code generation then it's suboptimal.

It will be called every time one of the generated chunks of code is called, so I would expect it to be suboptimal, yes. That in itself might be tolerable.

Much more importantly, though, this patch misses the point of flushing the instruction cache. The point is that we need to flush the cache in order for the processor to reliably read back the code that we just
wrote out; flushing the cache in that very code is not a viable approach to that, because the cache-flushing code itself might not be read. So I'm afraid I don't think this is going to work properly. It might work some of the time, of course, depending on cache locality, but I can't see how it would work reliably.

comment:7 in reply to: ↑ 6 Changed 7 months ago by RoboTux

Replying to cjwatson:

It will be called every time one of the generated chunks of code is called, so I would expect it to be suboptimal, yes. That in itself might be tolerable.

Much more importantly, though, this patch misses the point of flushing the instruction cache. The point is that we need to flush the cache in order for the processor to reliably read back the code that we just
wrote out; flushing the cache in that very code is not a viable approach to that, because the cache-flushing code itself might not be read. So I'm afraid I don't think this is going to work properly. It might work some of the time, of course, depending on cache locality, but I can't see how it would work reliably.

Oh, I didn't realize this was the generated code. I don't know if
it's because of the language but I can't make much sense of what the code is
doing. I assumed this code was some jit compiler and that the jump was
executed when a function has just been compiled in order to actually call that
function. If it's part of the generated function indeed it's very suboptimal
and won't work as you very well explained.

My apologize but I don't think I can bring this further without investing
quite some time to understand the code.

Thanks for being so quick in looking at the patch.

Best regards.

comment:8 Changed 7 months ago by bgamari

  • Cc bgamari@… added

comment:9 Changed 6 months ago by nomeata

  • Cc hvr added

There is a related ticket/patch at #8380. Colin, what do you say to 8380#comment:4?

comment:10 Changed 6 months ago by cjwatson

Things have moved on quite a bit since I last checked, but that ticket does look essentially the same as this one.

When I last looked, though, GHCi wasn't actually using the relevant part of libffi in the newExecConItbl path, and I still can't find anything that would do this. Am I missing something here or is bgamari?

comment:11 Changed 4 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.