Opened 3 years ago

Closed 3 years ago

#5396 closed bug (fixed)

rare segfault in a terminal game

Reported by: MikolajKonarski
Owned by: simonmar
Priority: high
Milestone: 7.4.1
Component: libraries/base
Version: 7.3
Keywords:
Cc:
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Documentation bug
Difficulty:
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

The segfault happens with this package: http://hackage.haskell.org/package/Allure, when compiled with flag -fvty. It looks nondeterministic. It happened with and without compiling with --ghc-options="-dcore-lint -debug", and with and without running under gdb (which shows a backtrace of length 3 each time). It happened in a few different versions of the application, with changed code. Quick keystrokes, and/or key autorepeat, seem to help reproduce it. I've had no luck finding a reliable way to reproduce it. Tested only on one machine; triggered over 10 times in around 5 hours of play total. I'm no longer sure it never happens with other ghc versions (and other frontends), because of how irregular its appearance is.

This version of the application:

https://github.com/Mikolaj/Allure/commit/a769ee7f30a38574f6402fe677f537077e0b7d69

has deterministic pseudo-random number generation, and I managed to trigger the bug twice on a map with the random seed manually set to 6 (in the config file). Once it happened after a few seconds of play, once after a few minutes, but still on level 1. No luck in many other attempts.

This version also has a stdin/stdout frontend and a deterministic bot. By default, the bot is run on the map generated with random seed 6, but it hasn't managed to reproduce the bug so far, with many bot seeds. Details on using the bot are in the commit log.

I will try to diagnose further, but probably not until the last week of August.

Change History (7)

comment:1 Changed 3 years ago by MikolajKonarski

Update: with gdb set up correctly, I get:

Program received signal SIGSEGV, Segmentation fault.
0x000000000059e5b7 in mk_wcswidth ()
(gdb) where
#0 0x000000000059e5b7 in mk_wcswidth ()
#1 0x0000000000570fbd in scmf_info ()
#2 0x0000000000000000 in ?? ()

and mk_wcswidth is probably a binding to custom C code defined here:

https://github.com/coreyoconnor/vty/blob/master/cbits/mk_wcwidth.c#L207

It seems other programs trigger it too:

http://chatlogs.jabber.ru/haskell@conference.jabber.ru/2011/04/13.html

(but it's not clear to me what problem it is).

May be related to:

https://github.com/coreyoconnor/vty/issues/14

Notified the maintainer.

comment:2 Changed 3 years ago by MikolajKonarski

Another relevant line:

https://github.com/coreyoconnor/vty/blob/master/src/Codec/Binary/UTF8/Width.hs#L31

BTW, I'm perplexed, because the comment in

http://hackage.haskell.org/packages/archive/base/latest/doc/html/src/Foreign-C-String.html#withCWStringLen

says the string is "NUL terminated", but it's not terminated with anything extra --- there are just the converted characters put into an array.

If I'm right, things can segfault a lot, e.g., with empty strings that don't get any C array allocated. A workaround for this in vty would be to swap the operands of the conjunction, so that the length is tested before the character is read, in

https://github.com/coreyoconnor/vty/blob/master/cbits/mk_wcwidth.c#L211
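For illustration, here is a minimal C sketch of that reordering. It is not the vty source: bounded_width and char_width are hypothetical names, and char_width is a stand-in that counts every character as one column. The point is only that the length test comes first, so the loop never reads the element just past the end of a buffer that has no terminator.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for a per-character width function:
   every character counts as one column. */
static int char_width(wchar_t c) {
    (void)c;
    return 1;
}

/* Sum the widths of at most n characters, stopping early at a NUL.
   The length test is the left operand of &&, so once n reaches 0
   the pointer is not dereferenced and pwcs[n] is never read. */
int bounded_width(const wchar_t *pwcs, size_t n) {
    int width = 0;
    for (; n > 0 && *pwcs != L'\0'; pwcs++, n--) {
        int w = char_width(*pwcs);
        if (w < 0)
            return -1;
        width += w;
    }
    return width;
}
```

With the operands the other way round, a call with n equal to 0, or a buffer of exactly n characters with no terminator, would read one element that was never allocated.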

comment:3 Changed 3 years ago by MikolajKonarski

  • Component changed from Compiler to libraries/base
  • Type of failure changed from Runtime crash to Documentation bug

OK, so I'm now as certain as a newbie debugging a nondeterministic segfault can be about where the bug is. The comment for

http://hackage.haskell.org/packages/archive/base/4.4.0.0/doc/html/src/Foreign-C-String.html#withCWStringLen

is wrong. It should start with

-- | Marshal a Haskell string into a C wide string
-- in temporary storage, with explicit length information.

The rest of the comment is OK and the code agrees with the other comments (for the CWStringLen type, etc.). This means the vty code, which seems to be based on the wrong comment, is wrong. I've already submitted a pull request for vty.

If the segfault ever returns, I will let you know.

comment:4 follow-up: Changed 3 years ago by igloo

  • Milestone set to 7.4.1
  • Owner set to igloo
  • Priority changed from normal to high

Presumably "the Haskell string may /not/ contain any NUL characters" is also wrong.

comment:5 in reply to: ↑ 4 Changed 3 years ago by MikolajKonarski

Replying to igloo:

Presumably "the Haskell string may /not/ contain any NUL characters" is also wrong.

Sorry, I didn't specify how much of the comment's beginning is wrong. Yes, only the last point is correct (IMHO, comparing with similar functions above).
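To illustrate why NUL characters are harmless once the length is explicit, here is a small C sketch (wide_len_equal is a hypothetical helper, not part of base or vty). It compares two length-delimited buffers with wmemcmp, which treats an embedded NUL as ordinary data, whereas a NUL-terminated comparison such as wcscmp would stop at it.

```c
#include <stddef.h>
#include <wchar.h>

/* Compare two length-delimited wide strings for equality.
   Embedded NULs are compared like any other character. */
int wide_len_equal(const wchar_t *a, size_t alen,
                   const wchar_t *b, size_t blen) {
    return alen == blen && wmemcmp(a, b, alen) == 0;
}
```

For example, the buffers {x, NUL, y} and {x, NUL, z} differ under wide_len_equal with length 3, while wcscmp sees only "x" in both and reports them equal.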

comment:6 Changed 3 years ago by simonmar

  • Owner changed from igloo to simonmar

Good catch, I'll fix the docs.

comment:7 Changed 3 years ago by simonmar

  • Resolution set to fixed
  • Status changed from new to closed
commit a57369f54bd25a1de5d477f3c363b3bafd17d168
Author: Simon Marlow <marlowsd@gmail.com>
Date:   Thu Aug 25 10:41:43 2011 +0100

    Fix documentation for withCWStringLen (#5396)