Opened 7 years ago

Last modified 5 months ago

#4505 new bug

Segmentation fault on long input (list of pairs)

Reported by: cathper Owned by:
Priority: high Milestone:
Component: Compiler Version: 7.0.1
Keywords: Segmentation fault, segfault, long input Cc: cathper@…, snoyberg, merijn
Operating System: Linux Architecture: x86_64 (amd64)
Type of failure: Runtime crash Test Case:
Blocked By: #4258 Blocking:
Related Tickets: Differential Rev(s): Phab:D1180
Wiki Page:

Description

When compiling LongList.hs with

ghc --make LongList.hs

then running the output gives a segfault:

$ ./LongList
8595
Segmentation fault

It also segfaults with ghc 6.12.1 (without printing 8595) and with ghc 6.12.3.

On ghc 6.12.3 running on Mac OS 10.5, no segfault is seen. (I haven't tried much longer input, since it already eats up approx. 1.2 GB of RAM.)

Attachments (1)

LongList.hs (100.7 KB) - added by cathper 7 years ago.
A file with a very long line.

Download all attachments as: .zip

Change History (49)

Changed 7 years ago by cathper

Attachment: LongList.hs added

A file with a very long line.

comment:1 Changed 7 years ago by cathper

Cc: cathper@… added

comment:2 Changed 6 years ago by simonmar

Owner: set to simonmar

comment:3 Changed 6 years ago by simonmar

Milestone: 7.0.2
Priority: normalhighest

comment:4 Changed 6 years ago by simonmar

Did you manage to compile this file? I left it compiling overnight last night, and it still hasn't finished.

comment:5 Changed 6 years ago by cathper

Yes. The compilation time varies quite much, though.

> ghc --version; wget -q http://hackage.haskell.org/trac/ghc/raw-attachment/ticket/4505/LongList.hs && time nice ghc --make LongList.hs
The Glorious Glasgow Haskell Compilation System, version 6.12.1
[1 of 1] Compiling Main             ( LongList.hs, LongList.o )
Linking LongList ...
nice ghc --make LongList.hs  63,91s user 1,11s system 99% cpu 1:05,28 total
> rm LongList*
> /tmp/ghc7/bin/ghc --version; wget -q http://hackage.haskell.org/trac/ghc/raw-attachment/ticket/4505/LongList.hs && time nice /tmp/ghc7/bin/ghc --make LongList.hs
The Glorious Glasgow Haskell Compilation System, version 7.0.1
[1 of 1] Compiling Main             ( LongList.hs, LongList.o )
nice /tmp/ghc7/bin/ghc --make LongList.hs  515,28s user 2,06s system 99% cpu 8:40,01 total
> uname -a
Linux freja 2.6.32-25-server #44-Ubuntu SMP Fri Sep 17 21:13:39 UTC 2010 x86_64 GNU/Linux
> cat /proc/cpuinfo|grep "model name"|head -n 1
model name      : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz

It uses > 1 GB of RAM.

comment:6 Changed 6 years ago by simonmar

I understand what the problem is here. The program compiles to code that allocates >1MB in one go, and the runtime doesn't currently support that (we support allocating >1MB only for a single object, such as an array, not for multiple objects). This is a zero-day bug: it's been around for ever, but nobody has run into it until now.

Worst case we can make the RTS emit an error message rather than just segfaulting, but I hope we'll be able to find a way to fix it properly. I'll look into it.

Meanwhile, it seems the HEAD can't compile the test program at all, unless a type signature is added, and 7.0.1 is significantly slower than 6.12.3 to compile it. I've created #4801 for the compiler performance regression.

comment:7 Changed 6 years ago by simonmar

Priority: highesthigh

Not a blocker, because it's not a regression.

comment:8 Changed 6 years ago by simonmar

Priority: highhighest
Status: newmerge

I improved the error message. Please merge these for 7.0.2, and then re-milestone to 7.2.1 for a proper fix.

Wed Dec  8 08:32:12 PST 2010  Simon Marlow <marlowsd@gmail.com>
  * Tweak the "sorry" message a bit
  
  -		"sorry! (this is work in progress)\n"
  +		"sorry! (unimplemented feature or known bug)\n"

Thu Dec  9 03:40:05 PST 2010  Simon Marlow <marlowsd@gmail.com>
  * Catch too-large allocations and emit an error message (#4505)
  
  This is a temporary measure until we fix the bug properly (which is
  somewhat tricky, and we think might be easier in the new code
  generator).
  
  For now we get:
  
  ghc-stage2: sorry! (unimplemented feature or known bug)
    (GHC version 7.1 for i386-unknown-linux):
          Trying to allocate more than 1040384 bytes.
  
  See: http://hackage.haskell.org/trac/ghc/ticket/4550
  Suggestion: read data from a file instead of having large static data
  structures in the code.

Thu Dec  9 12:04:04 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * fix ticket number (#4505)

comment:9 Changed 6 years ago by simonmar

(the priority bump was an accident, in case you were wondering)

comment:10 in reply to:  8 ; Changed 6 years ago by cathper

Replying to simonmar:

I improved the error message. Please merge these for 7.0.2, and then re-milestone to 7.2.1 for a proper fix.

I'm a Trac novice. You mean, merge it as fixed (so the ticket is closed), and open a new ticket that has 7.2.1 as milestone?

See: http://hackage.haskell.org/trac/ghc/ticket/4550

s/4550/4505/ I guess.

comment:11 in reply to:  10 Changed 6 years ago by simonmar

Replying to cathper:

I'm a Trac novice. You mean, merge it as fixed (so the ticket is closed), and open a new ticket that has 7.2.1 as milestone?

Sorry for the confusion, that message was intended for Ian, who does all the merging.

See: http://hackage.haskell.org/trac/ghc/ticket/4550

s/4550/4505/ I guess.

Yes, I fixed that in a subsequent patch.

comment:12 Changed 6 years ago by cathper

Just perfect :-)

comment:13 Changed 6 years ago by simonmar

One further patch to merge:

Fri Dec 10 09:40:02 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * warning fix: don't redefine BLOCKS_PER_MBLOCK

comment:14 Changed 6 years ago by igloo

First 3 merged.

comment:15 Changed 6 years ago by igloo

Milestone: 7.0.27.2.1
Owner: simonmar deleted
Status: mergenew

Last one also merged.

comment:16 Changed 6 years ago by igloo

Owner: set to simonmar

comment:17 Changed 6 years ago by simonmar

Priority: highestnormal

comment:18 Changed 6 years ago by simonmar

Owner: simonmar deleted

no proper fix in sight until we have the new code generator.

comment:19 Changed 6 years ago by igloo

Blocked By: 4258 added

comment:20 Changed 5 years ago by igloo

Milestone: 7.4.17.6.1
Priority: normallow

comment:21 Changed 5 years ago by igloo

Milestone: 7.6.17.6.2

comment:22 Changed 3 years ago by thoughtpolice

Milestone: 7.6.27.10.1

Moving to 7.10.1.

comment:23 Changed 3 years ago by Yuras

difficulty: Unknown
Priority: lownormal

The compile time check (introduced in a278f3f02d09bc32b0a75d4a04d710090cde250f) was removed when replacing the old code generator. So now it crashes at runtime again. It was report a number of times, see #9647 and #8568.

comment:24 Changed 3 years ago by thomie

Nice find Yuras. Should be easy to put that check back in then.

comment:25 Changed 2 years ago by simonpj

Owner: set to simonmar
Priority: normalhigh

Simon M: might you re-instate the previous fix, helpfully identified by Yuras, pending a more glorious resolution. (With a pointer from the fix back to this ticket.)

I'll make the priority high, because it seems easy and much better than a runtime crash. Simon

comment:26 Changed 2 years ago by snoyberg

Cc: snoyberg added

I just ran into this on our code base. Is there any chance of getting a fix for this into the 7.8 series?

comment:27 Changed 2 years ago by simonpj

Priority: highhighest

I wish this had come up 3 weeks ago when we were triaging tickets for 7.8. The release candidate for 7.8.4 is out, so it's tough to get anything else in. Unless you say "this makes 7.8 unusable for my purposes". (This is the case for 7.8.3, which is why we went ahead with 7.8.4.)

Putting it in would mean a delay of a couple of weeks while (a) Simon re-instates the previous fix, (b) Austin makes a new release candidate (c) everyone tests it.

Of course, it's better to delay 7.8.4 than to have to put out 7.8.5. But if we keep doing that, we never make a release.

My instinct is "too late". But is there strong user sentiment about this one?

Simon

comment:28 Changed 2 years ago by simonmar

The code that implements the runtime check is still in place, it's only the compile-time check that we lost. So you don't get a segfault, but you do get a runtime failure:

            barf("allocation of %ld bytes too large (GHC should have complained at compile-time)", (long)cap->r.rHpAlloc);

@snoyberg: do you see this?

Given this, I don't think it's worth holding up 7.8.4. We wouldn't be fixing the bug, only moving the failure to compile-time.

comment:29 Changed 2 years ago by snoyberg

@simonmar: Yes, that's the error message we're seeing at runtime.

Moving this bug from runtime to compile time would be a *huge* advantage to me. But I agree that it's not worth holding up the release. To clarify: I would still be very happy to see a fix for this on the 7.8 branch, as we could build our own GHC binary with the patch.

comment:30 Changed 2 years ago by rafael-felix

Would it help to create a "should fail" test case inside the array directory?

comment:31 Changed 2 years ago by simonmar

Priority: highesthigh

Not release-blocking

comment:32 Changed 2 years ago by thoughtpolice

Milestone: 7.10.17.12.1

Moving to 7.12.1

comment:33 Changed 2 years ago by merijn

Cc: merijn added

comment:34 Changed 21 months ago by bgamari

Simonmar, what do we want to do about this? IMHO we should at least reinstate the compile-time warning before 7.12.

comment:35 Changed 21 months ago by bgamari

Differential Rev(s): D1180

I've opened Phab:D1180 reintroducing the compile-time check.

It would however, be nice to know what would need to happen to fix the RTS.

comment:36 Changed 21 months ago by bgamari

Owner: changed from simonmar to bgamari

comment:37 Changed 21 months ago by thoughtpolice

Differential Rev(s): D1180Phab:D1180

comment:38 Changed 21 months ago by Ben Gamari <ben@…>

In c1d7b4b4/ghc:

StgCmmHeap: Re-add check for large static allocations

This should at least help alleviate the annoyance of #4505. This
reintroduces a compile-time check originally added in
a278f3f02d09bc32b0a75d4a04d710090cde250f but dropped with the new code
generator.

comment:39 Changed 21 months ago by bgamari

Re-added the compile-time check but it would be nice to resolve the underlying RTS issue.

comment:40 Changed 21 months ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:41 Changed 20 months ago by bgamari

Owner: bgamari deleted

I fixed the compile-time check; leaving the proper RTS fix for someone else.

comment:42 Changed 16 months ago by niteria

Owner: set to niteria

I'm going to try my hand at this in my spare cycles. If it's blocking anyone feel free to grab it.

comment:43 Changed 16 months ago by bgamari

Milestone: 8.0.18.2.1

Punting to 8.2.

comment:44 Changed 6 months ago by bgamari

Milestone: 8.2.18.0.2
Owner: niteria deleted

De-milestoning since the solution isn't clear and no one has stepped up to handle this.

If you are affected by this issue please do pick it up!

comment:45 Changed 6 months ago by bgamari

Status: newpatch

Here is a quick attempt at a patch: https://github.com/bgamari/array/commit/73144bda5cf4f5ba2e73206ea0a53200e30218af. The only trouble here is that we now are doing overflow in unsafe indexing operations, which will cost a bit of performance.

comment:46 Changed 6 months ago by bgamari

Please ignore comment:45 which was intended for #229.

comment:47 Changed 6 months ago by bgamari

Status: patchnew

comment:48 Changed 5 months ago by bgamari

Milestone: 8.0.2
Note: See TracTickets for help on using tickets.