Opened 2 years ago

Closed 9 months ago

Last modified 2 weeks ago

#8089 closed bug (fixed)

Implementation of GHC.Event.Poll.poll is broken due to bad coercion

Reported by: merijn Owned by:
Priority: normal Milestone: 7.10.1
Component: libraries/base Version: 7.6.3
Keywords: Cc: hvr, ekmett
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime crash Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions: Phab:D407

Description

Due to an unsafe coercion in GHC.Event.Poll.poll the implementation has a bug that on some platforms (.e.g. linux) results in subtly wrong semantics and on other platforms (e.g. OS X) downright crashes the runtime.

The offending line is line 95 in GHC/Event/Poll.hsc:

c_poll ptr (fromIntegral len) (fromIntegral (fromTimeout tout))

fromTimeout returns a timeout in seconds as a value of type Int. c_poll takes a timeout in seconds as a value of type CInt. If the max value of Int is larger than that of CInt, this coercion results in an overflow, producing a negative value. The faulty semantics are in turn caused by passing this negative value as timeout to c_poll.

This overflow happens on many 64bit platforms where GHC's Int is 64bit, but CInt is often 32bit. The result of passing a negative value to poll() depends on the platform, but is (on all platforms I checked) different from the desired semantics!

On linux any negative timeouts are treated as infinite, this means that a poll() with a timeout that should be finite results in an infinitely blocking poll on linux!

On OSX any negative timeout other than -1 (which is treated as infinity) is considered an invalid parameter, returning an error and crashing the runtime.

A minimal example reproducing the error (on systems where fromIntegral (maxBound :: Int) > (fromIntegral (maxBound :: CInt) :: Integer)):

import Control.Concurrent

main = threadDelay maxBound

Using 64bit GHC on OSX this produces an immediate runtime crash.

The attached patch solves the issue by comparing the Int timeout against the max value of CInt and, if necessary, looping multiple c_poll calls (with a timeout equal to maxBound :: CInt) if they return 0 (which indicates the timeout has expired with no events occurring).

See also: https://github.com/merijn/packages-base/commit/6d8ea02a3f6a2cdb82a9ad786a8c4db780b92091

Attachments (1)

base.patch (2.6 KB) - added by merijn 2 years ago.

Download all attachments as: .zip

Change History (14)

comment:1 Changed 2 years ago by igloo

  • Status changed from new to patch

comment:2 Changed 2 years ago by merijn

Previously patch is not sufficiently paranoid. It just moves the underflow from architectures where Int is smaller than CInt to architectures where CInt is smaller than Int. Because the fromIntegral (maxBound :: CInt) :: Int conversion underflows in the computation of maxPollTimeout.

New patch deals with this by taking the max of maxBound :: Int and fromIntegral (maxBound :: CInt) :: Int), as a result if the conversion from maxBound :: CInt to Int underflows the maxPollTimeout is turned into "maxBound :: Int", which is sufficient, as the poll function can't receive time outs bigger than that anyway.

Version 0, edited 2 years ago by merijn (next)

Changed 2 years ago by merijn

comment:3 Changed 2 years ago by merijn

Removed redundant fromIntegral from patch.

comment:4 follow-up: Changed 2 years ago by thoughtpolice

I currently can't reproduce this on amd64/Linux. I'll get my OS X machine out and try and merge it tomorrow.

comment:5 Changed 2 years ago by thoughtpolice

  • Owner set to thoughtpolice

comment:6 in reply to: ↑ 4 Changed 2 years ago by merijn

Replying to thoughtpolice:

I currently can't reproduce this on amd64/Linux. I'll get my OS X machine out and try and merge it tomorrow.

Right, the only effect on linux right now is that it will block indefinitely, rather than some finite time, which only matters if there's no wake-ups, which seems very unlikely to ever happen.

comment:7 Changed 21 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from patch to closed

I reproduced this on my OS X machine but didn't push it, it seems. The patches to base are pushed now; thanks Merijn, sorry for the delay.

comment:8 Changed 9 months ago by merijn

  • Cc hvr ekmett added
  • Keywords patch removed
  • Owner thoughtpolice deleted
  • Resolution fixed deleted
  • Status changed from closed to new

My "fixed" fix was still wrong, proper fix incoming on Phabricator

comment:9 Changed 9 months ago by Austin Seipp <austin@…>

In 24e05f48f3a3a1130ecd5a46e3089b76ee5a2304/ghc:

*Really*, really fix RTS crash due to bad coercion.

Summary:
My previous attempt to fix the new coercion bug introduced by my fix actually
just reverted back to the *old* bug. This time it should properly handle all
three size scenarios.

Signed-off-by: Merijn Verstraaten <[email protected]>

Test Plan: validate

Reviewers: dfeuer, austin, hvr

Reviewed By: austin, hvr

Subscribers: thomie, carter, simonmar

Differential Revision: https://phabricator.haskell.org/D407

GHC Trac Issues: #8089

comment:10 Changed 9 months ago by thoughtpolice

  • Differential Revisions set to Phab:D407
  • Milestone set to 7.10.1
  • Status changed from new to merge

OK, closing *again*, for good this time hopefully.

comment:11 Changed 9 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from merge to closed

comment:12 Changed 3 weeks ago by thomie

For posterity, other commits in this series:

  • 33ed16bd8b3d95dd18e401a3d64435d8675b5f86
    Author: Merijn Verstraaten <>
    Date:   Wed Jul 24 19:00:42 2013 +0100
    
        *Really* RTS crash due to bad coercion.
        
        Previous commit only moved the coercion mistake to a different
        architecture (i.e. underflow could still occur on platforms where Int
        is smaller than CInt). This patch should definitively deal with all
        possible combinations.
        
        Signed-off-by: Austin Seipp <>
    
  • 00e04e81fb127d716719a85d9387a98b664b7176
    Author: Merijn Verstraaten <>
    Date:   Wed Jul 24 14:37:25 2013 +0100
    
        Fix OSX RTS crash due to bad coercion.
        
        The code coerces Int to CInt, which causes an overflow if Int is bigger
        than CInt (for example, Int 64bit, CInt 32 bit). This results in a
        negative value being passed to c_poll.
        
        On Linux all negative values are treated as infinite timeouts, which
        gives subtly wrong semantics, but is unlikely to produce actual bugs.
        
        OSX insists that only -1 is a valid value for infinite timeout, any
        other negative timeout is treated as an invalid argument.
        
        This patch replaces the c_poll call with a loop that handles the
        overflow gracefully by chaining multiple calls to poll to obtain the
        proper semantics.
        
        Signed-off-by: Austin Seipp <>
    

comment:13 Changed 2 weeks ago by Thomas Miedema <thomasmiedema@…>

In d0cf8f1/ghc:

Testsuite: simplify T8089 (#8089)

The previous implementation wasn't working for the `ghci` test way,
causing a fulltest failure.

Differential Revision: https://phabricator.haskell.org/D1075
Note: See TracTickets for help on using tickets.