Opened 9 months ago

Closed 5 months ago

#8089 closed bug (fixed)

Implementation of GHC.Event.Poll.poll is broken due to bad coercion

Reported by: merijn Owned by: thoughtpolice
Priority: normal Milestone:
Component: libraries/base Version: 7.6.3
Keywords: patch Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime crash Difficulty: Easy (less than 1 hour)
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Due to an unsafe coercion in GHC.Event.Poll.poll the implementation has a bug that on some platforms (.e.g. linux) results in subtly wrong semantics and on other platforms (e.g. OS X) downright crashes the runtime.

The offending line is line 95 in GHC/Event/Poll.hsc:

c_poll ptr (fromIntegral len) (fromIntegral (fromTimeout tout))

fromTimeout returns a timeout in seconds as a value of type Int. c_poll takes a timeout in seconds as a value of type CInt. If the max value of Int is larger than that of CInt, this coercion results in an overflow, producing a negative value. The faulty semantics are in turn caused by passing this negative value as timeout to c_poll.

This overflow happens on many 64bit platforms where GHC's Int is 64bit, but CInt is often 32bit. The result of passing a negative value to poll() depends on the platform, but is (on all platforms I checked) different from the desired semantics!

On linux any negative timeouts are treated as infinite, this means that a poll() with a timeout that should be finite results in an infinitely blocking poll on linux!

On OSX any negative timeout other than -1 (which is treated as infinity) is considered an invalid parameter, returning an error and crashing the runtime.

A minimal example reproducing the error (on systems where fromIntegral (maxBound :: Int) > (fromIntegral (maxBound :: CInt) :: Integer)):

import Control.Concurrent

main = threadDelay maxBound

Using 64bit GHC on OSX this produces an immediate runtime crash.

The attached patch solves the issue by comparing the Int timeout against the max value of CInt and, if necessary, looping multiple c_poll calls (with a timeout equal to maxBound :: CInt) if they return 0 (which indicates the timeout has expired with no events occurring).

See also: https://github.com/merijn/packages-base/commit/6d8ea02a3f6a2cdb82a9ad786a8c4db780b92091

Attachments (1)

base.patch (2.6 KB) - added by merijn 9 months ago.

Download all attachments as: .zip

Change History (8)

comment:1 Changed 9 months ago by igloo

  • Status changed from new to patch

comment:2 Changed 9 months ago by merijn

Previously patch is not sufficiently paranoid. It just moves the underflow from architectures where Int is smaller than CInt to architectures where CInt is smaller than Int. Because the fromIntegral (maxBound :: CInt) :: Int conversion underflows in the computation of maxPollTimeout.

New patch deals with this by taking the max of maxBound :: Int and fromIntegral (maxBound :: CInt) :: Int), as a result if the conversion from maxBound :: CInt to Int underflows the maxPollTimeout is turned into "maxBound :: Int", which is sufficient, as the poll function can't receive time outs bigger than that anyway.

Version 0, edited 9 months ago by merijn (next)

Changed 9 months ago by merijn

comment:3 Changed 9 months ago by merijn

Removed redundant fromIntegral from patch.

comment:4 follow-up: Changed 8 months ago by thoughtpolice

I currently can't reproduce this on amd64/Linux. I'll get my OS X machine out and try and merge it tomorrow.

comment:5 Changed 8 months ago by thoughtpolice

  • Owner set to thoughtpolice

comment:6 in reply to: ↑ 4 Changed 8 months ago by merijn

Replying to thoughtpolice:

I currently can't reproduce this on amd64/Linux. I'll get my OS X machine out and try and merge it tomorrow.

Right, the only effect on linux right now is that it will block indefinitely, rather than some finite time, which only matters if there's no wake-ups, which seems very unlikely to ever happen.

comment:7 Changed 5 months ago by thoughtpolice

  • Resolution set to fixed
  • Status changed from patch to closed

I reproduced this on my OS X machine but didn't push it, it seems. The patches to base are pushed now; thanks Merijn, sorry for the delay.

Note: See TracTickets for help on using tickets.