Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#11170 closed bug (wontfix)

(read ".9") :: Double unable to parse

Reported by: varosi
Owned by:
Priority: normal
Milestone:
Component: Core Libraries
Version: 7.10.2
Keywords: Read, report-impact
Cc: ekmett
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

In most languages ".9" is a floating point literal equal to 0.9, but with GHC it cannot be parsed. I don't know whether this is by design, but it was unexpected to me. I think it would be good for ".9" to be a valid string to parse as a Double.

What do you think?

Change History (16)

comment:1 Changed 3 years ago by nomeata

Resolution: invalid
Status: new → closed

It actually is parseable, if you put parens around it:

> :t (.4)
(.4) :: Num (a -> b) => (b -> c) -> a -> c

The problem here is that . is an operator, so it parses as the section \x -> (.) x 4 (which would type check if you have an instance Num (a -> b)). This is the correct behaviour according to the language specification, and definitely nothing a compiler should do differently.

comment:2 Changed 3 years ago by varosi

Resolution: invalid
Status: closed → new

comment:3 Changed 3 years ago by varosi

As I mentioned in the title and in the "Component" field, the problem is not in GHC itself, but in the Prelude:

(read ".9") :: Double
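A minimal sketch of the behaviour being reported, using readMaybe from Text.Read so that the failed parse shows up as Nothing instead of a runtime exception. The names parseDouble and samples are hypothetical and only serve the illustration.

```haskell
import Text.Read (readMaybe)

-- Parse a Double with the standard Read instance, returning Nothing
-- instead of throwing when the whole string is not a valid literal.
parseDouble :: String -> Maybe Double
parseDouble = readMaybe

-- A few inputs paired with what the Read instance makes of them.
samples :: [(String, Maybe Double)]
samples = [ (s, parseDouble s) | s <- ["0.9", ".9", "9e-1"] ]
```

Evaluating samples in GHCi shows Just 0.9 for "0.9" and "9e-1", and Nothing for ".9".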

comment:4 Changed 3 years ago by nomeata

Cc: ekmett added
Component: Prelude → Core Libraries

Ah, I see. Sorry for not reading your request carefully.

The report simply states

Reads an unsigned RealFrac value, expressed in decimal scientific notation.

which allows for certain variations in interpretation. Also, I don’t see a problem with read being more liberal than necessary, as long as no ambiguities or similar are introduced.

Reassigning the component to get it onto the radar of the Core Library Committee.

comment:5 Changed 3 years ago by ekmett

In the past we've avoided making GHC-specific changes in what read will accept to avoid hard to track down differences across compilers.

We've rejected at least one similar generalization request (binary literals) on those grounds. #10092

That said, there is already a chink in that armor, as I believe we've let in a patch that generalized lex to allow a bit more unicode. #10444

Last edited 3 years ago by ekmett

comment:6 Changed 3 years ago by ekmett

Keywords: report-impact added

comment:7 Changed 3 years ago by rwbarton

FWIW, my understanding of #10444 was that it brought GHC in line with the Report. I don't remember the details, though.

comment:8 Changed 3 years ago by thomie

Architecture: x86_64 (amd64) → Unknown/Multiple
Operating System: Windows → Unknown/Multiple

comment:9 Changed 3 years ago by ekmett

It appears that #10444 does indeed bring GHC in line with the report.

comment:10 Changed 3 years ago by varosi

Is it possible for GHC 8.x (major version change) to break this, so it could parse ".xx" floats as most languages do?

comment:11 Changed 3 years ago by ekmett

GHC doesn't specify the language report, so its renumbering to 8.x is irrelevant to this issue.

comment:12 Changed 3 years ago by varosi

Does the Report allow for the possibility of reading ".9" as a Double?

comment:13 Changed 3 years ago by ekmett

Resolution: invalid
Status: new → closed

The lexical syntax specified in the Haskell 98 Report here: https://www.haskell.org/onlinereport/syntax-iso.html#sect9.2 gives

float	->	decimal . decimal [exponent]
        |	decimal exponent
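The production above can be sketched as a small recognizer built on ReadP from base. This is only an illustration of the grammar, not the code that read actually uses; the names exponentPart, reportFloat and matchesReportFloat are hypothetical, and the exponent rule (e or E, an optional sign, then decimal) is taken from the same section of the Report.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, munch1, option, readP_to_S, satisfy, (+++))

-- decimal -> digit {digit}
decimal :: ReadP String
decimal = munch1 isDigit

-- exponent -> (e | E) [+ | -] decimal
exponentPart :: ReadP String
exponentPart = do
  e    <- satisfy (`elem` "eE")
  sign <- option "" ((: []) <$> satisfy (`elem` "+-"))
  ds   <- decimal
  pure (e : sign ++ ds)

-- float -> decimal . decimal [exponent]
--        | decimal exponent
reportFloat :: ReadP String
reportFloat = withFraction +++ withoutFraction
  where
    withFraction = do
      i <- decimal
      _ <- char '.'
      f <- decimal
      e <- option "" exponentPart
      pure (i ++ "." ++ f ++ e)
    withoutFraction = (++) <$> decimal <*> exponentPart

-- True when the whole string is derivable from the float production.
matchesReportFloat :: String -> Bool
matchesReportFloat = not . null . readP_to_S (reportFloat <* eof)
```

Both alternatives start with decimal, which needs at least one digit, so matchesReportFloat ".9" is False while matchesReportFloat "0.9" is True.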

A change to the report here would haphazardly change the semantics and behavior of code that currently typechecks.

Like it or not, in the presence of the rather common

instance Num b => Num (a -> b) where
  f + g = \x -> f x + g x
  fromInteger n = \_ -> fromInteger n
  ...

then (.9) parses today as precomposition with the constant function that returns 9.

abs .9 is a composition of abs and the function you obtain above from 9, etc.
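A minimal sketch of how these two expressions typecheck once such an instance is in scope. The instance is a filled-in version of the abbreviated one above; the method definitions beyond (+) and fromInteger are assumptions, and sectionResult and absResult are hypothetical names.

```haskell
-- Pointwise Num instance for functions (an orphan instance, shown only
-- to illustrate the parse; not something base provides).
instance Num b => Num (a -> b) where
  f + g         = \x -> f x + g x
  f - g         = \x -> f x - g x
  f * g         = \x -> f x * g x
  negate f      = negate . f
  abs f         = abs . f
  signum f      = signum . f
  fromInteger n = \_ -> fromInteger n

-- (.9) is the operator section \f -> f . 9, where the literal 9 is the
-- constant function returning 9. Here f is (+ 1) and the argument is 'x'.
sectionResult :: Int
sectionResult = (.9) (+ 1) 'x'

-- abs .9 lexes as abs . 9: abs composed with the constant function 9.
absResult :: Int
absResult = (abs .9) 'x'
```

With these definitions sectionResult is 10 and absResult is 9, which is the existing meaning a change to the float syntax would silently alter.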

In the absence of such a change to the Haskell Report I'd say we should close this as invalid (or wontfix) as it steps outside of the mandate of read for the language as it exists. From the standpoint of the libraries committee I'm going to close this out as such.

In the unlikely event that the haskell-prime committee changes the language we have such that .9 parses as a Fractional value then we'd be faced with the dilemma of how to support both old and new standards on the library front. Feel free to reopen this if that happens and we'll be forced to figure out what to do.

comment:14 Changed 3 years ago by varosi

Resolution: invalid
Status: closed → new

Here the problem is not in the GHC itself, but in Prelude with parsing of strings using Read type class:

(read ".9") :: Double

comment:15 Changed 3 years ago by carter

Resolution: wontfix
Status: new → closed

1) i understand the nature of your request, you do not need to explain it further

2) the purpose of read / show is to provide basic round tripping facilities for naive debugging / printing of haskell data structures

3) for complicated data structures, accepting ".9" as valid would create some room for more ambiguity in parsing

4) as a further piece of anecdata for why this would be a terrible change: in a work code base, the fact that attoparsec accepts floats in this format created some incredibly hard to resolve bugs in a streaming parsing facility. So from that perspective, since this seems to be mostly an aesthetic feature request, i suggest you either a) write your own read instance for a newtype'd double, OR b) use a different parsing library that suits your goals, such as attoparsec :)
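Option a) could look roughly like the sketch below. It is one possible approach under stated assumptions, not an endorsed design: the names LenientDouble, padLeadingDot and parseLenient are hypothetical, and the instance simply inserts a "0" in front of a leading dot before delegating to the standard Double instance.

```haskell
import Data.Char (isDigit, isSpace)
import Text.Read (readMaybe)

-- Wrapper whose Read instance also accepts ".9" and "-.9".
newtype LenientDouble = LenientDouble { getLenientDouble :: Double }

-- Insert the missing "0" when the input starts with '.' followed by a
-- digit, optionally after leading whitespace and a minus sign.
padLeadingDot :: String -> String
padLeadingDot s =
  case dropWhile isSpace s of
    '-' : rest -> '-' : pad rest
    rest       -> pad rest
  where
    pad t@('.' : c : _) | isDigit c = '0' : t
    pad t = t

instance Read LenientDouble where
  readsPrec d s =
    [ (LenientDouble x, rest) | (x, rest) <- readsPrec d (padLeadingDot s) ]

-- Convenience wrapper returning a plain Double.
parseLenient :: String -> Maybe Double
parseLenient = fmap getLenientDouble . readMaybe
```

Keeping the leniency behind a newtype leaves the Read instance for Double itself untouched, so code that relies on the Report's syntax is unaffected.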

for core data types such as floating point, breaking changes to how built-in parsing facilities work must be done with utmost consideration.

I actually took the time to check the 2008 IEEE floating point standard, and it doesn't specify this. The only format support that is mandated by the 2008 IEEE floating point standard is support for reading/writing floating point numbers in hexadecimal notation. The specification therein does allow terms of the form "0x.9" or "0X.9", but those are MUCH MUCH less ambiguous than ".9" in terms of possible parsing interpretations, and it is in fact a very very reasonable part of the standard, because that is precisely a finite representation that has zero rounding between the internal binary representation of finite numbers and a hexadecimal floating point string presentation.

Several folks have now all agreed, for a miscellany of reasons, that your feature request is invalid. It would be valid to explore adding hexadecimal floating point read/show support, though that could itself have parsing implications that'd need to be understood, but it doesn't face the same parsing ambiguities, AND is actually specified by the underlying floating point standard. (just because the C11 language standard specifies that ".1" is a valid floating point literal doesn't justify that we do, and the read instances for haskell base SHOULD / MUST match what are acceptable literals for the source, and because the code for read/show doesn't change with the language extensions enabled, it MUST match what is specified in the haskell language standard)

point of record: haskell language defines float lexical terms as

float → decimal . decimal [exponent]
    |  decimal exponent

likewise, the hexformat in the 2008 IEEE floating point standard is

sign             [+ −]
digit            [0123456789]
hexDigit         [0123456789abcdefABCDEF]
hexExpIndicator  [Pp]
hexIndicator     "0" [Xx]
hexSignificand   ( {hexDigit}* "." {hexDigit}+ | {hexDigit}+ "." | {hexDigit}+ )
decExponent      {hexExpIndicator} {sign}? {digit}+
hexSequence      {sign}? {hexIndicator} {hexSignificand} {decExponent}


where each line is a name followed by a rule in which ‘[...]’ selects one of the terminal characters listed between the brackets, ‘{...}’ refers to an earlier named rule, ‘(... | ... | ...)’ indicates a choice of one of three alternatives, straight double quotes enclose a terminal character, ‘?’ indicates that there shall be either no instance or one instance of the preceding item, ‘*’ indicates that there shall be zero or more instances of the preceding item, and ‘+’ indicates that there shall be one or more instances of the preceding item.
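As a rough illustration of the hexSequence rule, here is a small ReadP sketch that parses such a string and converts it with encodeFloat. It is not a conforming implementation (it makes no attempt at correct rounding for very long significands and ignores infinities and NaNs), and the names signP, significand and parseHexFloat are hypothetical.

```haskell
import Data.Char (digitToInt, isDigit, isHexDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, munch, munch1, option, readP_to_S, satisfy, (+++))

-- sign: [+ -], returned as +1 or -1
signP :: Num a => ReadP a
signP = (1 <$ char '+') +++ ((-1) <$ char '-')

-- hexSignificand, returned as (integer digits, fraction digits)
significand :: ReadP (String, String)
significand = withFraction +++ trailingDot +++ integral
  where
    withFraction = (,) <$> munch isHexDigit <*> (char '.' *> munch1 isHexDigit)
    trailingDot  = (\i -> (i, "")) <$> (munch1 isHexDigit <* char '.')
    integral     = (\i -> (i, "")) <$> munch1 isHexDigit

-- hexSequence: {sign}? {hexIndicator} {hexSignificand} {decExponent}
hexSequence :: ReadP Double
hexSequence = do
  sgn <- option 1 signP
  _   <- char '0'
  _   <- satisfy (`elem` "xX")
  (intDigits, fracDigits) <- significand
  _    <- satisfy (`elem` "pP")
  esgn <- option 1 signP
  e    <- munch1 isDigit
  let mantissa = foldl (\acc c -> acc * 16 + toInteger (digitToInt c)) 0
                       (intDigits ++ fracDigits)
      -- each hex fraction digit shifts the binary exponent down by 4
      expo = esgn * read e - 4 * length fracDigits
  pure (fromInteger sgn * encodeFloat mantissa expo)

parseHexFloat :: String -> Maybe Double
parseHexFloat s =
  case readP_to_S (hexSequence <* eof) s of
    [(x, "")] -> Just x
    _         -> Nothing
```

For example, parseHexFloat "0x.9p0" yields Just 0.5625, that is 9/16 with no rounding involved.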

comment:16 Changed 3 years ago by varosi

I have already written my own Attoparsec parser for that. At first sight it looked like a bug to me, because I am used to writing ".[decimal]" for floats in other languages.

Okay, thank you for the thorough research about it! For all those reasons it seems that the current implementation is the right way to do it in Haskell.
