Opened 9 years ago

Closed 2 years ago

#2528 closed bug (fixed)

documentation for nub and nubBy should be corrected, extended or removed.

Reported by: jdressel Owned by: ekmett
Priority: normal Milestone: 7.10.1
Component: Core Libraries Version: 6.8.2
Keywords: Cc: ekmett@…, hvr, ekmett, core-libraries-committee@…
Operating System: Linux Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

While working on a fun math/programming exercise I discovered a peculiar behavior in "nub". While it worked the vast majority of the time, there were a few cases where it failed to purge all duplicates as defined by (==). Replacing the use of "nub" with "nubBy (==)" correctly purged all cases.

I have linked a codepad evaluation that outputs the strange behavior: http://codepad.org/VNauTAam

I would have isolated the bug better for you, but like I say nub works by far the majority of the time. This must be a peculiar clash with my custom data structures and strange definition of (==).

Cheers.

Change History (18)

comment:1 Changed 9 years ago by JeremyShaw

Resolution: invalid
Status: newclosed

Your (==) instance is broken. Specifically, for:

-- ((9+9)+2) a = Branch (Branch (Leaf (Just 9)) Add (Leaf (Just 9))) Add (Leaf (Just 2))

-- (9+(2+9)) b = Branch (Leaf (Just 9)) Add (Branch (Leaf (Just 2)) Add (Leaf (Just 9)))

we see that:

*Main> a == b True *Main> b == a False *Main>

There may be other bugs as well, that is just the first instance I found. nubBy (==) will break as well, you just have not seen it yet.

All instances of Eq must be transitive, reflexive, symmetric, and antisymmetric. Haskell can't automatically check this for you -- so it is left up the developer to do, either by formal proof, or by unit testing.

comment:2 Changed 9 years ago by guest

Component: Compilerlibraries/base
Resolution: invalid
Status: closedreopened

(From Adrian Hey)

Well there still seems to be a problem, considering the what the library docs say about nubBy..

"The 'nubBy' function behaves just like 'nub', except it uses a user-supplied equality predicate instead of the overloaded '==' function."

..and nub was indeed originally defined as,..

nub=nubBy compare

..as is the local nub in the example.

So right or wrong, shouldn't we see the same result in each case? Here's the source of the problem I think:

elem_by :: (a -> a -> Bool) -> a -> [a] -> Bool
elem_by _  _ []         =  False
elem_by eq y (x:xs)     =  x `eq` y || elem_by eq y xs

vs.

elem _ []	= False
elem x (y:ys)	= x==y || elem x ys

Also, where does the H98 report say all instances of Eq must be transitive, reflexive, symmetric, and antisymmetric? It just says "The Eq class provides equality (==)..", whatever that might mean :-)

comment:3 Changed 9 years ago by guest

(from Adrian Hey)

Erm..that should be..

nub=nubBy (==)

comment:4 Changed 9 years ago by jdressel

Jeremy is correct: I forgot a case in my definition of (==) that makes it not completely symmetric for second layer commutative equivalence (oops).

The relevant demonstration for the example case that failed in the linked codepaste is:

  *Main> let a = Branch (Branch (Leaf (Just 9)) Add (Leaf (Just 9))) Add (Leaf (Just 2))
  *Main> let b = Branch (Leaf (Just 9)) Add (Branch (Leaf (Just 9)) Add  (Leaf (Just 2)))
  *Main> a == b
  True
  *Main> b == a
  False
  *Main> List.nubBy (==) [a,b]
  [((9+9)+2)]
  *Main> List.nub [a,b]
  [((9+9)+2),(9+(9+2))]

I guess the lack of symmetry in (==) causes the difference in behavior, which makes more sense to me now.

However, Adrian is also correct, the difference in behavior between "nub" and "nubBy (==)" is what startled me. I would have found the bug in my code eventually (especially after unit testing).

comment:5 Changed 9 years ago by simonmar

difficulty: Unknown
Owner: set to simonmar
Status: reopenednew

comment:6 in reply to:  2 Changed 9 years ago by JeremyShaw

Replying to guest:

(From Adrian Hey)

Well there still seems to be a problem, considering the what the library docs say about nubBy..

hrm, interesting. I am guessing that even with 'valid' Eq instances that could be important due to laziness and bottom?

Also, where does the H98 report say all instances of Eq must be transitive, reflexive, symmetric, and antisymmetric? It just says "The Eq class provides equality (==)..", whatever that might mean :-)

Well, it does not say it explicitly, but I suspect H98's usage of Eq implicitly demands those laws be followed. Hopefully in Haskell' the laws will not only be stated, but there will be some QuickCheck-style properties you can use to test your own instances ;)

comment:7 Changed 8 years ago by igloo

Milestone: 6.10.1

comment:8 Changed 8 years ago by simonmar

Resolution: fixed
Status: newclosed

Fixed:

Tue Sep  2 10:29:50 BST 2008  Simon Marlow <marlowsd@gmail.com>                 
  * #2528: reverse the order of args to (==) in nubBy to match nub

comment:9 Changed 8 years ago by simonmar

Architecture: UnknownUnknown/Multiple

comment:10 Changed 5 years ago by maeder

Owner: simonmar deleted
Resolution: fixed
Status: closednew
Type of failure: None/Unknown

The last change made the implementation and reported prelude version (and thus the documentation of elem_by) inconsistent, see http://www.haskell.org/pipermail/libraries/2011-September/016758.html

I propose to reverse the order or args under "USE_REPORT_PRELUDE", too, because I think, this is the right behavior and has the least impact.

The filter predicate could also be written as "(not . (`eq` x))".

"elem_by eq y xs" could be replaced by "any (eq y) xs" (and elem_by deleted), but I don't know if this has any performance impacts.

comment:12 Changed 5 years ago by maeder

Well, at least the wrong comment containing "same order" should be corrected, extended or removed.

comment:13 Changed 4 years ago by igloo

Milestone: 6.10.17.8.1

See also #7913

comment:14 Changed 3 years ago by George

Summary: nub not as reliable as nubBydocumentation for nub and nubBy should be corrected, extended or removed.

comment:15 Changed 3 years ago by ekmett

Cc: ekmett@… hvr ekmett added

Jeremy,

Reflexivity is violated in base by the instances for Float/Double, by the behavior around NaN so that law falls apart quickly.

Eq is also not structural (as demonstrated by -0 and 0 being (==) in Float/Double), though you didn't offer that one.

Ord for those is a bit like a funhouse playground. ;)

comment:16 Changed 3 years ago by thoughtpolice

Milestone: 7.8.37.10.1

Moving to 7.10.1.

comment:17 Changed 2 years ago by thoughtpolice

Component: libraries/baseCore Libraries
Owner: set to ekmett

Moving over to new owning component 'Core Libraries'.

comment:18 Changed 2 years ago by Herbert Valerio Riedel <hvr@…>

In a2e7bbfe7656cf7dbf1af4da5c077ac0b5d41127/ghc:

Preserve argument order to (==)/eq in nub and nubBy

This makes nub and nubBy behave as specified in the Haskell 98 Report.

This reverts 0ad9def53842e86fb292eccb810190711c42d7c5, and
fixes #3280, #7913 and #2528 (properly).

Before this change, the output of `T2528` was (4x wrong):
```
[A,B]
[1,2]
False
False
```

Reviewed By: dfeuer, ekmett, austin, hvr

Differential Revision: https://phabricator.haskell.org/D238

comment:19 Changed 2 years ago by thomie

Cc: core-libraries-committee@… added
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.