Opened 6 years ago

Last modified 3 months ago

#2528 new bug

documentation for nub and nubBy should be corrected, extended or removed.

Reported by: jdressel Owned by:
Priority: normal Milestone: 7.10.1
Component: libraries/base Version: 6.8.2
Keywords: Cc: ekmett@…, hvr, ekmett
Operating System: Linux Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

While working on a fun math/programming exercise I discovered a peculiar behavior in "nub". While it worked the vast majority of the time, there were a few cases where it failed to purge all duplicates as defined by (==). Replacing the use of "nub" with "nubBy (==)" correctly purged all cases.

I have linked a codepad evaluation that outputs the strange behavior:
http://codepad.org/VNauTAam

I would have isolated the bug better for you, but like I say nub works by far the majority of the time. This must be a peculiar clash with my custom data structures and strange definition of (==).

Cheers.

Change History (15)

comment:1 Changed 6 years ago by JeremyShaw

  • Resolution set to invalid
  • Status changed from new to closed

Your (==) instance is broken. Specifically, for:

-- ((9+9)+2)
a = Branch (Branch (Leaf (Just 9)) Add (Leaf (Just 9))) Add (Leaf (Just 2))

-- (9+(2+9))
b = Branch (Leaf (Just 9)) Add (Branch (Leaf (Just 2)) Add (Leaf (Just 9)))

we see that:

*Main> a == b
True
*Main> b == a
False
*Main>

There may be other bugs as well, that is just the first instance I found. nubBy (==) will break as well, you just have not seen it yet.

All instances of Eq must be transitive, reflexive, symmetric, and antisymmetric. Haskell can't automatically check this for you -- so it is left up the developer to do, either by formal proof, or by unit testing.

comment:2 follow-up: Changed 6 years ago by guest

  • Component changed from Compiler to libraries/base
  • Resolution invalid deleted
  • Status changed from closed to reopened

(From Adrian Hey)

Well there still seems to be a problem, considering the what the library docs say about nubBy..

"The 'nubBy' function behaves just like 'nub', except it uses a user-supplied equality predicate instead of the overloaded '==' function."

..and nub was indeed originally defined as,..

nub=nubBy compare

..as is the local nub in the example.

So right or wrong, shouldn't we see the same result in each case? Here's the source of the problem I think:

elem_by :: (a -> a -> Bool) -> a -> [a] -> Bool
elem_by _  _ []         =  False
elem_by eq y (x:xs)     =  x `eq` y || elem_by eq y xs

vs.

elem _ []	= False
elem x (y:ys)	= x==y || elem x ys

Also, where does the H98 report say all instances of Eq must be transitive, reflexive, symmetric, and antisymmetric? It just says "The Eq class provides equality (==)..", whatever that might mean :-)

comment:3 Changed 6 years ago by guest

(from Adrian Hey)

Erm..that should be..

nub=nubBy (==)

comment:4 Changed 6 years ago by jdressel

Jeremy is correct: I forgot a case in my definition of (==) that makes it not completely symmetric for second layer commutative equivalence (oops).

The relevant demonstration for the example case that failed in the linked codepaste is:

  *Main> let a = Branch (Branch (Leaf (Just 9)) Add (Leaf (Just 9))) Add (Leaf (Just 2))
  *Main> let b = Branch (Leaf (Just 9)) Add (Branch (Leaf (Just 9)) Add  (Leaf (Just 2)))
  *Main> a == b
  True
  *Main> b == a
  False
  *Main> List.nubBy (==) [a,b]
  [((9+9)+2)]
  *Main> List.nub [a,b]
  [((9+9)+2),(9+(9+2))]

I guess the lack of symmetry in (==) causes the difference in behavior, which makes more sense to me now.

However, Adrian is also correct, the difference in behavior between "nub" and "nubBy (==)" is what startled me. I would have found the bug in my code eventually (especially after unit testing).

comment:5 Changed 6 years ago by simonmar

  • Difficulty set to Unknown
  • Owner set to simonmar
  • Status changed from reopened to new

comment:6 in reply to: ↑ 2 Changed 6 years ago by JeremyShaw

Replying to guest:

(From Adrian Hey)

Well there still seems to be a problem, considering the what the library docs say about nubBy..

hrm, interesting. I am guessing that even with 'valid' Eq instances that could be important due to laziness and bottom?

Also, where does the H98 report say all instances of Eq must be transitive, reflexive, symmetric, and antisymmetric? It just says "The Eq class provides equality (==)..", whatever that might mean :-)

Well, it does not say it explicitly, but I suspect H98's usage of Eq implicitly demands those laws be followed. Hopefully in Haskell' the laws will not only be stated, but there will be some QuickCheck-style properties you can use to test your own instances ;)

comment:7 Changed 6 years ago by igloo

  • Milestone set to 6.10.1

comment:8 Changed 6 years ago by simonmar

  • Resolution set to fixed
  • Status changed from new to closed

Fixed:

Tue Sep  2 10:29:50 BST 2008  Simon Marlow <marlowsd@gmail.com>                 
  * #2528: reverse the order of args to (==) in nubBy to match nub

comment:9 Changed 6 years ago by simonmar

  • Architecture changed from Unknown to Unknown/Multiple

comment:10 Changed 3 years ago by maeder

  • Owner simonmar deleted
  • Resolution fixed deleted
  • Status changed from closed to new
  • Type of failure set to None/Unknown

The last change made the implementation and reported prelude version (and thus the documentation of elem_by) inconsistent, see http://www.haskell.org/pipermail/libraries/2011-September/016758.html

I propose to reverse the order or args under "USE_REPORT_PRELUDE", too, because I think, this is the right behavior and has the least impact.

The filter predicate could also be written as "(not . (`eq` x))".

"elem_by eq y xs" could be replaced by "any (eq y) xs" (and elem_by deleted), but I don't know if this has any performance impacts.

comment:12 Changed 3 years ago by maeder

Well, at least the wrong comment containing "same order" should be corrected, extended or removed.

comment:13 Changed 14 months ago by igloo

  • Milestone changed from 6.10.1 to 7.8.1

See also #7913

comment:14 Changed 7 months ago by George

  • Summary changed from nub not as reliable as nubBy to documentation for nub and nubBy should be corrected, extended or removed.

comment:15 Changed 3 months ago by ekmett

  • Cc ekmett@… hvr ekmett added

Jeremy,

Reflexivity is violated in base by the instances for Float/Double, by the behavior around NaN so that law falls apart quickly.

Eq is also not structural (as demonstrated by -0 and 0 being (==) in Float/Double), though you didn't offer that one.

Ord for those is a bit like a funhouse playground. ;)

comment:16 Changed 3 months ago by thoughtpolice

  • Milestone changed from 7.8.3 to 7.10.1

Moving to 7.10.1.

Note: See TracTickets for help on using tickets.