Opened 20 months ago

Last modified 20 months ago

#13593 new task

Unexpected behavior from Data.List.groupBy

Reported by: dsf Owned by:
Priority: normal Milestone:
Component: Core Libraries Version: 8.0.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Documentation bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

I was hoping that

let notBoth1 a b = not (a == 1 && b == 1) in
groupBy notBoth1  [1,1,2,3,1,1,4,5,6,1]

would give me

[[1],[1,2,3,1],[1,4,5,6,1]]

but instead I get

[[1],[1,2,3],[1],[1,4,5,6],[1]]

It seems that groupBy assumes transitivity in the argument function. I have a new implementation that does not make this assumption. Of course, the implications of changing this function's behavior are troubling.

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' p (x : xs) = go [x] xs
    where
      go (x : xs) (y : zs) | p x y = go (y : x : xs) zs
      go g (y : zs) = reverse g : go [y] zs
      go g [] = [reverse g]

Change History (17)

comment:1 Changed 20 months ago by dsf

Also needs a clause groupBy' _ [] = []

comment:2 Changed 20 months ago by dsf

To be fair, the documentation says "supply your own equality function", and what I'm supplying above is not an equality function. But a better interface might be something like

groupOn :: Eq b => (a -> b) -> [a] -> [[a]]
groupOn f xs = groupBy ((==) `on` f) xs

comment:3 Changed 20 months ago by bgamari

The behavior of groupBy is defined by the Haskell Report. We really can't unilaterally change it in GHC.

comment:4 Changed 20 months ago by dsf

Of course.

comment:5 Changed 20 months ago by dsf

Resolution: wontfix
Status: newclosed

comment:6 Changed 20 months ago by Iceland_jack

Hold on, it's at least worth documenting this infelicity. Ideally the user would be directed to a correct version

comment:7 Changed 20 months ago by dsf

Resolution: wontfix
Status: closednew
Type: bugtask
Type of failure: Incorrect result at runtimeDocumentation bug

comment:8 Changed 20 months ago by dsf

I would add:

It is assumed that the argument function is transitive. A function like not (a == 1 && b == 1) will give a result you might not expect:

> let notBoth1 a b = not (a == 1 && b == 1)
> groupBy notBoth1  [1,1,2,3,1,1,4,5,6,1]
[[1],[1,2,3],[1],[1,4,5,6],[1]]

rather than

[[1],[1,2,3,1],[1,4,5,6,1]]

comment:9 Changed 20 months ago by svenpanne

The Haskell report is very explicit about these functions:

When the "By" function replaces an Eq context by a binary predicate, the predicate is assumed to define an equivalence; when the "By" function replaces an Ord context by abinary predicate, the predicate is assumed to define a total ordering.

And the Data.List documentation repeats these requirements, so I propose to close this issue as "invalid".

As a side note, I wouldn't call this an "infelicity": If your Eq/Ord instances don't obey the required laws, you would have similar fun with plain group. The By functions just make an implicit argument explicit, and I would be very surprised if they behaved differently.

comment:10 Changed 20 months ago by dsf

Yes, the Haskell report does contain this specification. I'm just trying to improve the documentation which is typically used by someone who is in the process of building Haskell code and is perhaps somewhat removed (at that moment) from the design decisions discussed in the Haskell report. I have found that the requirements of the reference documentation used while writing code is very different from those of the documents defining the language. The name and signature of groupBy give the impression that what I tried to do above would work, so I'm assuming that my mistake and consequent loss of time might be one that others would also make.

comment:11 Changed 20 months ago by bgamari

Indeed, we should absolutely update the documentation to reflect this. I'll add a variant of the language in comment:8 to the Haddocks.

comment:12 Changed 20 months ago by svenpanne

Hmmm, does that mean that every single function under 'The "By" operations' will get the sentence 'The predicate is assumed to define an equivalence.' '/ The function is assumed to define a total ordering.'? It's a little bit like repeating similar sentences for every function taking an Eq/Ord context...

comment:13 Changed 20 months ago by bgamari

Well, my plan was just to point this out for groupBy (see Phab:D3475), but I suppose you are right that this theme applies to many of the *By functions. Frankly, Haskell's core library documentation is often accused of being too sparse, so I don't consider a bit (perhaps slightly repetitious) detail to be a problem.

However, if we did want to avoid repeating ourselves we could just add a general discussion of the expectations of the By functions to the documentation at the head of the module.

comment:14 in reply to:  12 Changed 20 months ago by dsf

Replying to svenpanne:

Hmmm, does that mean that every single function under 'The "By" operations' will get the sentence 'The predicate is assumed to define an equivalence.' '/ The function is assumed to define a total ordering.'? It's a little bit like repeating similar sentences for every function taking an Eq/Ord context...

Definitely not - this is a recipe for teaching people to ignore stuff. Elsewhere I said it makes sense to put it in the header, but consider that the particularly unexpected results come from groupBy. In the other cases you probablyu wouldn't consider using a non-transitive function argument:

-- What do these mean?
> nubBy notBoth1 [1,1,2,3,1,1,4,5,6,1]
[1,1,1,1,1]
> deleteBy notBoth1 3 [1,1,2,3,1,1,4,5,6,1]
[1,2,3,1,1,4,5,6,1]

comment:15 Changed 20 months ago by svenpanne

I still claim that groupBy is by no means special: We have literally tons of functions which expect that some argument, be it implicit or explicit, obeys some laws, and those laws are not enforced by the type (so you can actually write down the type without having a Ph.D. in theoretical CS ;-). Take e.g. Functor: We expect that the Functor laws are obeyed by all instances. If they're broken, all odds are off and probably most of the function expecting such an argument do strange things. Non-transitive comparison for sortBy? Have fun... etc. etc. Nevertheless, we don't repeat such law-breaking examples all over the library documentation.

Do we really know which of the 10 fooBy functions do something remotely sensible if they don't get an equivalence/total ordering? I doubt that. Without looking at the implementation I would have a hard time figuring that out.

Putting the notBoth1 example below the 'User-supplied equality (replacing an Eq context)' header and just repeating the equivalence requirement in each function documentation is probably the right thing. Similarly for Ord.

comment:16 Changed 20 months ago by dsf

That seems satisfactory.

comment:17 Changed 20 months ago by bgamari

I haven't read these comments through to their conclusion but perhaps someone could offer a patch implementing what was agreed upon?

Note: See TracTickets for help on using tickets.