Opened 12 years ago

Closed 11 years ago

Last modified 16 months ago

#1460 closed proposal (fixed)

Problem with Monoid instance of Data.Map

Reported by: ahey@… Owned by:
Priority: normal Milestone: Not GHC
Component: libraries/base Version: 6.6.1
Keywords: Data.Map Monoid Cc: andrewthad
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

See..

http://www.haskell.org/pipermail/libraries/2007-May/007491.html

Data.Map Monoid instance is currently..

instance (Ord k) => Monoid (Map k v) where
    mempty  = empty
    mappend = union
    mconcat = unions

..but probably should be..

instance (Ord k, Monoid v) => Monoid (Map k v) where
 mempty            = empty
 mappend map0 map1 = unionWith mappend map0 map1
 mconcat maps      = foldr (unionWith mappend) empty maps

Adrian Hey

Change History (13)

comment:1 Changed 12 years ago by SamB

I'm inclined to agree, though I recognize that the other way may well be useful too. There ought to be a way to warn people when you've changed instances like this... release notes, maybe?

comment:2 Changed 12 years ago by igloo

Milestone: Not GHC

comment:3 Changed 11 years ago by igloo

Resolution: fixed
Status: newclosed

This proposal seems to be abandoned

comment:4 Changed 10 years ago by simonmar

Architecture: MultipleUnknown/Multiple

comment:5 Changed 10 years ago by simonmar

Operating System: MultipleUnknown/Multiple

comment:6 Changed 9 years ago by simonmar

difficulty: Easy (1 hr)Easy (less than 1 hour)

comment:7 Changed 5 years ago by nomeata

Type of failure: None/Unknown

This comes up regularly, e.g. April 2012 as well (http://www.haskell.org/pipermail/libraries/2012-April/017743.html) and right now again (here, locally). Just saying.

comment:8 Changed 18 months ago by andrewthad

I'd like to bring this up again. Especially since Semigroup, in the next two years, is going to end up as a superclass of Monoid. The even better instance would be:

instance (Ord k, Semigroup v) => Monoid (Map k v)

In the mailing list that Joachim linked to, this kind of change seemed to attract pretty broad support. I think the only reasonable way to make this change without introducing silent breakage would be to release a containers-0.6 in which Map has no Monoid instance. Then, in containers-0.7, the new instance would be introduced. This is precisely what Henning proposed in https://mail.haskell.org/pipermail/libraries/2012-April/017749.html, and I think it's the best option available for switching out this instance. These two release would need to be at least a year apart. We would probably like for their to be a stackage lts between them that had containers-0.6. And we would probably want to wait for Semigroup to actually become a superclass of Monoid in base before releasing semigroups-0.7, just because it's really annoying when you end up with a (Semigroup v, Monoid v) constraint sometimes.

I would like to draw attention to the two arguments against making this change:

  1. It would break a lot of other packages
  2. The behavior value-combining Monoid instance is easily recovered by using unionWith. This reasoning roughly motivates this post: https://mail.haskell.org/pipermail/libraries/2012-April/017745.html

Concerning argument (1), I offer no rebuttal. I'll only point out that, with stackage, we now have a somewhat convenient way to attempt to measure the breakage. But, no one has ever done this, and it might be worth looking into.

Argument (2) I would like to challenge. In http://www.haskellforall.com/2014/07/equational-reasoning-at-scale.html, Gabriel Gonzalez uses, in a practical way, the behavior of data types that can lift monoidal value into another monoid. Things like:

instance Monoid a => Monoid (IO a)
instance (Monoid a, Monoid b) => Monoid (a,b)
instance Monoid b => Monoid (a -> b)

These are useful because they give us Monoid instances for types like:

Int -> Bool -> Char -> IO ([Text],Ordering)

The value-combining Monoid instance for Map is in this same spirit. It would allow us to trivially combine nested map with types like this:

type M = Map Int (Map Char (Map PersonId (Map ItemId [Text])))
mappend :: M -> M -> M

This behavior can be recovered by unionWith, but it's a lengthy explicit lifting:

deepAppend :: M -> M -> M
deepAppend = unionWith (unionWith (unionWith (unionWith (<>))))

I plan on continuing this tomorrow. I have a little more to say.

Last edited 18 months ago by andrewthad (previous) (diff)

comment:9 Changed 18 months ago by andrewthad

What I'm trying to argue is that the two possible Monoid instances for Map aren't really equal in terms of how much they buy you. Let's look at merging nested Maps again. The question here is "how much code do I have to write to merge these?" Both for a deep merge and for a shallow merge.

type M = Map Int (Map Char (Map PersonId (Map ItemId (Set Foo))))

-- Assuming the current Monoid instance of Map
deep,shallow :: M -> M -> M
deep = unionWith (unionWith (unionWith (unionWith (<>))))
shallow = (<>)

-- Assuming the value-combining Monoid instance of Map
deep,shallow :: M -> M -> M
deep = (<>)
shallow = union

My point is that the value-combining instance helps us more than the left-biased instance. Since the instances chain together, it's just one type we're getting a better Monoid instance for. It's any arbitrary nesting of them.

Let's look at a situation where the left-biased instance fares a little better. What about the situations where you want a left-biased merge at the bottom of a nesting of Monoids? So, not a nesting of Maps, but a nesting of other Monoids with a Map at the bottom:

type K = Int -> Char -> IO ([ParseError],Map PersonId Person)

The left-biased instance has the desirable behavior here. If, after have concatenated a bunch of expressions of type K, we try to parse two people with the same id, we silently discard the second one. But, we can easily recover this same behavior with the value-combining Monoid instance with:

foo :: Int -> Char -> IO ([ParseError],Map PersonId (First Person))

If you don't actually want that First newtype in your final result, it can be coerced away safely as a noop. So, since the value-combining instance can express the left-biased instance, even in the worst cases, the semantics of the left-biased instance can be recovered by clever use of coerce. However, the reverse is not true. The left-biased instance cannot express the value-combining union, so currently when we want this, we have to step outside the realm of Monoid instances entirely and manually do the lifting. Just to emphasize the asymmetry in the amount and kind of code the end user must write:

type J = Bar -> Baz -> IO ([Log], Map Pokemon (Set Attack))
type V = Bar -> Baz -> IO ([Log], Map Identifier Name)

leftBiased :: V -> V -> V
valueMerging :: J -> J -> J

-- with the left-biased Monoid instance
leftBiased = (<>)
valueMerging v1 v2 a b = liftA2 (\(x1,y1) (x2,y2) -> (x1 <> x2, unionWith (<>) y1 y2)) (v1 a b) (v2 a b)

-- with the value-merging Monoid instance
type V' = Bar -> Baz -> IO ([Log], Map Identifier (First Name))
leftBiased a b = coerce ((coerce a) <> (coerce b :: V'))
valueMerging = (<>)

I know using coerce isn't the prettiest thing, but notice that with the value-merging Monoid instance, we are able to leverage the automatic lifting of Monoid instances to give us a desired behavior. With the left-biased instance, we're stuck doing this by hand.

comment:10 Changed 18 months ago by bgamari

I agree that a monoidal-combining Map is much nicer than our current Map (and usually what I find I need); I have been a proponent of this change in the past and even maintain the monoidal-containers library wrapping containers with these semantics.

However, my impression is that argument (1) mentioned in comment:8 alone is enough to sink this proposal in many people's minds. In fact, with a few years of experience behind me since last looking at this, I may even agree. I don't think it is realistic to expect that we will be able to easily discern broken packages in a large scale study; breakage due to semantic changes like this can be very hard to spot. In the best case you have a use-site where the value type has no Semigroup instance, in which case the change will elicit a compile-time error. However, in the case where you do have an instance on-hand you will see silent breakage, which may or may not be evident in the runtime behavior of the program or test failures. Moreover, with the ubiquity of packages with lax handling of upper bounds will likely make managing such a change challenging, even if breakage can be readily identified; even a super-major version bump will have no effect on a package that has no bounds at all.

Unfortunately, I suspect the only way we can make this change happen is via the introduction of new interfaces and not changing those that we already have. This may be via continuing to use monoidal-containers and others (e.g. reducers), although I admit that this isn't a terribly great option. Perhaps we could discuss merging a wrapper like monoidal-containers into containers (and, perhaps, unordered-containers). The good news is that with Backpack we are now slightly better equipped to provide such parallel interfaces; unfortunately it will likely be several years before we can rely on Backpack for a critical package like containers.

comment:11 Changed 18 months ago by andrewthad

I absolutely agree that making the change in a way that silently introduces errors would be bad. That's why my proposal was to remove the instance first and then replace it later. With a major-version bump, the Monoid instance would be removed entirely. Then, everything depending on in would fail loudly. There would be no Monoid instance at all for probably 18-24 months. Let's say that it was containers-0.6.0 that removed the instance. Eventually, stackage, nixpkgs, and the Haskell Platform would all pick up containers-0.6.0 and put some pressure of their users to stop using the Monoid instance for Map. Then, after everyone's code was updated, the next release could introduce the new instance.

Most of the breakage would be really easy to fix. Additionally, it should all be compile-time failures. The only silent breakage that could happen would be if someone didn't update their dependencies for two years and then tried to switch straight from the old containers to the newest one. I suspect this situation would be uncommon (although, as someone would is primarily an application developer, I certainly cannot speak for everyone on this, and I may be wrong about this, which would destroy my argument for this approach).

So basically, I just wanted to clarify that while there would be breakage, none of it would be silent. Obviously, this still has to be weighed, but I think it's much less bad than the undetectable semantic changes you describe.

comment:12 Changed 18 months ago by andrewthad

Cc: andrewthad added

comment:13 Changed 16 months ago by txnull

I find both instances useful.

Maybe there should not be a instance at all and everybody has to create one as needed or create both using newtypes?

Note: See TracTickets for help on using tickets.