Haskell: surprising behavior of “groupBy”

Posted by 雨燕双飞 on 2019-11-30 06:42:54

Have a look at the GHC implementation of groupBy:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

Now compare these two outputs:

Prelude List> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Prelude List> groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
[[8],[2,3],[2,4],[1,5,9]]

In short, what happens is this: groupBy assumes that the given function (the first argument) tests for equality, and thus assumes that it is reflexive, transitive and symmetric (see equivalence relation). Because of that assumption, it compares every element with the first element of the current group rather than with its immediate predecessor. The problem here is that the less-than relation is neither reflexive nor symmetric.
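To see where the two outputs come from, it can help to evaluate by hand the first span call that groupBy makes in each case. The snippet below is only an illustrative sketch; the names firstSplit and secondSplit are made up for this example.

```haskell
-- First step of groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]:
-- span keeps taking elements while (1 <) holds, so every element is
-- compared with the head 1, not with the element just before it.
firstSplit :: ([Int], [Int])
firstSplit = span (1 <) [2, 3, 2, 4, 1, 5, 9]
-- ([2,3,2,4],[1,5,9])

-- First step of groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]:
-- (8 <) already fails on the first element, so 8 forms a group of its own.
secondSplit :: ([Int], [Int])
secondSplit = span (8 <) [2, 3, 2, 4, 1, 5, 9]
-- ([],[2,3,2,4,1,5,9])
```

The first component becomes the tail of the current group, and groupBy then recurses on the second component.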


Edit: The following implementation only assumes transitivity:

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _   []                        = []
groupBy' _   [x]                       = [[x]]
groupBy' cmp (x:xs@(x':_)) | cmp x x'  = (x:y):ys
                           | otherwise = [x]:r
  where r@(y:ys) = groupBy' cmp xs

The cause is the fact that "<" isn't an equality test.

You might expect different behavior because you'd implement it differently, but that isn't what groupBy promises.

To see why its output is a reasonable answer, imagine an implementation that first sweeps through the list, grouping adjacent elements:

[1, 2, 3, 2, 4, 1, 5, 9] ->
[[1,2,3], [2,4], [1,5,9]]

It now has 3 groups of supposedly equal elements. So it checks whether any of the groups are in fact the same:

Since it knows all elements in each group are equal, it can just look at the first element in each: 1, 2 and 1.

1 < 2? Yes! So it merges the first two groups.

1 < 1? No! So it leaves the last group alone.

And now it has compared all elements for equality.

...only, you didn't pass it the kind of function it expected.

In short, when it wants an equality test, give it an equality test.

The problem is that the reference implementation of groupBy in the Haskell Report compares elements against the first element, so the groups are not strictly increasing (they just have to be all bigger than the first element). What you want instead is a version of groupBy that tests on adjacent elements, like the implementation here.
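As a rough sketch of what such an adjacent-comparing version could look like, here is one possible formulation using foldr. The name groupAdjacentBy is hypothetical and not part of Data.List; the function only illustrates the idea of testing each element against its right-hand neighbour.

```haskell
-- Hypothetical helper: start a new group whenever the relation fails
-- between two neighbouring elements.
groupAdjacentBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupAdjacentBy rel = foldr step []
  where
    -- x joins the group to its right if it is related to that group's head.
    step x ((y:ys):rest) | rel x y = (x : y : ys) : rest
    -- Otherwise (or when no group exists yet) x starts a new group.
    step x groups                  = [x] : groups

increasingRuns1, increasingRuns2 :: [[Int]]
increasingRuns1 = groupAdjacentBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
-- [[1,2,3],[2,4],[1,5,9]]
increasingRuns2 = groupAdjacentBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
-- [[8],[2,3],[2,4],[1,5,9]]
```

With (<) as the relation, every group is now a strictly increasing run. Note that this foldr formulation inspects the already-built result, so it is not suitable for infinite lists.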

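I'd just like to point out that groupBy only groups adjacent elements, so if you want all equal elements to end up in the same group, the list has to be sorted (or at least have equal elements next to each other) before groupBy is applied.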

For example:

equalityOp :: Eq a => (a, b1) -> (a, b2) -> Bool
equalityOp x y = fst x == fst y

testData = [(1, 2), (1, 4), (2, 3)]

correctAnswer = groupBy equalityOp testData == [[(1, 2), (1, 4)], [(2, 3)]]

otherTestData = [(1, 2), (2, 3), (1, 4)]

incorrectAnswer = groupBy equalityOp otherTestData == [[(1, 2)], [(2, 3)], [(1, 4)]]

This behaviour comes about because groupBy is using span in its definition. To get reasonable behaviour which doesn't rely on us having the underlying list in any particular order we can define a function:

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' eq []     = []
groupBy' eq (x:xs) = (x:similarResults) : (groupBy' eq differentResults)
    where similarResults   = filter (eq x) xs
          differentResults = filter (not . eq x) xs
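If the grouping key can be ordered, another option is to sort first and then use the standard groupBy. The sketch below assumes an Ord instance for the key; groupByKey is a made-up name for this example, while groupBy, sortOn and on come from Data.List and Data.Function.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical helper: sort by the key so that equal keys become adjacent,
-- then let groupBy collect each run of equal keys.
groupByKey :: Ord k => (a -> k) -> [a] -> [[a]]
groupByKey key = groupBy ((==) `on` key) . sortOn key

groupedPairs :: [[(Int, Int)]]
groupedPairs = groupByKey fst [(1, 2), (2, 3), (1, 4)]
-- [[(1,2),(1,4)],[(2,3)]]
```

Because sortOn is stable, elements with the same key keep their original relative order. Unlike the filter-based version, the groups come out ordered by key rather than by first occurrence.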