Haskell: surprising behavior of “groupBy”

無奈伤痛 2020-12-30 04:07

I'm trying to figure out the behavior of the library function groupBy (from Data.List), which purports to group elements of a list by an "equality test" function passed i

4 answers
  • 2020-12-30 04:35

    Have a look at the GHC implementation of groupBy:

    groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
    groupBy _  []           =  []
    groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                               where (ys,zs) = span (eq x) xs
    

    Now compare these two outputs:

    Prelude List> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
    [[1,2,3,2,4],[1,5,9]]
    Prelude List> groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
    [[8],[2,3],[2,4],[1,5,9]]
    

    In short: groupBy assumes that the function passed as its first argument tests for equality, and therefore that it is reflexive, symmetric and transitive (see equivalence relation). The less-than relation is neither reflexive nor symmetric, so that assumption does not hold here.
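
To make this concrete, here is a small sketch that checks the three properties over a finite sample. The helper names (`reflexiveOn`, `symmetricOn`, `transitiveOn`) and `sample` are made up for this illustration and are not part of Data.List.

```haskell
-- Hypothetical helpers (not from any library): check the properties
-- of a binary relation over a finite sample of values.
reflexiveOn :: (a -> a -> Bool) -> [a] -> Bool
reflexiveOn r xs = and [r x x | x <- xs]

-- Whenever r x y holds, r y x must hold too.
symmetricOn :: (a -> a -> Bool) -> [a] -> Bool
symmetricOn r xs = and [r y x | x <- xs, y <- xs, r x y]

-- Whenever r x y and r y z hold, r x z must hold too.
transitiveOn :: (a -> a -> Bool) -> [a] -> Bool
transitiveOn r xs = and [r x z | x <- xs, y <- xs, z <- xs, r x y, r y z]

sample :: [Int]
sample = [1, 2, 3]
```

On this sample, `reflexiveOn (<) sample` and `symmetricOn (<) sample` are both False while `transitiveOn (<) sample` is True; with `(==)` all three checks are True.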


    Edit: The following implementation only assumes transitivity:

    groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
    groupBy' _   []                        = []
    groupBy' _   [x]                       = [[x]]
    groupBy' cmp (x:xs@(x':_)) | cmp x x'  = (x:y):ys
                               | otherwise = [x]:r
      where r@(y:ys) = groupBy' cmp xs
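
As a quick check, the sketch below repeats the definition above so that it can be loaded on its own, and applies it to the two lists from the GHCi session. The names `ex1` and `ex2` are only for illustration.

```haskell
-- Same definition as above: each element is compared with its
-- immediate neighbour rather than with the head of the group.
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _   []                        = []
groupBy' _   [x]                       = [[x]]
groupBy' cmp (x:xs@(x':_)) | cmp x x'  = (x:y):ys
                           | otherwise = [x]:r
  where r@(y:ys) = groupBy' cmp xs

-- The two inputs from the session above.
ex1, ex2 :: [[Int]]
ex1 = groupBy' (<) [1, 2, 3, 2, 4, 1, 5, 9]
ex2 = groupBy' (<) [8, 2, 3, 2, 4, 1, 5, 9]
```

Here `ex1` evaluates to `[[1,2,3],[2,4],[1,5,9]]` and `ex2` to `[[8],[2,3],[2,4],[1,5,9]]`, so every group is a strictly increasing run.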
    
  • 2020-12-30 04:39

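    I'd just like to point out that groupBy only groups adjacent elements, so elements that should end up together must already be next to each other in the list (for example, because the list has been sorted) before groupBy is applied.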

    For example:

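    import Data.List (groupBy)
    
    -- The Eq constraint is needed because the first components are compared with (==).
    equalityOp :: Eq a => (a, b1) -> (a, b2) -> Bool
    equalityOp x y = fst x == fst y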
    
    testData = [(1, 2), (1, 4), (2, 3)]
    
    correctAnswer = groupBy equalityOp testData == [[(1, 2), (1, 4)], [(2, 3)]]
    
    otherTestData = [(1, 2), (2, 3), (1, 4)]
    
    incorrectAnswer = groupBy equalityOp otherTestData == [[(1, 2)], [(2, 3)], [(1, 4)]]
    

    This behaviour comes about because groupBy is using span in its definition. To get reasonable behaviour which doesn't rely on us having the underlying list in any particular order we can define a function:

    groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
    groupBy' eq []     = []
    groupBy' eq (x:xs) = (x:similarResults) : (groupBy' eq differentResults)
        where similarResults   = filter (eq x) xs
              differentResults = filter (not . eq x) xs
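
If the order of the groups does not matter and the key has an `Ord` instance, a common alternative is to sort first, so that equal keys become adjacent, and then use the standard groupBy. A minimal sketch, where `groupByKey` is a name made up here:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical helper: group pairs by their first component,
-- regardless of where they appear in the input list.
-- sortOn is stable, so pairs with the same key keep their relative order.
groupByKey :: Ord k => [(k, v)] -> [[(k, v)]]
groupByKey = groupBy ((==) `on` fst) . sortOn fst
```

For the list `[(1, 2), (2, 3), (1, 4)]` this gives `[[(1,2),(1,4)],[(2,3)]]`. Unlike the filter-based version, the groups come out ordered by key rather than by first occurrence, and the approach needs `Ord` rather than an arbitrary equality function.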
    
  • 2020-12-30 04:51

    The problem is that the reference implementation of groupBy in the Haskell Report compares each element against the first element of its group, so the groups are not strictly increasing (every element just has to be bigger than the first one). What you want instead is a version of groupBy that tests adjacent elements, like the implementation here.
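
The first step of that definition can be reproduced directly with `span` from the Prelude. The sketch below uses the list `[1, 2, 3, 2, 4, 1, 5, 9]` as an example input; the name `firstStep` is only illustrative.

```haskell
-- First step of groupBy (<) on [1, 2, 3, 2, 4, 1, 5, 9]:
-- 1 is the head of the group, and span keeps taking elements
-- for as long as (1 <) holds, whatever their neighbours are.
firstStep :: ([Int], [Int])
firstStep = span (1 <) [2, 3, 2, 4, 1, 5, 9]
```

`firstStep` evaluates to `([2,3,2,4],[1,5,9])`: the second 2 stays in the first group because it is compared with 1, not with the 3 before it.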

  • 2020-12-30 04:54

    The problem is that "<" isn't an equality test.

    You might expect different behavior because you'd implement it differently, but that isn't what groupBy promises.

    To see why its output is a reasonable answer, imagine an implementation that first sweeps through the list, grouping adjacent elements:

    [1, 2, 3, 2, 4, 1, 5, 9] ->
    [[1,2,3], [2,4], [1,5,9]]
    

    It now has 3 groups of "equal" elements, so it checks whether any of the groups are in fact the same.

    Since it knows that all elements in each group are equal, it only needs to look at the first element of each: 1, 2 and 1.

    1 < 2? Yes! So it merges the first two groups.

    1 < 1? No! So it leaves the last group alone.

    And now it has compared all elements for equality.

    ...only, you didn't pass it the kind of function it expected.

    In short, when it wants an equality test, give it an equality test.
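
For contrast, here is a short sketch of groupBy used with genuine equality tests. The names `runs` and `byParity` are invented for this example.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Plain equality: consecutive equal elements form a group.
runs :: [[Int]]
runs = groupBy (==) [1, 1, 2, 3, 3, 1]

-- Equality on a derived key (parity) is still an equivalence relation.
byParity :: [[Int]]
byParity = groupBy ((==) `on` even) [2, 4, 1, 3, 6]
```

`runs` evaluates to `[[1,1],[2],[3,3],[1]]` and `byParity` to `[[2,4],[1,3],[6]]`. Only adjacent elements are grouped, which is why the final 1 in the first list forms its own group.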
