Can I check whether a bounded list contains duplicates, in linear time?

Question

Suppose I have an Int list whose elements are known to be bounded, and the list is known to be no longer than the range of those bounds, so it is entirely possible for it to contain no duplicates. What is the fastest way to test whether that is the case?

I know of nubOrd. It is quite fast. We can pass our list through and see if it becomes shorter. But the efficiency of nubOrd is still not linear.
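For reference, the nubOrd check described above could be sketched like this. The name hasDupsNubOrd is made up for illustration; nubOrd itself is exported by Data.Containers.ListUtils in the containers package, and it is Set-based, so the whole check is O(n log n) rather than linear.

```haskell
import Data.Containers.ListUtils (nubOrd)

-- | True when the list contains at least one repeated element.
-- nubOrd keeps only the first occurrence of each element, so its result
-- is shorter than the input exactly when something was dropped.
-- (hasDupsNubOrd is a hypothetical name, not part of any library.)
hasDupsNubOrd :: [Int] -> Bool
hasDupsNubOrd xs = length (nubOrd xs) /= length xs
```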

My idea is that we can trade space for time efficiency. Imperatively, we would allocate a bit field as wide as our range, and then traverse the list, marking the entries corresponding to the list elements' values. As soon as we try to flip a bit that is already 1, we return False. It only takes (read + compare + write) * length of the list. No binary search trees, no nothing.

Is it reasonable to attempt a similar construction in Haskell?

Answer 1:

The discrimination package has a linear-time nub you can use. It also has a linear-time group that doesn't require equivalent elements to be adjacent in order to group them, so you could check whether any of the groups has a size other than 1.

The whole package is based on sidestepping the well-known bounds on comparison-based sorts (and joins, etc.) by using algorithms based on "discrimination" rather than on comparisons. As I understand it, the technique is somewhat like a radix sort, but generalised to ADTs.
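A minimal sketch of both suggestions, assuming the discrimination package is installed and that its Data.Discrimination module exports nub and group for types with a Grouping instance (Int has one). The names hasDupsNub and hasDupsGroup are invented here for illustration.

```haskell
import qualified Data.Discrimination as D

-- | Compare lengths before and after the discrimination-based nub.
-- (hasDupsNub is a hypothetical name.)
hasDupsNub :: [Int] -> Bool
hasDupsNub xs = length (D.nub xs) /= length xs

-- | Group equal elements (they need not be adjacent) and look for any
-- group holding more than one element. (hasDupsGroup is a hypothetical name.)
hasDupsGroup :: [Int] -> Bool
hasDupsGroup = any ((> 1) . length) . D.group
```

The group version can stop as soon as it sees one oversized group, whereas the nub version always measures both lists in full.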

Answer 2:

For integers (and other Ix-like types), you could use a mutable array, for example with the array package.

Here we can use an STUArray, like:

import Control.Monad.ST
import Data.Array.ST

-- Walk the list, marking each value as seen; stop at the first repeat.
updateDups_ :: [Int] -> STUArray s Int Bool -> ST s Bool
updateDups_ [] _ = return False
updateDups_ (x:xs) arr = do
    contains <- readArray arr x
    if contains then return True
    else writeArray arr x True >> updateDups_ xs arr

-- Allocate one flag per value in 0..mx, all initially False.
withDups_ :: Int -> [Int] -> ST s Bool
withDups_ mx l = newArray (0, mx) False >>= updateDups_ l

withDups :: Int -> [Int] -> Bool
withDups mx ls = runST (withDups_ mx ls)

For example:

Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,5]
False
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,1]
True
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,16,2]
True

So here the first parameter is the maximum value that may occur in the list (values are assumed to lie in 0..mx, since the array is indexed from 0), and the second parameter is the list of values we want to check.
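If the values are bounded but do not start at 0, the same idea works with an array indexed by the actual bounds. The following is only a sketch under that assumption; hasDupsIn is a name invented here, and it expects every element to lie inside the inclusive bounds it is given (an element outside them makes readArray fail with an index error).

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray)

-- | Duplicate check for values assumed to lie in the inclusive range
-- given as the first argument. (hasDupsIn is a hypothetical name.)
hasDupsIn :: (Int, Int) -> [Int] -> Bool
hasDupsIn bnds xs = runST (newArray bnds False >>= \seen -> go seen xs)
  where
    -- Mark each value as seen; report True at the first repeat.
    go :: STUArray s Int Bool -> [Int] -> ST s Bool
    go _ [] = return False
    go seen (y:ys) = do
      dup <- readArray seen y
      if dup
        then return True
        else writeArray seen y True >> go seen ys
```

For example, hasDupsIn (-3, 3) [-3, 0, 3] evaluates to False and hasDupsIn (-3, 3) [-3, 0, -3] evaluates to True.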

Answer 3:

So you have a list of size N, and you know that the elements in the list are within the range min .. min+N-1.

There is a simple linear-time algorithm that requires only O(1) extra space, provided the items are stored in an array that you are allowed to modify in place.

First, scan the list to find the minimum and maximum elements.

If (max - min + 1) < N then you know there's a duplicate. Otherwise ...

Because the range is N, the minimum item can go at a[0], and the maximum item at a[N-1]. You can map any item to its position in the array simply by subtracting min. You can do an in-place sort in O(N) because you know exactly where every item should go.

Starting at the beginning of the list, take the first element and subtract min to determine where it should go. Go to that position, and replace the item that's there. With the new item, compute where it should go, and replace the item in that position, etc.

If you ever get to a point where you're trying to place an item at a[x], and the value already there is the value that's supposed to be there (i.e. a[x] == x+min), then you've found a duplicate.

The code to do all this is pretty simple:

Corrected code.

min, max = findMinMax(a)
if (max - min + 1) < N
    return true    // fewer possible values than items, so there is a duplicate
end if
currentIndex = 0
while currentIndex < N
    temp = a[currentIndex]
    targetIndex = temp - min
    // Do this until we wrap around to the current index.
    // If the item is already in place, then targetIndex == currentIndex,
    // and we won't enter the loop.
    while targetIndex != currentIndex
        if (a[targetIndex] == temp)
            // The item at a[targetIndex] is the item that's supposed to be there.
            // The only way that can happen is if the item we have in temp is a duplicate.
            return true    // found a duplicate
        end if
        save = a[targetIndex]
        a[targetIndex] = temp
        temp = save
        targetIndex = temp - min
    end while
    // At this point, targetIndex == currentIndex.
    // We've wrapped around and need to place the last item.
    // There's no need to check here if a[targetIndex] == temp, because if it did,
    // we would not have entered the loop.
    a[targetIndex] = temp
    ++currentIndex
end while
return false    // every item reached its own slot, so there are no duplicates

That's the basic idea.
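Since the question is about Haskell, here is one way the pseudocode above might be transcribed using an STUArray. It is a sketch rather than a tuned implementation: hasDupsInPlace is a name invented here, the input list is first copied into a mutable array (so this version uses O(N) space, not O(1)), and it returns Nothing when the values span more than N slots, because the algorithm's precondition does not hold then. It also assumes max - min does not overflow Int.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray)

-- | Cycle-placement duplicate check. (hasDupsInPlace is a hypothetical name.)
-- Nothing means the values span a range wider than the list length.
hasDupsInPlace :: [Int] -> Maybe Bool
hasDupsInPlace [] = Just False
hasDupsInPlace xs
  | hi - lo + 1 > n = Nothing      -- precondition violated
  | hi - lo + 1 < n = Just True    -- fewer possible values than items
  | otherwise       = Just (runST (newListArray (0, n - 1) xs >>= outer 0))
  where
    n  = length xs
    lo = minimum xs
    hi = maximum xs

    -- Visit every slot from left to right, settling each one in turn.
    outer :: Int -> STUArray s Int Int -> ST s Bool
    outer i arr
      | i >= n    = return False
      | otherwise = do
          temp <- readArray arr i
          dup  <- place i temp arr
          if dup then return True else outer (i + 1) arr

    -- Follow the chain of displaced items until it wraps around to slot i.
    place :: Int -> Int -> STUArray s Int Int -> ST s Bool
    place i temp arr
      | target == i = writeArray arr i temp >> return False
      | otherwise   = do
          old <- readArray arr target
          if old == temp
            then return True       -- the slot already holds this value
            else writeArray arr target temp >> place i old arr
      where
        target = temp - lo
```

Every write inside place puts an item into its final slot, so the total work stays linear in the length of the list.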

Source: https://stackoverflow.com/questions/56996405/can-i-check-whether-a-bounded-list-contains-duplicates-in-linear-time

Tags

algorithm

haskell

complexity-theory