vba: get unique values from array

前端 未结 9 2095
北恋
北恋 2020-11-22 15:30

Is there any built-in functionality in vba to get unique values from a one-dimensional array? What about just getting rid of duplicates?

If not, then how would I get

相关标签:
9条回答
  • 2020-11-22 15:45

    There is no VBA built in functionality for removing duplicates from an array, however you could use the next function:

    Function RemoveDuplicates(MyArray As Variant) As Variant
        With CreateObject("scripting.dictionary")
            For Each item In MyArray
                c00 = .Item(item)
            Next
            sn = .keys ' the array .keys contains all unique keys
            MsgBox Join(.keys, vbLf) ' you can join the array into a string
            RemoveDuplicates = .keys ' return an array without duplicates
        End With
    End Function
    
    0 讨论(0)
  • 2020-11-22 15:54

    I don't know of any built-in functionality in VBA. The best would be to use a collection using the value as key and only add to it if a value doesn't exist.

    0 讨论(0)
  • 2020-11-22 15:56

    No, VBA does not have this functionality. You can use the technique of adding each item to a collection using the item as the key. Since a collection does not allow duplicate keys, the result is distinct values that you can copy to an array, if needed.

    You may also want something more robust. See Distinct Values Function at http://www.cpearson.com/excel/distinctvalues.aspx

    Distinct Values Function

    A VBA Function that will return an array of the distinct values in a range or array of input values.

    Excel has some manual methods, such as Advanced Filter, for getting a list of distinct items from an input range. The drawback of using such methods is that you must manually refresh the results when the input data changes. Moreover, these methods work only with ranges, not arrays of values, and, not being functions, cannot be called from worksheet cells or incorporated into array formulas. This page describes a VBA function called DistinctValues that accepts as input either a range or an array of data and returns as its result an array containing the distinct items from the input list. That is, the elements with all duplicates removed. The order of the input elements is preserved. The order of the elements in the output array is the same as the order in the input values. The function can be called from an array entered range on a worksheet (see this page for information about array formulas), or from in an array formula in a single worksheet cell, or from another VB function.

    0 讨论(0)
  • 2020-11-22 15:58

    The Collection and Dictionary solutions are all nice and shine for a short approach, but if you want speed try using a more direct approach:

    Function ArrayUnique(ByVal aArrayIn As Variant) As Variant
    ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    ' ArrayUnique
    ' This function removes duplicated values from a single dimension array
    ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    Dim aArrayOut() As Variant
    Dim bFlag As Boolean
    Dim vIn As Variant
    Dim vOut As Variant
    Dim i%, j%, k%
    
    ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
    i = LBound(aArrayIn)
    j = i
    
    For Each vIn In aArrayIn
        For k = j To i - 1
            If vIn = aArrayOut(k) Then bFlag = True: Exit For
        Next
        If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
        bFlag = False
    Next
    
    If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
    ArrayUnique = aArrayOut
    End Function
    

    Calling it:

    Sub Test()
    Dim aReturn As Variant
    Dim aArray As Variant
    
    aArray = Array(1, 2, 3, 1, 2, 3, "Test", "Test")
    aReturn = ArrayUnique(aArray)
    End Sub
    

    For speed comparasion, this will be 100x to 130x faster then the dictionary solution, and about 8000x to 13000x faster than the collection one.

    0 讨论(0)
  • 2020-11-22 15:59

    Update (6/15/16)

    I have created much more thorough benchmarks. First of all, as @ChaimG pointed out, early binding makes a big difference (I originally used @eksortso's code above verbatim which uses late binding). Secondly, my original benchmarks only included the time to create the unique object, however, it did not test the efficiency of using the object. My point in doing this is, it doesn't really matter if I can create an object really fast if the object I create is clunky and slows me down moving forward.

    Old Remark: It turns out, that looping over a collection object is highly inefficient

    It turns out that looping over a collection can be quite efficient if you know how to do it (I didn't). As @ChaimG (yet again), pointed out in the comments, using a For Each construct is ridiculously superior to simply using a For loop. To give you an idea, before changing the loop construct, the time for Collection2 for the Test Case Size = 10^6 was over 1400s (i.e. ~23 minutes). It is now a meager 0.195s (over 7000x faster).

    For the Collection method there are two times. The first (my original benchmark Collection1) show the time to create the unique object. The second part (Collection2) shows the time to loop over the object (which is very natural) to create a returnable array as the other functions do.

    In the chart below, a yellow background indicates that it was the fastest for that test case, and red indicates the slowest ("Not Tested" algorithms are excluded). The total time for the Collection method is the sum of Collection1 and Collection2. Turquoise indicates that is was the fastest regardless of original order.

    Below is the original algorithm I created (I have modified it slightly e.g. I no longer instantiate my own data type). It returns the unique values of an array with the original order in a very respectable time and it can be modified to take on any data type. Outside of the IndexMethod, it is the fastest algorithm for very large arrays.

    Here are the main ideas behind this algorithm:

    1. Index the array
    2. Sort by values
    3. Place identical values at the end of the array and subsequently "chop" them off.
    4. Finally, sort by index.

    Below is an example:

    Let myArray = (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
    
        1.  (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
            (1 ,   2,  3,  4,  5,   6,  7,   8,   9, 10)   <<-- Indexing
    
        2.  (19, 19, 19, 33, 33, 86, 100, 100, 703, 703)   <<-- sort by values     
            (4,   7, 10,  3,  5,  1,   2,   8,   6,   9)
    
        3.  (19, 33,  86, 100, 703)   <<-- remove duplicates    
            (4,   3,   1,   2,   6)
    
        4.  (86, 100,  33, 19, 703)   
            ( 1,   2,   3,  4,   6)   <<-- sort by index
    

    Here is the code:

    Function SortingUniqueTest(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
        Dim MyUniqueArr() As Long, i As Long, intInd As Integer
        Dim StrtTime As Double, Endtime As Double, HighB As Long, LowB As Long
    
        LowB = LBound(myArray): HighB = UBound(myArray)
    
        ReDim MyUniqueArr(1 To 2, LowB To HighB)
        intInd = 1 - LowB  'Guarantees the indices span 1 to Lim
    
        For i = LowB To HighB
            MyUniqueArr(1, i) = myArray(i)
            MyUniqueArr(2, i) = i + intInd
        Next i
    
        QSLong2D MyUniqueArr, 1, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
        Call UniqueArray2D(MyUniqueArr)
        If bOrigIndex Then QSLong2D MyUniqueArr, 2, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
    
        SortingUniqueTest = MyUniqueArr()
    End Function
    
    Public Sub UniqueArray2D(ByRef myArray() As Long)
        Dim i As Long, j As Long, Count As Long, Count1 As Long, DuplicateArr() As Long
        Dim lngTemp As Long, HighB As Long, LowB As Long
        LowB = LBound(myArray, 2): Count = LowB: i = LowB: HighB = UBound(myArray, 2)
    
        Do While i < HighB
            j = i + 1
            If myArray(1, i) = myArray(1, j) Then
                Do While myArray(1, i) = myArray(1, j)
                    ReDim Preserve DuplicateArr(1 To Count)
                    DuplicateArr(Count) = j
                    Count = Count + 1
                    j = j + 1
                    If j > HighB Then Exit Do
                Loop
    
                QSLong2D myArray, 2, i, j - 1, 2
            End If
            i = j
        Loop
    
        Count1 = HighB
    
        If Count > 1 Then
            For i = UBound(DuplicateArr) To LBound(DuplicateArr) Step -1
                myArray(1, DuplicateArr(i)) = myArray(1, Count1)
                myArray(2, DuplicateArr(i)) = myArray(2, Count1)
                Count1 = Count1 - 1
                ReDim Preserve myArray(1 To 2, LowB To Count1)
            Next i
        End If
    End Sub
    

    Here is the sorting algorithm I use (more about this algo here).

    Sub QSLong2D(ByRef saArray() As Long, bytDim As Byte, lLow1 As Long, lHigh1 As Long, bytNum As Byte)
        Dim lLow2 As Long, lHigh2 As Long
        Dim sKey As Long, sSwap As Long, i As Byte
    
    On Error GoTo ErrorExit
    
        If IsMissing(lLow1) Then lLow1 = LBound(saArray, bytDim)
        If IsMissing(lHigh1) Then lHigh1 = UBound(saArray, bytDim)
        lLow2 = lLow1
        lHigh2 = lHigh1
    
        sKey = saArray(bytDim, (lLow1 + lHigh1) \ 2)
    
        Do While lLow2 < lHigh2
            Do While saArray(bytDim, lLow2) < sKey And lLow2 < lHigh1: lLow2 = lLow2 + 1: Loop
            Do While saArray(bytDim, lHigh2) > sKey And lHigh2 > lLow1: lHigh2 = lHigh2 - 1: Loop
    
            If lLow2 < lHigh2 Then
                For i = 1 To bytNum
                    sSwap = saArray(i, lLow2)
                    saArray(i, lLow2) = saArray(i, lHigh2)
                    saArray(i, lHigh2) = sSwap
                Next i
            End If
    
            If lLow2 <= lHigh2 Then
                lLow2 = lLow2 + 1
                lHigh2 = lHigh2 - 1
            End If
        Loop
    
        If lHigh2 > lLow1 Then QSLong2D saArray(), bytDim, lLow1, lHigh2, bytNum
        If lLow2 < lHigh1 Then QSLong2D saArray(), bytDim, lLow2, lHigh1, bytNum
    
    ErrorExit:
    
    End Sub
    

    Below is a special algorithm that is blazing fast if your data contains integers. It makes use of indexing and the Boolean data type.

    Function IndexSort(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
    '' Modified to take both positive and negative integers
        Dim arrVals() As Long, arrSort() As Long, arrBool() As Boolean
        Dim i As Long, HighB As Long, myMax As Long, myMin As Long, OffSet As Long
        Dim LowB As Long, myIndex As Long, count As Long, myRange As Long
    
        HighB = UBound(myArray)
        LowB = LBound(myArray)
    
        For i = LowB To HighB
            If myArray(i) > myMax Then myMax = myArray(i)
            If myArray(i) < myMin Then myMin = myArray(i)
        Next i
    
        OffSet = Abs(myMin)  '' Number that will be added to every element
                             '' to guarantee every index is non-negative
    
        If myMax > 0 Then
            myRange = myMax + OffSet  '' E.g. if myMax = 10 & myMin = -2, then myRange = 12
        Else
            myRange = OffSet
        End If
    
        If bOrigIndex Then
            ReDim arrSort(1 To 2, 1 To HighB)
            ReDim arrVals(1 To 2, 0 To myRange)
            ReDim arrBool(0 To myRange)
    
            For i = LowB To HighB
                myIndex = myArray(i) + OffSet
                arrBool(myIndex) = True
                arrVals(1, myIndex) = myArray(i)
                If arrVals(2, myIndex) = 0 Then arrVals(2, myIndex) = i
            Next i
    
            For i = 0 To myRange
                If arrBool(i) Then
                    count = count + 1
                    arrSort(1, count) = arrVals(1, i)
                    arrSort(2, count) = arrVals(2, i)
                End If
            Next i
    
            QSLong2D arrSort, 2, 1, count, 2
            ReDim Preserve arrSort(1 To 2, 1 To count)
        Else
            ReDim arrSort(1 To HighB)
            ReDim arrVals(0 To myRange)
            ReDim arrBool(0 To myRange)
    
            For i = LowB To HighB
                myIndex = myArray(i) + OffSet
                arrBool(myIndex) = True
                arrVals(myIndex) = myArray(i)
            Next i
    
            For i = 0 To myRange
                If arrBool(i) Then
                    count = count + 1
                    arrSort(count) = arrVals(i)
                End If
            Next i
    
            ReDim Preserve arrSort(1 To count)
        End If
    
        ReDim arrVals(0)
        ReDim arrBool(0)
    
        IndexSort = arrSort
    End Function
    

    Here are the Collection (by @DocBrown) and Dictionary (by @eksortso) Functions.

    Function CollectionTest(ByRef arrIn() As Long, Lim As Long) As Variant
        Dim arr As New Collection, a, i As Long, arrOut() As Variant, aFirstArray As Variant
        Dim StrtTime As Double, EndTime1 As Double, EndTime2 As Double, count As Long
    On Error Resume Next
    
        ReDim arrOut(1 To UBound(arrIn))
        ReDim aFirstArray(1 To UBound(arrIn))
    
        StrtTime = Timer
        For i = 1 To UBound(arrIn): aFirstArray(i) = CStr(arrIn(i)): Next i '' Convert to string
        For Each a In aFirstArray               ''' This part is actually creating the unique set
            arr.Add a, a
        Next
        EndTime1 = Timer - StrtTime
    
        StrtTime = Timer         ''' This part is writing back to an array for return
        For Each a In arr: count = count + 1: arrOut(count) = a: Next a
        EndTime2 = Timer - StrtTime
        CollectionTest = Array(arrOut, EndTime1, EndTime2)
    End Function
    
    Function DictionaryTest(ByRef myArray() As Long, Lim As Long) As Variant
        Dim StrtTime As Double, Endtime As Double
        Dim d As Scripting.Dictionary, i As Long  '' Early Binding
        Set d = New Scripting.Dictionary
        For i = LBound(myArray) To UBound(myArray): d(myArray(i)) = 1: Next i
        DictionaryTest = d.Keys()
    End Function
    

    Here is the Direct approach provided by @IsraelHoletz.

    Function ArrayUnique(ByRef aArrayIn() As Long) As Variant
        Dim aArrayOut() As Variant, bFlag As Boolean, vIn As Variant, vOut As Variant
        Dim i As Long, j As Long, k As Long
        ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
        i = LBound(aArrayIn)
        j = i
    
        For Each vIn In aArrayIn
            For k = j To i - 1
                If vIn = aArrayOut(k) Then bFlag = True: Exit For
            Next
            If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
            bFlag = False
        Next
    
        If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
        ArrayUnique = aArrayOut
    End Function
    
    Function DirectTest(ByRef aArray() As Long, Lim As Long) As Variant
        Dim aReturn() As Variant
        Dim StrtTime As Long, Endtime As Long, i As Long
        aReturn = ArrayUnique(aArray)
        DirectTest = aReturn
    End Function
    

    Here is the benchmark function that compares all of the functions. You should note that the last two cases are handled a little bit different because of memory issues. Also note, that I didn't test the Collection method for the Test Case Size = 10,000,000. For some reason, it was returning incorrect results and behaving unusual (I'm guessing the collection object has a limit on how many things you can put in it. I searched and I couldn't find any literature on this).

    Function UltimateTest(Lim As Long, bTestDirect As Boolean, bTestDictionary, bytCase As Byte) As Variant
    
        Dim dictionTest, collectTest, sortingTest1, indexTest1, directT '' all variants
        Dim arrTest() As Long, i As Long, bEquality As Boolean, SizeUnique As Long
        Dim myArray() As Long, StrtTime As Double, EndTime1 As Variant
        Dim EndTime2 As Double, EndTime3 As Variant, EndTime4 As Double
        Dim EndTime5 As Double, EndTime6 As Double, sortingTest2, indexTest2
    
        ReDim myArray(1 To Lim): Rnd (-2)   '' If you want to test negative numbers, 
        '' insert this to the left of CLng(Int(Lim... : (-1) ^ (Int(2 * Rnd())) *
        For i = LBound(myArray) To UBound(myArray): myArray(i) = CLng(Int(Lim * Rnd() + 1)): Next i
        arrTest = myArray
    
        If bytCase = 1 Then
            If bTestDictionary Then
                StrtTime = Timer: dictionTest = DictionaryTest(arrTest, Lim): EndTime1 = Timer - StrtTime
            Else
                EndTime1 = "Not Tested"
            End If
    
            arrTest = myArray
            collectTest = CollectionTest(arrTest, Lim)
    
            arrTest = myArray
            StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
            SizeUnique = UBound(sortingTest1, 2)
    
            If bTestDirect Then
                arrTest = myArray: StrtTime = Timer: directT = DirectTest(arrTest, Lim): EndTime3 = Timer - StrtTime
            Else
                EndTime3 = "Not Tested"
            End If
    
            arrTest = myArray
            StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
    
            arrTest = myArray
            StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
    
            arrTest = myArray
            StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
    
            bEquality = True
            For i = LBound(sortingTest1, 2) To UBound(sortingTest1, 2)
                If Not CLng(collectTest(0)(i)) = sortingTest1(1, i) Then
                    bEquality = False
                    Exit For
                End If
            Next i
    
            For i = LBound(dictionTest) To UBound(dictionTest)
                If Not dictionTest(i) = sortingTest1(1, i + 1) Then
                    bEquality = False
                    Exit For
                End If
            Next i
    
            For i = LBound(dictionTest) To UBound(dictionTest)
                If Not dictionTest(i) = indexTest1(1, i + 1) Then
                    bEquality = False
                    Exit For
                End If
            Next i
    
            If bTestDirect Then
                For i = LBound(dictionTest) To UBound(dictionTest)
                    If Not dictionTest(i) = directT(i + 1) Then
                        bEquality = False
                        Exit For
                    End If
                Next i
            End If
    
            UltimateTest = Array(bEquality, EndTime1, EndTime2, EndTime3, EndTime4, _
                            EndTime5, EndTime6, collectTest(1), collectTest(2), SizeUnique)
        ElseIf bytCase = 2 Then
            arrTest = myArray
            collectTest = CollectionTest(arrTest, Lim)
            UltimateTest = Array(collectTest(1), collectTest(2))
        ElseIf bytCase = 3 Then
            arrTest = myArray
            StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
            SizeUnique = UBound(sortingTest1, 2)
            UltimateTest = Array(EndTime2, SizeUnique)
        ElseIf bytCase = 4 Then
            arrTest = myArray
            StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
            UltimateTest = EndTime4
        ElseIf bytCase = 5 Then
            arrTest = myArray
            StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
            UltimateTest = EndTime5
        ElseIf bytCase = 6 Then
            arrTest = myArray
            StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
            UltimateTest = EndTime6
        End If
    
    End Function
    

    And finally, here is the sub that produces the table above.

    Sub GetBenchmarks()
        Dim myVar, i As Long, TestCases As Variant, j As Long, temp
    
        TestCases = Array(1000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000, 2000000, 5000000, 10000000)
    
        For j = 0 To 11
            If j < 6 Then
                myVar = UltimateTest(CLng(TestCases(j)), True, True, 1)
            ElseIf j < 10 Then
                myVar = UltimateTest(CLng(TestCases(j)), False, True, 1)
            ElseIf j < 11 Then
                myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, 0, 0, 0)
                temp = UltimateTest(CLng(TestCases(j)), False, False, 2)
                myVar(7) = temp(0): myVar(8) = temp(1)
                temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
                myVar(2) = temp(0): myVar(9) = temp(1)
                myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
                myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
                myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
            Else
                myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, "Not Tested", "Not Tested", 0)
                temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
                myVar(2) = temp(0): myVar(9) = temp(1)
                myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
                myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
                myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
            End If
    
            Cells(4 + j, 6) = TestCases(j)
            For i = 1 To 9: Cells(4 + j, 6 + i) = myVar(i - 1): Next i
            Cells(4 + j, 17) = myVar(9)
        Next j
    End Sub
    

    Summary
    From the table of results, we can see that the Dictionary method works really well for cases less than about 500,000, however, after that, the IndexMethod really starts to dominate. You will notice that when order doesn't matter and your data is made up of positive integers, there is no comparison to the IndexMethod algorithm (it returns the unique values from an array containing 10 million elements in less than 1 sec!!! Incredible!). Below I have a breakdown of which algorithm is preferred in various cases.

    Case 1
    Your Data contains integers (i.e. whole numbers, both positive and negative): IndexMethod

    Case 2
    Your Data contains non-integers (i.e. variant, double, string, etc.) with less than 200000 elements: Dictionary Method

    Case 3
    Your Data contains non-integers (i.e. variant, double, string, etc.) with more than 200000 elements: Collection Method

    If you had to choose one algorithm, in my opinion, the Collection method is still the best as it only requires a few lines of code, it's super general, and it's fast enough.

    0 讨论(0)
  • 2020-11-22 16:00

    If the order of the deduplicated array does not matter to you, you can use my pragmatic function:

    Function DeDupArray(ia() As String)
      Dim newa() As String
      ReDim newa(999)
      ni = -1
      For n = LBound(ia) To UBound(ia)
        dup = False
        If n <= UBound(ia) Then
          For k = n + 1 To UBound(ia)
            If ia(k) = ia(n) Then dup = True
          Next k
    
          If dup = False And Trim(ia(n)) <> "" Then
            ni = ni + 1
            newa(ni) = ia(n)
          End If
        End If
      Next n
    
      If ni > -1 Then
        ReDim Preserve newa(ni)
      Else
        ReDim Preserve newa(1)
      End If
    
      DeDupArray = newa
    End Function
    
    
    
    Sub testdedup()
    Dim m(5) As String
    Dim m2() As String
    
    m(0) = "Horse"
    m(1) = "Cow"
    m(2) = "Dear"
    m(3) = "Horse"
    m(4) = "Joke"
    m(5) = "Cow"
    
    m2 = DeDupArray(m)
    t = ""
    For n = LBound(m2) To UBound(m2)
      t = t & n & "=" & m2(n) & " "
    Next n
    MsgBox t
    End Sub
    

    From the test function, it will result in the following deduplicated array:

    "0=Dear 1=Horse 2=Joke 3=Cow "

    0 讨论(0)
提交回复
热议问题