问题
I am having trouble determining a way to enter a 1
or 0
into an adjacent cell to indicate whether or not a value is unique when working with a large dataset. I have read of multiple methods for accomplishing this, however none of them seem efficient for my purposes: I am using an instance of Excel 2010 (so I do not have the Distinct Count feature in PivotTables, and when I try to use PowerPivot it crashes my file due to processing limitations.
In this StackOverflow question: Simple Pivot Table to Count Unique Values there are suggestions to use SUMPRODUCT
or COUNTIF
, but when working with 50,000+ rows as I am, this causes terrible performance and a file size of ~35 MB instead of ~3 MB. I wanted to know if there is a better solution for a large, dynamic dataset whether it is a formula or VBA.
An example of what I would like to accomplish is (with the Unique column being the adjacent cell):
Name Week Unique
John 1 1
Sally 1 1
John 1 0
Sally 2 1
I attempted to script the same functionality of COUNTIF
but with no success:
For Each Cell In ThisWorkbook.Worksheets("Overtime & Type Data").Range("Z2:Z" & DataLastRow)
If Worksheets("Overtime & Type Data").Cells(Cell.Row, 26) <> Worksheets("Overtime & Type Data").Cells(Cell.Row - 1, 26) Then
FirstCell = Cell.Row
End If
If (Worksheets("Overtime & Type Data").Range(Cells(FirstCell, 26), Cells(Cell.Row, 26)) = Worksheets("Overtime & Type Data").Range(Cells(Cell.Row, 26))) = True Then
Cell.Value = 1
Else
Cell.Value = 0
End If
Next Cell
回答1:
This code ran on over 130,000 rows successfully in less than 3 seconds. Adjust the column letters to fit your dataset.
Sub tgr()
Const colName As String = "A"
Const colWeek As String = "B"
Const colOutput As String = "C"
Dim ws As Worksheet
Dim rngData As Range
Dim DataCell As Range
Dim rngFound As Range
Dim collUniques As Collection
Dim arrResults() As Long
Dim ResultIndex As Long
Dim UnqCount As Long
Set ws = ThisWorkbook.Sheets("Overtime & Type Data")
Set rngData = ws.Range(colName & 2, ws.Cells(Rows.Count, colName).End(xlUp))
Set collUniques = New Collection
ReDim arrResults(1 To rngData.Cells.Count, 1 To 1)
On Error Resume Next
For Each DataCell In rngData.Cells
ResultIndex = ResultIndex + 1
collUniques.Add ws.Cells(DataCell.Row, colName).Value & ws.Cells(DataCell.Row, colWeek).Value, ws.Cells(DataCell.Row, colName).Value & ws.Cells(DataCell.Row, colWeek).Value
If collUniques.Count > UnqCount Then
UnqCount = collUniques.Count
arrResults(ResultIndex, 1) = 1
Else
arrResults(ResultIndex, 1) = 0
End If
Next DataCell
On Error GoTo 0
ws.Cells(rngData.Row, colOutput).Resize(rngData.Cells.Count).Value = arrResults
End Sub
回答2:
One approach is to sort by Name and Week. Then you can determine Unique for any row by comparing with the previous row.
If you need to preserve the order, you could first write a column of Index numbers (1, 2, 3, ...) to keep track of order. After calculating Unique, sort by Index to restore the original order.
The whole process could be done manually with relatively few steps, or automated with VBA.
回答3:
I'm not sure how well this will work with 50000 values, but it goes through ~1500 in about a second.
Sub unique()
Dim myColl As New Collection
Dim isDup As Boolean
Dim myValue As String
Dim r As Long
On Error GoTo DuplicateValue
For r = 1 To Sheet1.UsedRange.Rows.Count
isDup = False
'Combine the value of the 2 cells together
' and add that string to our collection
'If it is already in the collection it errors
myValue = Sheet1.Cells(r, 1).Value & Sheet1.Cells(r, 2).Value
myColl.Add r, myValue
If isDup Then
Sheet1.Cells(r, 3).Value = "0"
Else
Sheet1.Cells(r, 3).Value = "1"
End If
Next
On Error GoTo 0
Exit Sub
DuplicateValue:
'The value is already in the collection so put a 0
isDup = True
Resume Next
End Sub
回答4:
Just about any bulk operation will beat a loop involving worksheet cells. You might be able to trim the time down a bit by performing all of the calculations in memory and only returning the values back to the worksheet en masse when it is complete.
Sub is_a_dupe()
Dim v As Long, vTMP As Variant, vUNQs As Variant, dUNQs As Object
Debug.Print Timer
On Error GoTo bm_Uh_Oh
Set dUNQs = CreateObject("Scripting.Dictionary")
With Worksheets("Sheet1")
vTMP = .Range(.Cells(2, 1), .Cells(Rows.Count, 2).End(xlUp)).Value2
ReDim vUNQs(1 To UBound(vTMP, 1), 1 To 1)
For v = LBound(vTMP, 1) To UBound(vTMP, 1)
If dUNQs.Exists(Join(Array(vTMP(v, 1), vTMP(v, 2)))) Then
vUNQs(v, 1) = 0
Else
dUNQs.Add Key:=Join(Array(vTMP(v, 1), vTMP(v, 2))), _
Item:=vTMP(v, 2)
vUNQs(v, 1) = 1
End If
Next v
.Cells(2, 3).Resize(UBound(vUNQs, 1), 1) = vUNQs
End With
Debug.Print Timer
bm_Uh_Oh:
dUNQs.RemoveAll
Set dUNQs = Nothing
End Sub
Previous experience tells me that the variety of data (as well as hardware, etc) will impact timing the process but in my random sample data I received these elapsed times.
50K records ..... 0.53 seconds
130K records .... 1.32 seconds
500K records .... 4.92 seconds
来源:https://stackoverflow.com/questions/32125625/unique-count-formula-for-large-dataset