I have similar question to [combine Rows with Duplicate Values][1] Excel VBA - Combine rows with duplicate values in one cell and merge values in other cell
I have
As I wrote above, I would iterate through the data and collect things into the User Defined Object. There is no need for the data to be sorted in this method; and duplicate REF
's will be omitted.
One advantage of a User Defined Object is that it makes debugging easier as you can see more clearly what you have done.
We combine every line where ID
and CH
are the same, by using the property of the Collection object to raise an error if identical keys are used.
So far as combining the Ref's in a single cell with a delimiter, vs individual cells in columns D:K, either can be done simply. I chose to separate into columns, but changing it to combine into a single column would be trivial.
After Inserting the Class Module, you must rename it: cID_CH
You will note I placed the results on a separate worksheets. You could overwrite the original data, but I would advise against that.
Option Explicit
Private pID As Long
Private pCH As Long
Private pPUB As String
Private pREF As String
Private pcolREF As Collection
Public Property Get ID() As Long
ID = pID
End Property
Public Property Let ID(Value As Long)
pID = Value
End Property
Public Property Get CH() As Long
CH = pCH
End Property
Public Property Let CH(Value As Long)
pCH = Value
End Property
Public Property Get PUB() As String
PUB = pPUB
End Property
Public Property Let PUB(Value As String)
pPUB = Value
End Property
Public Property Get REF() As String
REF = pREF
End Property
Public Property Let REF(Value As String)
pREF = Value
End Property
Public Property Get colREF() As Collection
Set colREF = pcolREF
End Property
Public Sub ADD(refVAL As String)
On Error Resume Next
pcolREF.ADD refVAL, refVAL
On Error GoTo 0
End Sub
Private Sub Class_Initialize()
Set pcolREF = New Collection
End Sub
Option Explicit
Sub CombineDUPS()
Dim wsSRC As Worksheet, wsRES As Worksheet
Dim vSRC As Variant, vRES() As Variant, rRES As Range
Dim cI As cID_CH, colI As Collection
Dim I As Long, J As Long
Dim S As String
'Set source and results worksheets and results range
Set wsSRC = Worksheets("sheet1")
Set wsRES = Worksheets("sheet2")
Set rRES = wsRES.Cells(1, 1)
'Get Source data
With wsSRC
vSRC = .Range("A2", .Cells(.Rows.Count, "D").End(xlUp))
End With
'Collect and combine data
Set colI = New Collection
On Error Resume Next
For I = 1 To UBound(vSRC, 1)
Set cI = New cID_CH
With cI
.PUB = vSRC(I, 1)
.ID = vSRC(I, 2)
.CH = vSRC(I, 3)
.REF = vSRC(I, 4)
.ADD .REF
S = CStr(.ID & "|" & .CH)
colI.ADD cI, S
If Err.Number = 457 Then
Err.Clear
colI(S).ADD .REF
ElseIf Err.Number <> 0 Then
Debug.Print Err.Number, Err.Description
Stop
End If
End With
Next I
On Error GoTo 0
'Create and populate Results Array
ReDim vRES(0 To colI.Count, 1 To 11)
'Header row
vRES(0, 1) = "Pub"
vRES(0, 2) = "ID"
vRES(0, 3) = "CH"
vRES(0, 4) = "Ref"
'populate array
For I = 1 To colI.Count
With colI(I)
vRES(I, 1) = .PUB
vRES(I, 2) = .ID
vRES(I, 3) = .CH
For J = 1 To .colREF.Count
vRES(I, J + 3) = .colREF(J)
Next J
End With
Next I
'Write the results to the worksheet
Set rRES = rRES.Resize(UBound(vRES, 1) + 1, UBound(vRES, 2))
With rRES
.EntireColumn.Clear
.Value = vRES
With .Rows(1)
.Font.Bold = True
.HorizontalAlignment = xlCenter
Range(.Cells(4), .Cells(11)).HorizontalAlignment = xlCenterAcrossSelection
End With
.EntireColumn.AutoFit
End With
End Sub
Original
Processed Results
variant using dictionary below
Sub test()
Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
Dic.Comparemode = vbTextCompare
Dim Cl As Range, x$, y$, i&, Key As Variant
For Each Cl In Range("B2:B" & Cells(Rows.Count, "B").End(xlUp).Row)
x = Cl.Value & "|" & Cl.Offset(, 1).Value
y = Cl.Offset(, 2).Value
If Not Dic.exists(x) Then
Dic.Add x, Cl.Offset(, -1).Value & "|" & y & "|"
ElseIf Dic.exists(x) And Not LCase(Dic(x)) Like "*|" & LCase(y) & "|*" Then
Dic(x) = Dic(x) & "|" & y & "|"
End If
Next Cl
Range("A2:D" & Cells(Rows.Count, "B").End(xlUp).Row).ClearContents
i = 2
For Each Key In Dic
Cells(i, "A") = Split(Dic(Key), "|")(0)
Range(Cells(i, "B"), Cells(i, "C")) = Split(Key, "|")
Cells(i, "D") = Replace(Split(Replace(Dic(Key), "||", ";"), "|")(1), ":", ";")
i = i + 1
Next Key
Set Dic = Nothing
End Sub
before
after
The canonical practise for deleting rows is to start at the bottom and work toward the top. In this manner, rows are not skipped. The trick here is to find rows above the current position that match columns B and C and concatenate the strings from column D before removing the row. There are several good worksheet formulas that can acquire the row number of a two-column-match. Putting one of them into practise with application.Evaluate
would seem to be the most expedient method of collecting the values from column D.
Sub dedupe_and_collect()
Dim rw As Long, mr As Long, wsn As String
With ActiveSheet '<- set this worksheet reference properly!
wsn = .Name
With .Cells(1, 1).CurrentRegion
.RemoveDuplicates Columns:=Array(2, 3, 4), Header:=xlYes
End With
With .Cells(1, 1).CurrentRegion 'redefinition after duplicate removal
For rw = .Rows.Count To 2 Step -1 'walk backwards when deleting rows
If Application.CountIfs(.Columns(2), .Cells(rw, 2).Value, .Columns(3), .Cells(rw, 3).Value) > 1 Then
mr = Application.Evaluate("MIN(INDEX(ROW(1:" & rw & ")+(('" & wsn & "'!B1:B" & rw & "<>'" & wsn & "'!B" & rw & ")+('" & wsn & "'!C1:C" & rw & "<>'" & wsn & "'!C" & rw & "))*1E+99, , ))")
'concatenate column D
'.Cells(mr, 4) = .Cells(mr, 4).Value & "; " & .Cells(rw, 4).Value
'next free column from column D
.Cells(mr, Columns.Count).End(xlToLeft).Offset(0, 1) = .Cells(rw, 4).Value
.Rows(rw).EntireRow.Delete
End If
Next rw
End With
End With
End Sub
The removal of records on a three-column-match is done with the VBA equivalent of the Date ► Data Tools ► Remove Duplicates command. This only considers columns B, C and D and deletes the lower duplicates (keeping the ones closest to row 1). If Column A is important in this respect, additional coding would have to be added.
It's unclear to me whether you wanted column D as delimited string or separate cells as an end result. Could you clarify?