I have a problem that I haven't been able to find anywhere on the web (it may be there, but I can't find it, heh).
I have a spreadsheet with 13 columns of data. E
Since I offered an ODBC approach I thought I should elaborate on it, as it is not immediately obvious how to do this. And, in honesty, I needed to relearn the process and document it for myself.
This is a way to generate a Cartesian product of two or more one-dimensional data arrays using Excel and Microsoft Query.
These instructions were written with XL2007 but should work with minor (if any) modifications in any version.
Organize the arrays in columns.
Important: Each column should have two "header" names as shown in bold below. The topmost name will later be interpreted as a "table name". The second name will be interpreted as a "column name". This will become apparent a few steps later.
Select each data range in turn, including both "headers", and hit Ctrl+Shift+F3. Tick only Top row in the "Create Names" dialog and click OK.
Once all named ranges are established, save the file.
Data | Get External Data | From Other Sources | From Microsoft Query
Choose <New Data Source>. In the Choose New Data Source dialog:
A friendly name for your connection
choose the appropriate Microsoft Excel driver
... then Connect
Select Workbook... then browse for your file.
Add the "columns" from your "tables". You can see now why the "two header" layout in step 1 is important--it tricks the driver into understanding the data correctly.
Next click Cancel (really!). You might be prompted at this point to "continue editing in Microsoft Query?" (answer Yes), or you might see a complaint that joins cannot be represented in the graphical editor. Ignore this and forge on...
Microsoft Query opens, and by default the tables you added will be cross-joined. This will generate a Cartesian product, which is what we want.
Now close MSQuery altogether.
You are returned to the worksheet. Almost done, I promise! Tick New worksheet and click OK.
The cross-joined results are returned.
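For anyone who would rather skip the wizard, here is a rough sketch of running the same kind of cross join from VBA through ADO. This is not part of the steps above, just an illustration under assumptions: the workbook path and the named ranges Colors and Sizes are hypothetical stand-ins for your own file and "table names", and it assumes the ACE OLE DB provider is installed and the workbook is saved as .xlsx.

```vba
' Sketch only: the same cross join, run through ADO instead of MS Query.
' WORKBOOK_PATH, [Colors] and [Sizes] are hypothetical; substitute your own
' file and the "table names" created with Ctrl+Shift+F3.
Sub CrossJoinViaAdo()
    Const WORKBOOK_PATH As String = "C:\path\to\data.xlsx"   ' assumption
    Dim cn As Object, rs As Object
    Dim sql As String

    Set cn = CreateObject("ADODB.Connection")
    ' HDR=YES makes the first row of each named range the column name,
    ' which is the second "header" in the two-header layout.
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=" & WORKBOOK_PATH & ";" & _
            "Extended Properties=""Excel 12.0 Xml;HDR=YES"";"

    ' Two tables listed with no join condition give the Cartesian product.
    sql = "SELECT * FROM [Colors], [Sizes]"

    Set rs = CreateObject("ADODB.Recordset")
    rs.Open sql, cn

    ' Write the rows to A1 of a new worksheet in the active workbook.
    ActiveWorkbook.Worksheets.Add.Range("A1").CopyFromRecordset rs

    rs.Close
    cn.Close
End Sub
```

CopyFromRecordset writes only the data rows, not the field names, so add headers yourself if you need them.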
Solution based on my second comment. This example assumes you have three columns of data but can be adapted to handle more.
I start with your sample data. I added counts on the top row for convenience. I also added the total number of combinations (product of the counts). This is Sheet1:
On Sheet2:
Formulae:
A2:C2 (orange cells) are hard coded =0
A3=IF(SUM(B3:C3)=0,MOD(A2+1,Sheet1!$E$1),A2)
B3=IF(C3=0,MOD(B2+1,Sheet1!$G$1),B2)
C3=MOD(C2+1,Sheet1!$J$1)
D2=INDEX(Sheet1!$E$2:$E$5,Sheet2!A2+1)
E2=INDEX(Sheet1!$G$2:$G$6,Sheet2!B2+1)
F2=INDEX(Sheet1!$J$2:$J$5,Sheet2!C2+1)
Fill from row 3 down for as many rows as Total shows on Sheet1.
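The counters in columns A:C behave like an odometer: the last one ticks on every row and each counter to its left advances only when everything to its right wraps back to 0. As a sanity check, the same indices can be computed directly from the row number. This is only a sketch; ComboIndex and DemoComboIndex are made-up names, and it assumes the default Option Base 0 so that Array() is zero-based.

```vba
' Sketch: zero-based index into column k for zero-based output row r,
' with the last column changing fastest (as in the formulas above).
' counts holds the number of items per column.
Function ComboIndex(ByVal r As Long, counts As Variant, ByVal k As Long) As Long
    Dim i As Long
    Dim stride As Long

    ' stride = product of the counts of all columns to the right of k
    stride = 1
    For i = k + 1 To UBound(counts)
        stride = stride * counts(i)
    Next i

    ComboIndex = (r \ stride) Mod counts(k)
End Function

Sub DemoComboIndex()
    Dim counts As Variant
    Dim r As Long

    counts = Array(4, 5, 4)   ' sizes of E2:E5, G2:G6 and J2:J5
    For r = 0 To 5
        Debug.Print ComboIndex(r, counts, 0); _
                    ComboIndex(r, counts, 1); _
                    ComboIndex(r, counts, 2)
    Next r
End Sub
```

For the first six rows the third index runs 0, 1, 2, 3 and then wraps to 0 while the second index moves from 0 to 1, which is the same pattern the MOD formulas produce.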
I needed this myself several times and finally built it.
I believe the code scales to any number of columns and any number of distinct values within each column (i.e. each column can contain any number of values).
It assumes all values in each column are unique (if this is not true, you will get duplicate rows)
It assumes you want to cross-join output based on whatever cells you have currently selected (make sure you select them all)
It assumes you want the output to start one column after the current selection.
How it works (briefly):
First, for each column, it counts the rows up to the first blank cell and multiplies the counts together to get the total number of rows needed to support all combos in N columns (items in column 1 * items in column 2 * ... * items in column N).
Second, for each column, it uses the total combos and the running product of the column counts so far to calculate two loop counts:
ValueCycles (how many times you have to cycle through all the values in the current column)
ValueRepeats (how many times to repeat each value in the column consecutively)
Sub sub_CrossJoin()
    Dim rg_Selection As Range
    Dim rg_Col As Range
    Dim rg_Row As Range
    Dim rg_Cell As Range
    Dim rg_DestinationCol As Range
    Dim rg_DestinationCell As Range
    Dim int_PriorCombos As Long
    Dim int_TotalCombos As Long
    Dim int_ValueRowCount As Long
    Dim int_ValueRepeats As Long
    Dim int_ValueRepeater As Long
    Dim int_ValueCycles As Long
    Dim int_ValueCycler As Long

    int_TotalCombos = 1
    int_PriorCombos = 1
    int_ValueRowCount = 0
    int_ValueCycler = 0
    int_ValueRepeater = 0

    Set rg_Selection = Selection
    Set rg_DestinationCol = rg_Selection.Cells(1, 1)
    Set rg_DestinationCol = rg_DestinationCol.Offset(0, rg_Selection.Columns.Count)

    'get total combos
    For Each rg_Col In rg_Selection.Columns
        int_ValueRowCount = 0
        For Each rg_Row In rg_Col.Cells
            If rg_Row.Value = "" Then
                Exit For
            End If
            int_ValueRowCount = int_ValueRowCount + 1
        Next rg_Row
        int_TotalCombos = int_TotalCombos * int_ValueRowCount
    Next rg_Col

    int_ValueRowCount = 0

    'for each column, calculate the repeats needed for each row value and then populate the destination
    For Each rg_Col In rg_Selection.Columns
        int_ValueRowCount = 0
        For Each rg_Row In rg_Col.Cells
            If rg_Row.Value = "" Then
                Exit For
            End If
            int_ValueRowCount = int_ValueRowCount + 1
        Next rg_Row
        int_PriorCombos = int_PriorCombos * int_ValueRowCount
        int_ValueRepeats = int_TotalCombos / int_PriorCombos
        int_ValueCycles = (int_TotalCombos / int_ValueRepeats) / int_ValueRowCount
        int_ValueCycler = 0
        int_ValueRepeater = 0

        Set rg_DestinationCell = rg_DestinationCol

        For int_ValueCycler = 1 To int_ValueCycles
            For Each rg_Row In rg_Col.Cells
                If rg_Row.Value = "" Then
                    Exit For
                End If
                For int_ValueRepeater = 1 To int_ValueRepeats
                    rg_DestinationCell.Value = rg_Row.Value
                    Set rg_DestinationCell = rg_DestinationCell.Offset(1, 0)
                Next int_ValueRepeater
            Next rg_Row
        Next int_ValueCycler

        Set rg_DestinationCol = rg_DestinationCol.Offset(0, 1)
    Next rg_Col
End Sub
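To make the ValueRepeats / ValueCycles arithmetic from the macro above concrete, here is a small stand-alone sketch that runs the same calculation on made-up counts. The counts 2, 3 and 4 and the name DemoRepeatsAndCycles are purely illustrative.

```vba
' Sketch: the repeats/cycles arithmetic on its own, for three hypothetical
' columns holding 2, 3 and 4 items.
Sub DemoRepeatsAndCycles()
    Dim counts As Variant
    Dim i As Long
    Dim totalCombos As Long, priorCombos As Long
    Dim valueRepeats As Long, valueCycles As Long

    counts = Array(2, 3, 4)

    ' Total rows needed = product of all column counts (2 * 3 * 4 = 24).
    totalCombos = 1
    For i = LBound(counts) To UBound(counts)
        totalCombos = totalCombos * counts(i)
    Next i

    priorCombos = 1
    For i = LBound(counts) To UBound(counts)
        ' Running product of the counts up to and including this column.
        priorCombos = priorCombos * counts(i)
        valueRepeats = totalCombos \ priorCombos
        valueCycles = (totalCombos \ valueRepeats) \ counts(i)
        Debug.Print "column"; i - LBound(counts) + 1; _
                    "repeats"; valueRepeats; "cycles"; valueCycles
    Next i
End Sub
```

With these counts the first column gets 12 repeats and 1 cycle, the second 4 repeats and 2 cycles, and the third 1 repeat and 6 cycles. In every case repeats * cycles * items in the column equals the 24 total rows, which is why each output column ends up the same length.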
Not sure why you are averse to looping. See this example. It took less than a second.
Option Explicit

Sub Sample()
    Dim i As Long, j As Long, k As Long, l As Long
    Dim CountComb As Long, lastrow As Long

    Range("G2").Value = Now

    Application.ScreenUpdating = False

    CountComb = 0: lastrow = 6

    For i = 1 To 4: For j = 1 To 4
    For k = 1 To 8: For l = 1 To 12
        Range("G" & lastrow).Value = Range("A" & i).Value & "/" & _
                                     Range("B" & j).Value & "/" & _
                                     Range("C" & k).Value & "/" & _
                                     Range("D" & l).Value
        lastrow = lastrow + 1
        CountComb = CountComb + 1
    Next: Next
    Next: Next

    Range("G1").Value = CountComb
    Range("G3").Value = Now

    Application.ScreenUpdating = True
End Sub
NOTE: The above was a small example. I did a test on 4 columns with 200 rows each. The total number of combinations possible in such a scenario is 1600000000, and it took 16 seconds.
In such a case it crosses the Excel rows limit. One other option that I can think of is writing the output to a text file in such a scenario. If your data is small then you can get away without using arrays and directly writing to the cells. :) But in case of large data, I would recommend using arrays.
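As a rough illustration of the array approach, here is a sketch that builds the combinations in memory and writes them to the sheet in a single assignment. It assumes the same layout as the small example above (inputs in A1:A4, B1:B4, C1:C8 and D1:D12, output from G6 down); the name CrossJoinToArray and the row-limit check are additions for illustration only.

```vba
Option Explicit

' Sketch: same output as the looping example, but built in a Variant array
' and written to the worksheet once.
Sub CrossJoinToArray()
    Dim a As Variant, b As Variant, c As Variant, d As Variant
    Dim result() As Variant
    Dim i As Long, j As Long, k As Long, l As Long
    Dim n As Long
    Dim total As Double

    ' Reading a multi-cell range returns a 1-based, two-dimensional array.
    a = Range("A1:A4").Value
    b = Range("B1:B4").Value
    c = Range("C1:C8").Value
    d = Range("D1:D12").Value

    ' Use a Double for the product so a very large count cannot overflow a Long.
    total = CDbl(UBound(a, 1)) * UBound(b, 1) * UBound(c, 1) * UBound(d, 1)

    ' Output starts at row 6, so Rows.Count - 5 rows are available.
    If total > Rows.Count - 5 Then
        MsgBox "Too many combinations to fit in one worksheet column."
        Exit Sub
    End If

    ReDim result(1 To CLng(total), 1 To 1)

    n = 0
    For i = 1 To UBound(a, 1)
        For j = 1 To UBound(b, 1)
            For k = 1 To UBound(c, 1)
                For l = 1 To UBound(d, 1)
                    n = n + 1
                    result(n, 1) = a(i, 1) & "/" & b(j, 1) & "/" & _
                                   c(k, 1) & "/" & d(l, 1)
                Next l
            Next k
        Next j
    Next i

    ' One write to the worksheet instead of one per combination.
    Range("G6").Resize(n, 1).Value = result
End Sub
```

The nested loops are unchanged; the saving comes from touching the worksheet once for reading each column and once for writing the result.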
Use recursion: call the method and pass in the current level, which is decremented inside the method.
sample:
Function MyAdd(i As Integer) As Integer
    If i > 1 Then
        MyAdd = i + MyAdd(i - 1)
    Else
        MyAdd = 1
    End If
End Function
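Applied to the combinations problem, the same idea might look like the sketch below. It is only one possible reading of the suggestion: BuildCombos, DemoBuildCombos and the sample lists are made up, and the level starts at the last list and is decremented on each call until no lists remain.

```vba
' Sketch: recursive Cartesian product. level is the index of the list being
' handled; it starts at the last list and counts down. suffix carries the
' items already chosen from the lists to the right.
Sub BuildCombos(lists As Variant, ByVal level As Long, _
                ByVal suffix As String, results As Collection)
    Dim item As Variant

    If level < LBound(lists) Then
        results.Add suffix            ' every list has contributed one item
        Exit Sub
    End If

    For Each item In lists(level)
        If level = UBound(lists) Then
            ' First call: nothing chosen yet, so start the string.
            BuildCombos lists, level - 1, CStr(item), results
        Else
            BuildCombos lists, level - 1, item & "/" & suffix, results
        End If
    Next item
End Sub

Sub DemoBuildCombos()
    Dim lists As Variant
    Dim results As New Collection
    Dim v As Variant

    lists = Array(Array("a", "b"), Array(1, 2, 3))   ' hypothetical input
    BuildCombos lists, UBound(lists), "", results

    For Each v In results
        Debug.Print v
    Next v
End Sub
```

The demo collects six combinations (2 * 3). Because the last list is the outermost loop, the first list's items change fastest in the output. The recursion depth equals the number of lists, not the number of combinations, so it stays shallow even for large inputs.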