Using OR logic on an array as an argument in SUMPRODUCT

南旧 2021-01-12 04:47

I have a fairly large dataset in which I need to combine multiple entries into a single value. My dataset contains data on the combination of two datasets, each using thei

7 answers
  •  说谎 (OP)
    2021-01-12 05:20

    If you care about performance (calculation speed) and are not afraid of matrix calculations, you can use MMULT:

    =SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2),MMULT(--('Raw data'!O:O={20,21,22,23,24}),TRANSPOSE({1,1,1,1,1})),'Raw data'!S:S)
    

    Explanation:

    First, you create a 1048576×5 matrix (1048576 is the number of rows in a full worksheet column). The value in row i, column j is 1 if the ID in row i of 'Raw data'!O:O equals the j-th value of {20,21,22,23,24}, and 0 otherwise.

    Second, you multiply this matrix by a column vector of five 1s (five because {20,21,22,23,24} has five elements). This adds up each row, which means that all five values are accepted.

    Third, the result is a 1048576×1 vector whose i-th element is 1 if the ID in row i is one of the accepted values and 0 otherwise. You pass this vector to SUMPRODUCT alongside the other arrays.
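    To see the three steps on a small scale, here is a sketch that replaces the worksheet columns with made-up literal arrays of three rows (English-locale separators: ',' between columns, ';' between rows). The first formula is the MMULT part alone. The second adds a hypothetical category column {"A";"A";"B"} and value column {10;20;30} in place of 'Raw data'!C:C and 'Raw data'!S:S. Enter each formula in its own cell:

    ```excel
    =MMULT(--({20;25;22}={20,21,22,23,24}),{1;1;1;1;1})
    =SUMPRODUCT(--({"A";"A";"B"}="A"),MMULT(--({20;25;22}={20,21,22,23,24}),{1;1;1;1;1}),{10;20;30})
    ```

    The comparison produces a 3×5 matrix of TRUE/FALSE, -- turns it into 1/0, and MMULT adds up each row, so the MMULT part evaluates to {1;0;1} (25 is not in the list). SUMPRODUCT then multiplies {1;1;0}, {1;0;1} and {10;20;30} element by element and returns 10. Each row sum stays at most 1 only because the list has no duplicates. If an ID appeared twice in the list, matching rows would be counted twice.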

    (Sorry, my Excel uses ',' where yours uses ';'. If you want to shorten the formula, you can write {1;1;1;1;1} instead of TRANSPOSE({1,1,1,1,1}). However, you have to find out what your Excel uses instead of ';' to separate rows in an array constant; most probably it is '.'.)
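    With that substitution, and keeping the English-locale separators used in this answer, the formula from the top of this answer would read:

    ```excel
    =SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2),MMULT(--('Raw data'!O:O={20,21,22,23,24}),{1;1;1;1;1}),'Raw data'!S:S)
    ```

    {1;1;1;1;1} is already a 5×1 column, which is the shape MMULT needs for its second argument, so TRANSPOSE is no longer necessary.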

    Note: Calculation may be faster if you refer only to the range that actually contains values instead of the whole column, e.g. 'Raw data'!C1:C123 instead of 'Raw data'!C:C.
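    Here is a sketch of the bounded version, using row 123 from the example as the last row (adjust it to your data). Every range must cover the same rows, because SUMPRODUCT requires all of its arrays to have the same dimensions and returns #VALUE! otherwise:

    ```excel
    =SUMPRODUCT(--('Raw data'!$C$1:$C$123=Landgebruik!A2),MMULT(--('Raw data'!$O$1:$O$123={20,21,22,23,24}),TRANSPOSE({1,1,1,1,1})),'Raw data'!$S$1:$S$123)
    ```

    The $ signs keep the ranges fixed if you fill the formula down for the other rows of Landgebruik.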

    If you insert new rows above the last row already included in the range (select a row with Shift+Space, then press Ctrl++), the references in your formulas are updated automatically. Alternatively, you can use Names whose formulas grow the referenced range by finding the last non-empty cell.
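    One common way to build such Names is the INDEX/COUNTA pattern. This sketch assumes column C of 'Raw data' has no blank cells above its last entry, and the Names RawC, RawO and RawS are hypothetical. Define them in the Name Manager with the Refers to formulas below. All three count the non-empty cells of column C, so the three ranges always have the same number of rows:

    ```excel
    RawC  ='Raw data'!$C$1:INDEX('Raw data'!$C:$C,COUNTA('Raw data'!$C:$C))
    RawO  ='Raw data'!$O$1:INDEX('Raw data'!$O:$O,COUNTA('Raw data'!$C:$C))
    RawS  ='Raw data'!$S$1:INDEX('Raw data'!$S:$S,COUNTA('Raw data'!$C:$C))

    =SUMPRODUCT(--(RawC=Landgebruik!A2),MMULT(--(RawO={20,21,22,23,24}),TRANSPOSE({1,1,1,1,1})),RawS)
    ```

    INDEX returns a cell reference here, so $C$1:INDEX(...) is a range ending at the last counted row. Unlike OFFSET, INDEX is not volatile, so these Names do not force a recalculation every time anything in the workbook changes.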

    Update

    I measured the efficiency of these approaches on 10,000 rows of random data, recalculating each formula 1000 times. You can see the elapsed time in the second column.

    While timing each formula, I commented out the others and ran this VBA code:

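    Public Sub MeasureCalculationTime()
        ' Record the start time (Now has a resolution of one second)
        Dim datStart As Date: datStart = Now
    
        ' Recalculate all open workbooks 1000 times
        Dim i As Long
        For i = 1 To 1000
            Application.Calculate
        Next i
    
        ' Date values are measured in days, so convert the difference to seconds
        Dim datFinish As Date: datFinish = Now
        Dim dblSeconds As Double: dblSeconds = (datFinish - datStart) * 24 * 60 * 60
        Debug.Print "Calculation finished at " & datFinish & " took " & dblSeconds & " seconds"
    End Sub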
    

    In this scenario, MMULT was not the fastest.

    However, I would point out that it is the most flexible approach, because:

    1. You can use it with switches: refer to a cell range instead of {1,1,1,1,1}, and you will be able to include or exclude IDs in the selection very quickly. For example, put 20, 21, 22, 23, 24 into A1:A5 and, next to them, 1, 1, 1, 1, 1 into B1:B5. To exclude 21, change B2 to 0; to include it again, set it back to 1.

    2. You can use more complicated criteria that compare across multiple levels, for example:

      =SUMPRODUCT(MMULT(--(CarId=CarOwner),--(CarOwner=ListOfJobs),--(ListOfJobs=JobsByDepartment),--(DepartmentIncludedInSelection=1)),FuelConsumption)

    Note: The line above is only pseudocode; MMULT takes only two arguments.
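    Here is a sketch of the switch setup from point 1. It assumes the IDs are in A1:A5 and the switches in B1:B5 of a separate sheet called Settings (the sheet name is hypothetical). Because A1:A5 is a column, TRANSPOSE turns it into a row so that the comparison still yields one column per ID. B1:B5 already has the 5×1 shape that MMULT needs as its second argument:

    ```excel
    =SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2),MMULT(--('Raw data'!O:O=TRANSPOSE(Settings!$A$1:$A$5)),Settings!$B$1:$B$5),'Raw data'!S:S)
    ```

    Setting Settings!B2 to 0 then drops ID 21 from the total without editing the formula. In versions of Excel without dynamic arrays, a formula containing TRANSPOSE like this may need to be confirmed with Ctrl+Shift+Enter.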
