Using OR logic on an array as argument in Sumproduct

后端 未结 7 522
南旧
南旧 2021-01-12 04:47

I\'m having a fairly large dataset where I need to combine multiple entries into a single value. My dataset contains data on the combination of two datasets, each using thei

相关标签:
7条回答
  • 2021-01-12 05:07

    You could make a small change to your current formula; change the ; to * (-- are also unneeded in that particular case):

    =SUMPRODUCT(('Raw data'!C:C=Landgebruik!A2)*('Raw data'!O:O={20;21;22;23;40})*'Raw data'!S:S)
    

    And that should work.


    When you feed separate parameters to SUMPRODUCT, each parameter has to be of the same size. But when you multiply them like this, it forces evaluation and the arrays expand.

    For example, if you take two arrays, 5x1 and 1x5, you get a 5x5 resulting array:

    0 讨论(0)
  • 2021-01-12 05:10

    I want to give a shot to this question, after asking OP for some clarifications, because English is not my main language and I think I have misunderstood something.

    So, what I did to simulate situation, made a new workbook with 2 sheets.

    One sheet is named Landgebruik and got a value in A2 and I did this:

    The second sheet is named Raw data. I hide some columns to use only columns C, O and S. In column S I input just values equal to 1. In column O I did randomly values equal to {20,21,22,23,40} and in Column C I did randomly values which were A or B. And it looks like this (please, note I hide some columns):

    And question would like to sum values in column S but only If column O is equal to 20 or 21 or 22 or 23 o 40 and column C is equal to Landgebruik!A2 (in my test, value in there is letter A)

    We can use an array formula to filter the data in column S and then, once filtered, sum values that meet requirements. In my test, the correct result would be 8, beause only 8 values in column S meet requirements of column C and O. In the image, the right rows are highlighted in yellow.

    OP already did this, but wants to know if there is a shorter/elegant formula.

    Shortest formula I found is like this:

    =SUM(IF($O$2:$O$28={20;21;22;23;40};IF($C$2:$C$28=Landgebruik!$A$2;$S$2:$S$28)))
    

    This is an array formula, so it must be inserted pressing CTRL+SHIFT+ENTER or it won't work!

    HOW IT WORKS:

    First IF takes all values in column S and ignores all where equivalent in Column O are not 20 or 21 or 22 or 23 or 40. Second IF takes that new array, and ignores all values where equivalent in column C are not equal to Landgebruik!$A$2. Final array is sumed up by the function SUM

    I've tried to explain the best I can. I hope you can adapt this to your needs.

    0 讨论(0)
  • 2021-01-12 05:20

    If you are interested in performance (calculation speed) and are not afraid of matrix calculation, you can use MMULT:

    =SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2),MMULT(--('Raw data'!O:O={20,21,22,23,24}),TRANSPOSE({1,1,1,1,1})),'Raw data'!S:S)
    

    Explanation:

    First, you create a 1048576×5 matrix, where the value in the i-th row and j-th column is 1 if the ID in 'Raw data'!O:O's i-th line is the same as the j-th value in the enumeration {20,21,22,23,24}, 0 otherwise.

    Second, you multiply this by a vector of 1s (5 1s because {20,21,22,23,24} contains five elements), which means that you accept all the five values.

    Third, from the above you get a vector where the i-th element is 1 if the ID is among the accepted values, 0 otherwise, and you put this vector next to the others in your SUMPRODUCT.

    (Sorry, my Excel uses ',' instead of ';'. If you want to shorten the formula, you may write {1;1;1;1;1} instead of TRANSPOSE({1,1,1,1,1}). But you have to find out what your Excel uses instead of ';' to separate rows, most probably '.'.)

    Note: It may improve the speed of the calculation if you refer to the range which actualy contain values, not the whole column, e.g. 'Raw data'!C1:C123 instead of 'Raw data'!C:C.

    If you insert new rows with Shift+Space Ctrl++ above the last row already included, then the references in your formulas will be updated automatically. Alternatively, you may use Names with special formulas that grow the Range referred to by determining the last non-empty cell.

    Update

    I made some measurements to compare the efficiency of these approaches. I used random data of 10000 rows and I recalculated each formula 1000 times. You can see the elapsed time in the second column.

    I commented out the other formulas while I ran this VBA code to measure the time:

    Public Sub MeasureCalculationTime()
        Dim datStart As Date: datStart = Now
    
        Dim i As Long: For i = 1 To 1000
            Application.Calculate
        Next i
    
        Dim datFinish As Date: datFinish = Now
        Dim dblSeconds As Double: dblSeconds = (datFinish - datStart) * 24 * 60 * 60
        Debug.Print "Calculation finished at " & datFinish; " took " & dblSeconds & " seconds"
    End Sub
    

    In this scenario, MMULT was not the fastest.

    However, I would like to point out that it is the most flexible because

    1. You may use it with switches: You refer to a cell range instead of the {1,1,1,1,1}, and you will be able to include / exclude IDs in the selection very quickly. Like you put into A1:A5 {20,21,22,23,24} and next to it, into B1:B5 {1,1,1,1,1}. If you want to exclude 21, then you rewrite B2 to 0, if you want to include it, you write it back to 1.

    2. You may use more complicated criteria, where you have to compare multiple levels. Like:

      =SUMPRODUCT(MMULT(--(CarId=CarOwner),--(CarOwner=ListOfJobs),--(ListOfJobs=JobsByDepartment),--(DepartmentIncludedInSelection=1)),FuelConsumption)

    Note: The above line is just pseudocode, MMULT has only two parameters.

    0 讨论(0)
  • 2021-01-12 05:21

    You can use text search for this:

    --NOT(ISERROR(FIND('Raw data'!O:O,"2021222340")))
    

    But you have to be careful that a shorter ID is not found incorrectly in a longer ID, e.g. if you want to search among the IDs { 123, 456, 789 } then 12 is not considered to be among the IDs. So a simple text search like the above would not work. You need a delimiter character to break up the string of IDs. Usually I use the pipe character for this purpose, since I cannot remember any case when it occurred in the original text of an Excel file, and because it makes the formula human-readable:

    --NOT(ISERROR(FIND("|"&'Raw data'!O:O&"|","|20|21|22|23|40|")))
    

    Examples:

    'Raw data'!O:O is 20 => |21| is found in |20|21|22|23|40|

    'Raw data'!O:O is 2 => |2| is not found in |20|21|22|23|40|

    (If your IDs may include the pipe character, then you can use CHR(1), a long forgotten ASCII code for SOH meaning start of header; of course, it's less readable.)

    The whole formula:

    =SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2),--NOT(ISERROR(FIND("|"&'Raw data'!O:O&"|","|20|21|22|23|40|"))),'Raw data'!S:S)
    

    (Sorry, my Excel uses , instead of ;)

    0 讨论(0)
  • 2021-01-12 05:21

    This could work:

    ={SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2);--IFERROR(MATCH('Raw data'!O:O;{20;21;22;23;40};0)>0;0);'Raw data'!S:S)}
    

    This needs to be entered as an array formula.

    0 讨论(0)
  • 2021-01-12 05:23

    Even though this has been done hundreds of times before, hey maybe microsoft switched up the formulas or something.

    I am partial to the method Jerry and Me suggested as they are simple as hell and concise, but you pay a heavy performance cost.

    Tom's formula looks ugly to me but was fastest by far, about 4x faster than my initial example. We were able to incorporate the {}s with Tom's formula, but to get it to work we had to wrap the sumifs function with a sum function. This slowed down the formula considerably but made it prettier.

    z32a7ul had a great solution too. I really like the use of -- and learned how to use |s to search for a text and only that text. At first glance I thought that it would not work on a number such as 2323 but it does.

    Mock up example was as follows:

    A1:A5000 was filled with LandgeBruik,

    B1:B5000 was filled with 40's

    C1:5000 was filled with 1's.


    The results:

    =SUMPRODUCT((A1:A5000="LandgeBruik")*(B1:B5000={20,21,22,23,40})*C1:C5000)
    

    19.186031 seconds elapsed | 59,818,073 ticks

    {=SUM(IF(A1:A5000="Landgebruik",1,0)*IF(B1:B5000={20,21,22,23,40},1,0)*C1:C5000)}
    

    26.124411 seconds elapsed | 81,450,506 ticks

    {=SUM((A1:A5000=""Landgebruik"")*(B1:B5000={20,21,22,23,40})*C1:C5000)}
    

    21.111835 seconds elapsed | 65,822,330 ticks

    "=SUMIFS(C1:C5000,B1:B5000,"">=20"",B1:B5000,""<=23"",A1:A5000,""=Landgebruik"")+SUMIFS(C1:C5000,B1:B5000,""=40"",A1:A5000,""=Landgebruik"")"
    

    6.732804 seconds elapsed | 20,991,490 ticks

    "=SUM(SUMIFS(C1:C5000,A1:A5000,"Landgebruik",B1:B5000,{21,22,23,24,40}))"
    

    16.954528 seconds elapsed | 52,860,709 ticks

    "=SUMPRODUCT(--(A1:A5000=""Landgebruik""),--NOT(ISERROR(FIND(""|""&B1:B5000&""|"",""|20|21|22|23|40|""))),C1:C5000)"
    

    11.822379 seconds elapsed | 36,859,729 ticks

    0 讨论(0)
提交回复
热议问题