Index from accumarray with max/min

甜味超标 2021-01-21 04:32

I have a vector and a cell array of the same size; the cell array contains repeating strings that define the groups. I want to find the min/max values of the vector for each group.
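A minimal sketch of this setup, using small hypothetical data (`value` and `group` are made up). It uses base MATLAB's `unique` as a stand-in for `grp2idx`; `unique` numbers the groups in sorted label order, which may not match `grp2idx`'s numbering.

```matlab
% Hypothetical example data: one value and one group label per element
value = [5; 3; 9; 1; 7; 2];
group = {'a'; 'b'; 'a'; 'c'; 'b'; 'c'};

% grname: sorted unique labels; grnum: group number (row of grname) of each element
[grname, ~, grnum] = unique(group);

% Per-group maximum and minimum of value
grmax = accumarray(grnum, value, [], @max);
grmin = accumarray(grnum, value, [], @min);
```

Here `grmax(k)` and `grmin(k)` belong to the label `grname{k}`.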

2 answers
  • 2021-01-21 05:16

    When faced with a similar problem*, I came up with this solution:

    • define the following function (saved in its own file, argmax.m):

         function i=argmax(x)
         % ARGMAX  Index of the largest element of x (the first one on ties)
         [~,i]=max(x);
         end
      
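    • then you can find the max locations as follows (the accumulated values must be the element indices 1..N, not grnum itself, so that the function receives the indices belonging to each group)

         gridx = accumarray(grnum,(1:numel(grnum))',[],@(i)i(argmax(value(i))));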
      
    • and the max values as

         grvalue = value(gridx);
      

    (*if I understand your problem correctly)
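    A self-contained sketch of the same idea, assuming hypothetical data and `unique` in place of `grp2idx`. Because an anonymous function cannot request the second output of `max`, the hypothetical helpers `firstMax` and `firstMin` use `find(v == max(v), 1)` instead of the argmax.m file. The key point is that `accumarray` accumulates the element indices, so each call receives the indices of one group:

    ```matlab
    % Hypothetical example data (same shape as in the question)
    value = [5; 3; 9; 1; 7; 2];
    group = {'a'; 'b'; 'a'; 'c'; 'b'; 'c'};
    [~, ~, grnum] = unique(group);   % stand-in for grp2idx

    % Hypothetical anonymous stand-ins for argmax/argmin
    firstMax = @(v) find(v == max(v), 1);
    firstMin = @(v) find(v == min(v), 1);

    % Accumulate the element indices 1..N per group, then keep the index
    % whose value is the group's max (or min)
    idx   = (1:numel(value)).';
    gridx = accumarray(grnum, idx, [], @(i) i(firstMax(value(i))));
    gridn = accumarray(grnum, idx, [], @(i) i(firstMin(value(i))));

    grvalue = value(gridx);   % per-group maxima
    grminv  = value(gridn);   % per-group minima
    ```

    Since `accumarray` does not promise the order in which it passes a group's elements (for unsorted subscripts), a tie may resolve to any of the tied indices.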

  • 2021-01-21 05:35

    The best vectorized answer I can see is:

    gridx = arrayfun(@(grix)find((grnum(:)==grix) & (value(:)==grvalue(grix)),1),unique(grnum));
    

    but I would not call this a "fast" vectorized solution: arrayfun is really useful, but generally no faster than a loop.
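    To see what the arrayfun line returns, here is a sketch on small hypothetical data with a tie in group 'a' (using `unique` as a stand-in for `grp2idx`). Because `find(..., 1)` scans in the original element order, the first index holding the group maximum is returned:

    ```matlab
    % Hypothetical example data; 9 appears twice in group 'a'
    value = [5; 3; 9; 1; 7; 9];
    group = {'a'; 'b'; 'a'; 'c'; 'b'; 'a'};
    [~, ~, grnum] = unique(group);   % stand-in for grp2idx
    grvalue = accumarray(grnum, value, [], @max);

    % For each group number k: first element that is in group k
    % and equals that group's maximum
    gridx = arrayfun(@(k) find(grnum == k & value == grvalue(k), 1), unique(grnum));
    ```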


    However, the fastest answer is not always vectorized. If I re-implement the code as you wrote it, but with a larger data set:

    nValues = 1000000;
    value = floor(rand(nValues,1)*100000);
    group = num2cell(char(floor(rand(nValues,1)*4)+'a'));
    tic;
    [grnum, grname] = grp2idx(group);
    grvalue = accumarray(grnum,value,[],@max);
    toc;
    

    My computer gives me a tic/toc time of 0.886 seconds. (Note: all tic/toc times are from the second run of a function defined in a file, to avoid one-time pcode generation.)

    Adding the "vectorized" (really arrayfun) one-line gridx computation raises the tic/toc time to 0.975 seconds. Not bad; further investigation shows that most of the time is consumed by the grp2idx call.

    If we reimplement this as a non-vectorized, simple loop, including the gridx computation, like this:

    tic
    [grnum, grname] = grp2idx(group);
    grvalue = -inf*ones(size(grname));
    gridx = zeros(size(grname));
    for ixValue = 1:length(value)
        tmpGrIdx = grnum(ixValue);
        if value(ixValue) > grvalue(tmpGrIdx)
            grvalue(tmpGrIdx) = value(ixValue);
            gridx(tmpGrIdx) = ixValue;
        end
    end
    toc
    

    the tic/toc time is about 0.847 seconds, slightly faster than the original code.
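    Since the question also asks about minima, the same loop can track both in a single pass. A sketch on hypothetical data, again with `unique` standing in for `grp2idx`; the names `grmax`, `grmin`, `gridxMax`, and `gridxMin` are illustrative:

    ```matlab
    % Hypothetical example data
    value = [5; 3; 9; 1; 7; 2];
    group = {'a'; 'b'; 'a'; 'c'; 'b'; 'c'};
    [grname, ~, grnum] = unique(group);   % stand-in for grp2idx

    nGroups = numel(grname);
    grmax = -inf(nGroups, 1);  gridxMax = zeros(nGroups, 1);
    grmin =  inf(nGroups, 1);  gridxMin = zeros(nGroups, 1);

    % Single pass over the data, updating the running max and min of each group
    for ixValue = 1:numel(value)
        g = grnum(ixValue);
        if value(ixValue) > grmax(g)
            grmax(g) = value(ixValue);
            gridxMax(g) = ixValue;
        end
        if value(ixValue) < grmin(g)
            grmin(g) = value(ixValue);
            gridxMin(g) = ixValue;
        end
    end
    ```

    The strict `>` and `<` comparisons keep the first index when a later value only ties the current best.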


    Taking this a bit further, most of the time appears to be lost in the cell-array memory access. For example:

    tic; groupValues = double(cell2mat(group')); toc  %Requires 0.754 seconds
    tic; dummy       =       (cell2mat(group')); toc  %Requires 0.718 seconds
    

    If you initially define your group names as a numeric array (for example, I'll use groupValues as defined above), the times decrease considerably, even with the same code:

    groupValues = double(cell2mat(group'));  %I'm assuming this is precomputed
    tic
    [grnum, grname] = grp2idx(groupValues);
    grname = num2cell(char(str2double(grname))); %Recapturing your original names
    grvalue = -inf*ones(size(grname));
    gridx = zeros(size(grname));
    for ixValue = 1:length(value)
        tmpGrIdx = grnum(ixValue);
        if value(ixValue) > grvalue(tmpGrIdx)
            grvalue(tmpGrIdx) = value(ixValue);
            gridx(tmpGrIdx) = ixValue;
        end
    end
    toc
    

    This produces a tic/toc time of 0.16 seconds.
