Finding (multiset) difference between two arrays

星月不相逢 2021-01-19 02:50

Given arrays (say row vectors) A and B, how do I find an array C such that merging B and C will give A?

For example, given

A = [2, 4, 6, 4, 3, 3, 1         


        
5 Answers
  • 2021-01-19 03:10

    Still another approach using the histc function:

    A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
    B = [2, 3, 5, 5];
    
    uA  = unique(A);
    hca = histc(A,uA); 
    hcb = histc(B,uA);
    res = repelem(uA,hca-hcb)
    

    We count how many times each unique value of A occurs in each vector, then use repelem to build the result from the difference of the two counts.

    This solution does not preserve the original order, but that does not seem to be a problem for you.

    I use histc for Octave compatibility, but this function is no longer recommended in MATLAB, so you can also use histcounts there.
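
    As a sketch of the same counting idea without histc, the counts can also be built with unique, ismember and accumarray. This is an alternative to the histcounts suggestion, not the answer's own code; the variable names are illustrative, and the max(..., 0) guard (an added assumption) simply ignores values that B has more often than A.

```matlab
A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];

% idxA maps every element of A to its position in the sorted unique values uA
[uA, ~, idxA] = unique(A);
countA = accumarray(idxA(:), 1).';          % occurrences of each unique value in A

% idxB maps the elements of B that occur in A to positions in uA
[inA, idxB] = ismember(B, uA);
countB = accumarray(reshape(idxB(inA), [], 1), 1, [numel(uA), 1]).';

% Repeat each unique value by the remaining count (never below zero)
res = repelem(uA, max(countA - countB, 0));
```

    As in the histc version, the result comes out sorted rather than in the original order of A.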

  • 2021-01-19 03:14

    Strongly inspired by Matt, but on my machine 40% faster:

    function A = multiDiff(A,B)
    for j = 1:numel(B)
        for i = 1:numel(A)
            if A(i) == B(j)
                A(i) = [];
                break;
            end
        end
    end
    end
    
  • 2021-01-19 03:16

    Here's a vectorized way. Memory-inefficient, mostly for fun:

    tA = sum(triu(bsxfun(@eq, A, A.')), 1);
    tB = sum(triu(bsxfun(@eq, B, B.')), 1);
    result = setdiff([A; tA].', [B; tB].', 'rows', 'stable');
    result = result(:,1).';
    

    The idea is to make each entry unique by tagging it with an occurrence number. The vectors become 2-column matrices, setdiff is applied with the 'rows' option, and then the tags are removed from the result.
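
    To make the tagging concrete, here is a sketch that builds the same occurrence numbers with a plain loop, which avoids the n-by-n comparison matrix. The example vectors are the ones used in the other answers, and the final step uses ismember with 'rows' instead of setdiff, a swapped-in but equivalent way to drop the tagged pairs while keeping the order.

```matlab
A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];

% tA(k) = how many times A(k) has appeared in A(1:k), i.e. its occurrence number
tA = zeros(size(A));
for k = 1:numel(A)
    tA(k) = sum(A(1:k) == A(k));
end

% Same tagging for B
tB = zeros(size(B));
for k = 1:numel(B)
    tB(k) = sum(B(1:k) == B(k));
end

% Each (value, tag) pair is now unique, so pairs can be matched row by row
keep = ~ismember([A; tA].', [B; tB].', 'rows');
result = A(keep);
```

    For these inputs the tags are tA = [1 1 1 2 1 2 1 1 2 3] and tB = [1 1 1 2], and the pairs left over give result = [4 6 4 3 1 5].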

  • 2021-01-19 03:16

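    You can use the second output of ismember to find the indices at which the elements of B occur in A, and diff to detect repeated indices, which are then shifted by one.

    This answer assumes that B is already sorted. If that is not the case, sort B before running the code below. Because a repeated index is simply incremented by one, the approach also relies on a value occurring at most twice in B and on its two copies being adjacent in A.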

    For the first example:

    A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
    B = [2, 3, 5, 5];
    %B = sort(B); Sort if B is not sorted.
    [~,col] = ismember(B,A);
    indx = find(diff(col)==0);
    col(indx+1) = col(indx)+1;
    A(col) = [];
    C = A;
    
    >>C
    
    4     6     4     3     1     5
    

    For the second example:

    A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
    B = [2, 4, 5, 5];
    %B = sort(B); Sort if B is not sorted.
    [~,col] = ismember(B,A);
    indx = find(diff(col)==0);
    col(indx+1) = col(indx)+1;
    A(col) = [];
    C = A;
    >>C
    
    6     4     3     3     1     5
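
    A quick check of where the index shift stops working, as a sketch: here B is chosen for illustration (it is not taken from the question) and repeats a value whose copies are not adjacent in A, so the shifted index lands on a different element.

```matlab
A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [4, 4];                      % illustrative input: the two 4s in A are not adjacent

[~, col] = ismember(B, A);       % both entries point at the first 4: col = [2 2]
indx = find(diff(col) == 0);     % position of the repeated index: indx = 1
col(indx + 1) = col(indx) + 1;   % col = [2 3], but A(3) is 6, not 4

C = A;
C(col) = [];                     % removes one 4 and the 6
```

    The result is C = [2 4 3 3 1 5 5 5], which still contains a 4 and has lost the 6. For inputs like this, one of the loop-based answers is the safer choice.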
    
  • 2021-01-19 03:26

    I'm not a fan of loops, but for random permutations of A this was the best I came up with.

    C = A;
    for x = 1:numel(B)
        C(find(C == B(x), 1, 'first')) = [];
    end
    

    I was curious about the effect of different orderings of A on a solution approach, so I set up a test like this:

    Ctruth = [1 3 3 4 5 5 6];
    for testNumber = 1:100
        Atest = A(randperm(numel(A)));
        C = myFunction(Atest,B);
        C = sort(C);
        assert(all(C==Ctruth));
    end
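
    A self-contained version of that harness, as a sketch: A and B are the example vectors used in the other answers, Ctruth is recomputed for that pair (so it differs from the value above), and the loop from this answer stands in for myFunction. The allPassed flag is an illustrative addition.

```matlab
A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];
Ctruth = [1 3 4 4 5 6];          % sorted multiset difference for this A and B

allPassed = true;
for testNumber = 1:100
    Atest = A(randperm(numel(A)));   % random ordering of A

    % Remove the first remaining match of each element of B
    C = Atest;
    for x = 1:numel(B)
        C(find(C == B(x), 1, 'first')) = [];
    end

    % Order is not preserved across permutations, so compare sorted results
    allPassed = allPassed && isequal(sort(C), Ctruth);
end
```

    Comparing with isequal instead of all(C==Ctruth) also catches results of the wrong length, which would otherwise raise a size error.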
    