Generate a matrix containing all combinations of elements taken from n vectors

后端 未结 4 1481
孤街浪徒
孤街浪徒 2020-11-21 11:52

This question pops up quite often in one form or another (see for example here or here). So I thought I\'d present it in a general form, and provide an answer which might se

相关标签:
4条回答
  • 2020-11-21 12:10

    I've done some benchmarking on the two proposed solutions. The benchmarking code is based on the timeit function, and is included at the end of this post.

    I consider two cases: three vectors of size n, and three vectors of sizes n/10, n and n*10 respectively (both cases give the same number of combinations). n is varied up to a maximum of 240 (I choose this value to avoid the use of virtual memory in my laptop computer).

    The results are given in the following figure. The ndgrid-based solution is seen to consistently take less time than combvec. It's also interesting to note that the time taken by combvec varies a little less regularly in the different-size case.

    enter image description here


    Benchmarking code

    Function for ndgrid-based solution:

    function combs = f1(vectors)
    n = numel(vectors); %// number of vectors
    combs = cell(1,n); %// pre-define to generate comma-separated list
    [combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
    %// comma-separated lists is needed to produce the rows of the result matrix in
    %// lexicographical order
    combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
    combs = reshape(combs,[],n);
    

    Function for combvec solution:

    function combs = f2(vectors)
    combs = combvec(vectors{:}).';
    

    Script to measure time by calling timeit on these functions:

    nn = 20:20:240;
    t1 = [];
    t2 = [];
    for n = nn;
        %//vectors = {1:n, 1:n, 1:n};
        vectors = {1:n/10, 1:n, 1:n*10};
        t = timeit(@() f1(vectors));
        t1 = [t1; t];
        t = timeit(@() f2(vectors));
        t2 = [t2; t];
    end
    
    0 讨论(0)
  • 2020-11-21 12:12

    Here's a do-it-yourself method that made me giggle with delight, using nchoosek, although it's not better than @Luis Mendo's accepted solution.

    For the example given, after 1,000 runs this solution took my machine on average 0.00065935 s, versus the accepted solution 0.00012877 s. For larger vectors, following @Luis Mendo's benchmarking post, this solution is consistently slower than the accepted answer. Nevertheless, I decided to post it in hopes that maybe you'll find something useful about it:

    Code:

    tic;
    v = {[1 2], [3 6 9], [10 20]};
    
    L = [0 cumsum(cellfun(@length,v))];
    V = cell2mat(v);
    
    J = nchoosek(1:L(end),length(v));
    J(any(J>repmat(L(2:end),[size(J,1) 1]),2) | ...
      any(J<=repmat(L(1:end-1),[size(J,1) 1]),2),:)  = [];
    
    V(J)
    toc
    

    gives

    ans =
    
     1     3    10
     1     3    20
     1     6    10
     1     6    20
     1     9    10
     1     9    20
     2     3    10
     2     3    20
     2     6    10
     2     6    20
     2     9    10
     2     9    20
    
    Elapsed time is 0.018434 seconds.
    

    Explanation:

    L gets the lengths of each vector using cellfun. Although cellfun is basically a loop, it's efficient here considering your number of vectors will have to be relatively low for this problem to even be practical.

    V concatenates all the vectors for easy access later (this assumes you entered all your vectors as rows. v' would work for column vectors.)

    nchoosek gets all the ways to pick n=length(v) elements from the total number of elements L(end). There will be more combinations here than what we need.

    J =
    
     1     2     3
     1     2     4
     1     2     5
     1     2     6
     1     2     7
     1     3     4
     1     3     5
     1     3     6
     1     3     7
     1     4     5
     1     4     6
     1     4     7
     1     5     6
     1     5     7
     1     6     7
     2     3     4
     2     3     5
     2     3     6
     2     3     7
     2     4     5
     2     4     6
     2     4     7
     2     5     6
     2     5     7
     2     6     7
     3     4     5
     3     4     6
     3     4     7
     3     5     6
     3     5     7
     3     6     7
     4     5     6
     4     5     7
     4     6     7
     5     6     7
    

    Since there are only two elements in v(1), we need to throw out any rows where J(:,1)>2. Similarly, where J(:,2)<3, J(:,2)>5, etc... Using L and repmat we can determine whether each element of J is in its appropriate range, and then use any to discard rows that have any bad element.

    Finally, these aren't the actual values from v, just indices. V(J) will return the desired matrix.

    0 讨论(0)
  • 2020-11-21 12:19

    The ndgrid function almost gives the answer, but has one caveat: n output variables must be explicitly defined to call it. Since n is arbitrary, the best way is to use a comma-separated list (generated from a cell array with ncells) to serve as output. The resulting n matrices are then concatenated into the desired n-column matrix:

    vectors = { [1 2], [3 6 9], [10 20] }; %// input data: cell array of vectors
    
    n = numel(vectors); %// number of vectors
    combs = cell(1,n); %// pre-define to generate comma-separated list
    [combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
    %// comma-separated lists is needed to produce the rows of the result matrix in
    %// lexicographical order 
    combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
    combs = reshape(combs,[],n); %// reshape to obtain desired matrix
    
    0 讨论(0)
  • 2020-11-21 12:19

    A little bit simpler ... if you have the Neural Network toolbox you can simply use combvec:

    vectors = {[1 2], [3 6 9], [10 20]};
    combs = combvec(vectors{:}).' % Use cells as arguments
    

    which returns a matrix in a slightly different order:

    combs =
    
         1     3    10
         2     3    10
         1     6    10
         2     6    10
         1     9    10
         2     9    10
         1     3    20
         2     3    20
         1     6    20
         2     6    20
         1     9    20
         2     9    20
    

    If you want the matrix that is in the question, you can use sortrows:

    combs = sortrows(combvec(vectors{:}).')
    % Or equivalently as per @LuisMendo in the comments: 
    % combs = fliplr(combvec(vectors{end:-1:1}).') 
    

    which gives

    combs =
    
         1     3    10
         1     3    20
         1     6    10
         1     6    20
         1     9    10
         1     9    20
         2     3    10
         2     3    20
         2     6    10
         2     6    20
         2     9    10
         2     9    20
    

    If you look at the internals of combvec (type edit combvec in the command window), you'll see that it uses different code than @LuisMendo's answer. I can't say which is more efficient overall.

    If you happen to have a matrix whose rows are akin to the earlier cell array you can use:

    vectors = [1 2;3 6;10 20];
    vectors = num2cell(vectors,2);
    combs = sortrows(combvec(vectors{:}).')
    
    0 讨论(0)
提交回复
热议问题