I have a set of integers, say S = {1,...,10}, and two matrices N and M, whose rows are some (but not necessarily all possible) permutations of elements from S of orders, say
How about this?
n = size(N,1);
m = size(M,1);
p = size(N,2);
pattern = (1:p).'; %'// will be used for checking if it's a subpermutation or not
result = false(m,n); %// preallocate result, and initiallize to 0
for k = 1:size(N,1) %// loop over columns of (transposed) N
[~, loc] = ismember(M, N(k,:)); %// entries of M equal to a value of N(:,k)
ind = find(sum(loc>0,2)==p); %// we need p matches per row of M
t = reshape(nonzeros(loc(ind,:).'),p,[]); %'// for those rows, take matches
ind2 = all(bsxfun(@eq,t,pattern)); %// ... see if they are in the right order
result(ind(ind2),k) = true; %// ... and if so, take note in result matrix
end
The result
matrix contains 1
at position r,s if the s-th row N
is a sub-permutation of the r-th row of M. From this, your desired results are
result1 = any(result,2);
result2 = sum(result,1);
Example:
M =
8 9 4 1 10
6 5 2 7 8
4 1 9 2 10
3 4 5 1 2
N =
4 1 2
4 9 10
3 5 9
give
result =
0 0 0
0 0 0
1 1 0
1 0 0
result1 =
0
0
1
1
result2 =
2 1 0
Approach #1
[val,ind] = max(bsxfun(@eq,permute(M,[4 2 1 3]),permute(N,[2 3 4 1])),[],2)
matches = squeeze(all(diff(ind,1)>0,1).*all(val,1))
out1 = any(matches,2) %// Solution - 1
out2 = sum(matches,1) %// Solution - 2
Approach #2
Another approach that avoids permuting N
and might be better for longish N
-
[val,ind] = max(bsxfun(@eq,N,permute(M,[3 4 1 2])),[],4)
matches = squeeze(all(diff(ind,[],2)>0,2).*all(val,2))
out1 = any(matches,1) %// Solution - 1
out2 = sum(matches,2) %// Solution - 2
Approach #3
Memory-scroogey approach for large datasizes -
out1 = false(size(M,1),1); %// Storage for Solution - 1
out2 = zeros(size(N,1),1); %// Storage for Solution - 2
for k=1:size(N,1)
[val3,ind3] = max(bsxfun(@eq,N(k,:),permute(M,[1 3 2])),[],3);
matches = all(diff(ind3,[],2)>0,2).*all(val3,2);
out1 = or(out1,matches);
out2(k) = sum(matches);
end
Approach #4
Memory-scroogey approach for GPU -
gM = gpuArray(M);
gN = gpuArray(N);
gout1 = false(size(gM,1),1,'gpuArray'); %// GPU Storage for Solution - 1
gout2 = zeros(size(gN,1),1,'gpuArray'); %// GPU Storage for Solution - 2
for k=1:size(gN,1)
[val3,ind3] = max(bsxfun(@eq,gN(k,:),permute(gM,[1 3 2])),[],3);
matches = all(diff(ind3,[],2)>0,2).*all(val3,2);
gout1 = or(gout1,matches);
gout2(k) = sum(matches);
end
out1 = gather(gout1); %// Solution - 1
out2 = gather(gout2); %// Solution - 2
Now, this GPU approach has blown away all other approaches. It was run with M : 320000X5
and N : 2100X3
(same as your input sizes) filled with random integer numbers. With a GTX 750 Ti
, it took just 13.867873 seconds
!! So if you have a GPU with sufficient memory, this might be your winner approach too.
Approach #5
Extremely-memory-scroogey approach for GPU -
gM = gpuArray(M);
gN = gpuArray(N);
gout1 = false(size(gM,1),1,'gpuArray'); %// GPU Storage for Solution - 1
gout2 = zeros(size(gN,1),1,'gpuArray'); %// GPU Storage for Solution - 2
for k=1:size(gN,1)
[val2,ind2] = max(bsxfun(@eq,gM,permute(gN(k,:),[1 3 2])),[],2);
matches = all(diff(ind2,[],3)>0,3).*all(val2,3);
gout1 = or(gout1,matches);
gout2(k) = sum(matches);
end
out1 = gather(gout1); %// Solution - 1
out2 = gather(gout2); %// Solution - 2
I benchmarked all the approaches against different pairs of matrices N,M
, and wherever possible, I have also compared parfor
vs. for
and picked the faster one. Here are my results:
%//Test 1: size(N) = 2263x3, size(M) = 5000x6
%//My approach (parfor): 0.650626 sec
%//Divakar's Approach 1: 1.870144 sec
%//Divakar's Approach 2: 1.164088 sec
%//Divakar's Approach 3: 0.380915 sec (with parfor)
%//Divakar's Approach 4: 2.643659 sec (gpu)
%//Luis Mendo's Approach: 1.681007 sec
%//Test 2: size(N) = 2263x3, size(M) = 25000x6
%//My approach (parfor): 6.137823 sec
%//Divakar's Approach 1: 8.342699 sec
%//Divakar's Approach 2: 5.784426 sec
%//Divakar's Approach 3: 2.251888 sec (with parfor)
%//Divakar's Approach 4: 6.272578 sec (gpu)
%//Luis Mendo's Approach: 11.514548 sec
%//Test 3: size(N) = 2100x3, size(M) = 20000x5
%//My approach (parfor): 3.417432 sec
%//Divakar's Approach 1: 5.732680 sec
%//Divakar's Approach 2: 4.107909 sec
%//Divakar's Approach 3: 1.393052 sec (with parfor)
%//Divakar's Approach 4: 3.145183 sec (gpu)
%//Luis Mendo's Approach: 5.668326 sec
%//Test 4: size(N) = 2100x3, size(M) = 324632x5
%//Divakar's Approach 3: 54.231878 sec (with parfor)
%//Divakar's Approach 4: 15.111936 sec (gpu)
%//Test 5: size(N) = 2263x3, size(M) = 1000000x6
%//Divakar's Approach 3: 210.853515 sec (with parfor)
%//Divakar's Approach 4: 49.529794 sec (gpu)
%//Divakar's Approach 5: 49.874444 sec (gpu)
%//Test 6: size(N) = 2263x3, size(M) = 5000000x6
%//Divakar's Approach 3: 1137.606244 sec (with parfor)
%//Divakar's Approach 4: stopped it after 15 min and heavy interrupts/DCPs activity
%//Divakar's Approach 5: 267.169307 sec
Among the non-gpu approaches, Divakar's Approach 3 was by far the fastest one. Its gpu counterpart started showing its advantages only with large number of rows.