Question
I have a cell array of sets, and I want to remove every superset for which a subset is also present. The data is as follows:
a{1} = [5]
a{2} = [4 11 14]
a{3} = [1]
a{4} = [5 16]
a{5} = [5]
a{6} = [11 16]
a{7} = [11]
a{8} = [16]
a{9} = [9 14 17]
a{10} = [14]
[ii, jj] = ndgrid(1:numel(a));
s = cellfun(@(x,y) all(ismember(x,y)), a(ii), a(jj));
s = triu(s,1); %// count each pair just once, and remove self-pairs
similarity = a(~any(s,1));
celldisp(similarity)
The result is as follows:
a{1} = [5]
a{2} = [4 11 14]
a{3} = [1]
a{4} = [11 16]
a{5} = [11]
a{6} = [16]
a{7} = [9 14 17]
a{8} = [14]
As the output shows, there are still supersets that should be removed: a{2} should go because a{5} contains 11, which is a subset of it; a{4} should be removed because a{5} contains 11 and a{6} contains 16; and a{7} should be deleted too because a{8} contains its subset 14.
The expected output is:
a{1} = [5]
a{2} = [1]
a{3} = [11]
a{4} = [16]
a{5} = [14]
Can anyone help me fix this code so that I get the correct set of results? Thanks.
Answer 1:
I think you need to use the lower triangular part instead of the upper:
s = tril(s,-1); % instead of s = triu(s,1);
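As a quick check, here is a minimal sketch that applies this one-line change to the data from the question. The variable name `result` is my own choice for illustration; the rest is taken from the question's code.

```matlab
% Data from the question.
a = {5, [4 11 14], 1, [5 16], 5, [11 16], 11, 16, [9 14 17], 14};

[ii, jj] = ndgrid(1:numel(a));
% s(i,j) is true when every element of a{i} is also in a{j}.
s = cellfun(@(x,y) all(ismember(x,y)), a(ii), a(jj));
% Keep only pairs where the subset a{i} comes later in the list than a{j}.
s = tril(s,-1);
% Drop every set that has a subset (or duplicate) later in the list.
result = a(~any(s,1));   % "result" is a hypothetical name
celldisp(result)
```

On this data the remaining sets are [1], [5], [11], [16] and [14], which are the same sets as the expected output, only in a different order.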
Edit
Keeping only the lower triangular part works only when every superset occurs before its subsets. Here is a general version that should work regardless of the order of the sets.
[ii, jj] = ndgrid(1:numel(a));
s = cellfun(@(x,y) all(ismember(x,y)), a(ii), a(jj));
% Set diagonal to zero.
s = s - diag(diag(s));
% Indicator matrix for sets that are exactly equal.
same = s & s';
% For equal sets keep only the first occurrence.
keep = triu(same) | ~same.*s;
% Delete supersets.
similarity = a(~any(keep,1));
celldisp(similarity)
By the way, it might be easier to just run a double loop instead of the above matrix operations.
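A double loop along those lines might look like the sketch below. It is only one possible way to write it, not code from the original answer; the names `drop`, `isSubset` and `sameSet` are hypothetical, and the data is copied from the question.

```matlab
% Data from the question.
a = {5, [4 11 14], 1, [5 16], 5, [11 16], 11, 16, [9 14 17], 14};

n = numel(a);
drop = false(1, n);     % drop(j) becomes true when a{j} should be deleted
for j = 1:n
    for i = 1:n
        if i == j
            continue    % never compare a set with itself
        end
        isSubset = all(ismember(a{i}, a{j}));              % a{i} is contained in a{j}
        sameSet  = isSubset && all(ismember(a{j}, a{i}));  % both contain each other
        % Delete a{j} if a{i} is a proper subset of it, or if a{i} is an
        % identical set that appears earlier in the list.
        if isSubset && (~sameSet || i < j)
            drop(j) = true;
            break       % one such a{i} is enough
        end
    end
end
similarity = a(~drop);
celldisp(similarity)
```

For the question's data this leaves [5], [1], [11], [16] and [14], in that order. The `i < j` condition plays the same role as `triu(same)` in the matrix version: of several identical sets, only the first one survives.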
Source: https://stackoverflow.com/questions/29679604/issue-in-deleting-supersets-in-matlab