Question
I am working on text mining. I have extracted 23 sentences from a text file, along with the 6 most frequent words from the same file.
For the frequent words, I created a 1D cell array that records, for each word, the sentences in which it occurs. I then took pairwise intersections to find the sentences in which each word occurs together with each of the other words:
OccursTogether = cell(length(Out1));
for ii = 1:length(Out1)
    for jj = ii+1:length(Out1)
        OccursTogether{ii,jj} = intersect(Out1{ii}, Out1{jj});
    end
end
celldisp(OccursTogether)
The output looks something like this:
OccursTogether[1,1]= 4 3
OccursTogether[1,2]= 1 4 3
OccursTogether[1,3]= 4 3
Here, [1,1] shows that word 1 occurs with word 1 in sentences 4 and 3, [1,2] shows that words 1 and 2 occur together in sentences 1, 4 and 3, and so on.
What I want to do is implement an element absorption technique that removes every cell containing a superset of another cell. As shown above, 4 and 3 in [1,1] form a subset of [1,2], so the OccursTogether[1,2] entry should be deleted and the output should be as follows:
occurs[1,1]= 4 3
occurs[1,3]= 4 3
Note that this should check every possible subset relation among the entries.
Answer 1:
I think this does what you want:
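[ii, jj] = ndgrid(1:numel(OccursTogether)); % all index pairs (ii,jj)
s = cellfun(@(x,y) all(ismember(x,y)), OccursTogether(ii), OccursTogether(jj)); % s(ii,jj): cell ii is a subset of cell jj
s = triu(s,1); % count each pair just once, and remove self-pairs
result = OccursTogether(~any(s,1)); % keep cells that are not a superset of an earlier cell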
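One thing to watch for when applying this to the cell array built by the loop in the question: that loop only fills the entries above the diagonal, so the rest of the cells stay empty. Since all(ismember([], y)) evaluates to true, an empty cell counts as a subset of every later cell, and as far as I can tell nearly everything would be removed. A minimal sketch of one way around this, assuming empty cells carry no information and can be dropped first (the Out1 values here are made up for illustration, and sets is a hypothetical helper variable):

```matlab
% Hypothetical per-word sentence lists, standing in for Out1 from the question.
Out1 = {[3 4], [1 3 4], [1 3 4 5]};

% Same construction as in the question: only entries above the diagonal are filled.
OccursTogether = cell(length(Out1));
for ii = 1:length(Out1)
    for jj = ii+1:length(Out1)
        OccursTogether{ii,jj} = intersect(Out1{ii}, Out1{jj});
    end
end

% Assumption: empty cells (unfilled entries, or word pairs that never
% co-occur) should be ignored rather than treated as subsets of everything.
sets = OccursTogether(~cellfun(@isempty, OccursTogether));

% Same absorption step as above, applied to the non-empty cells only.
[ii, jj] = ndgrid(1:numel(sets));
s = cellfun(@(x,y) all(ismember(x,y)), sets(ii), sets(jj));
s = triu(s,1);
result = sets(~any(s,1));
celldisp(result)
```

Note that filtering flattens the cell array in column-major order, so the original (word, word) positions are lost; if they matter, the indices returned by find(~cellfun(@isempty, OccursTogether)) could be kept alongside.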
Example 1:
OccursTogether{1,1} = [4 3]
OccursTogether{1,2} = [1 4 3]
OccursTogether{1,3} = [1 4 3 5];
OccursTogether{1,4} = [1 4 3 5];
gives
>> celldisp(result)
result{1} =
4 3
OccursTogether{1,2} is removed because it's a superset of OccursTogether{1,1}. OccursTogether{1,3} is removed because it's a superset of OccursTogether{1,2}. OccursTogether{1,4} is removed because it's a superset of OccursTogether{1,3}.
Example 2:
OccursTogether{1,1} = [10 20 30]
OccursTogether{1,2} = [10 20 30]
gives
>> celldisp(result)
result{1} =
10 20 30
OccursTogether{1,2} is removed because it's a superset of OccursTogether{1,1}, but OccursTogether{1,1} is not removed even though it's a superset of OccursTogether{1,2}. Each set is compared only with the sets that precede it (third line of the code, triu(s,1)).
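Because each set is compared only with earlier ones, a superset that appears before its subset is not removed. If the check should be independent of order, one possible variant (a sketch, not part of the original answer) separates proper subsets from exact duplicates: a cell is dropped when any other cell is a proper subset of it, or when an identical cell appears earlier. The input C and the names sub, strict, dup and keep are illustrative.

```matlab
% Hypothetical input: the superset [1 4 3] comes BEFORE its subset [4 3].
C = {[1 4 3], [4 3], [4 3], [7 8]};

n = numel(C);
[ii, jj] = ndgrid(1:n);

% sub(i,j) is true when every element of C{i} is also in C{j}.
sub = cellfun(@(x,y) all(ismember(x,y)), C(ii), C(jj));

% strict(i,j): C{i} is a proper subset of C{j} (subset one way only).
strict = sub & ~sub.';

% dup(i,j): C{i} and C{j} hold the same elements and i comes before j.
dup = triu(sub & sub.', 1);

% Drop cell j if something is a proper subset of it, or if it repeats an earlier cell.
keep = ~any(strict | dup, 1);
result = C(keep);
celldisp(result)
```

With this input, [1 4 3] and the second copy of [4 3] are dropped, leaving [4 3] and [7 8]. Empty cells would still count as subsets of everything, so they would need to be filtered out beforehand if the array contains any.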
Source: https://stackoverflow.com/questions/29099206/how-to-remove-all-cells-which-contain-supersets-of-other-cells