How to remove all cells which contain supersets of other cells?

Posted by 荒凉一梦 on 2019-12-24 22:25:22

Question


I am working on text mining. I have 23 sentences extracted from a text file, along with 6 frequent words extracted from the same file.

For the frequent words, I created a 1-D cell array (Out1 in the code below) that records, for each word, the sentences in which it occurs. I then took pairwise intersections to find the sentences in which each word occurs together with each of the other words:

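% One cell per word pair; only the upper triangle (jj > ii) is filled.
OccursTogether = cell(length(Out1));
for ii = 1:length(Out1)
    for jj = ii+1:length(Out1)
        % Sentences in which both word ii and word jj occur
        OccursTogether{ii,jj} = intersect(Out1{ii}, Out1{jj});
    end
end
celldisp(OccursTogether)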

The output looks something like this:

OccursTogether[1,1]= 4 3
OccursTogether[1,2]= 1 4 3
OccursTogether[1,3]= 4 3

Here [1,1] shows that word 1 occurs with word 1 in sentences 4 and 3, [1,2] shows that words 1 and 2 occur together in sentences 1, 4 and 3, and so on.
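
To make the setup concrete, here is a small sketch of the loop above run on made-up data. The contents of Out1 are hypothetical (the real array is not shown in the question). Note that because jj starts at ii+1, the diagonal and lower-triangle cells are never assigned and stay empty, and that intersect returns the common values in sorted order.

```matlab
% Hypothetical data: Out1{k} lists the sentences in which word k occurs.
Out1 = {[1 3 4], [1 3 4 7], [3 4 9]};

OccursTogether = cell(length(Out1));
for ii = 1:length(Out1)
    for jj = ii+1:length(Out1)
        % Sentences shared by word ii and word jj (sorted by intersect)
        OccursTogether{ii,jj} = intersect(Out1{ii}, Out1{jj});
    end
end

% Only the upper triangle is filled: {1,2}, {1,3} and {2,3}.
% Every other cell, including {1,1}, is still empty.
celldisp(OccursTogether)
```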

What I want to do is implement an element absorption technique that removes every cell containing a superset of another cell. As shown above, the entries 4 and 3 in [1,1] are a subset of [1,2], so the OccursTogether[1,2] entry should be deleted, and the output should be as follows:

occurs[1,1]= 4 3
occurs[1,3]= 4 3

Keep in mind that this should check all possible subset relations among the entries.


Answer 1:


I think this does what you want:

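[ii, jj] = ndgrid(1:numel(OccursTogether)); % all index pairs
s = cellfun(@(x,y) all(ismember(x,y)), OccursTogether(ii), OccursTogether(jj)); % s(ii,jj): cell ii is a subset of cell jj
s = triu(s,1); % count each pair just once, and remove self-pairs
result = OccursTogether(~any(s,1)); % drop every cell that is a superset of an earlier cell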

Example 1:

OccursTogether{1,1} = [4 3];
OccursTogether{1,2} = [1 4 3];
OccursTogether{1,3} = [1 4 3 5];
OccursTogether{1,4} = [1 4 3 5];

gives

>> celldisp(result)
result{1} =
     4     3

OccursTogether{1,2} is removed because it's a superset of OccursTogether{1,1}. OccursTogether{1,3} is removed because it's a superset of OccursTogether{1,2}. OccursTogether{1,4} is removed because it's a superset of OccursTogether{1,3}.
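
One caveat when applying this to the cell array built in the question: that array is preallocated with cell(length(Out1)) and only its upper triangle is filled, so most of its cells are empty. An empty set is a subset of every set (all(ismember([],y)) is true), so an empty cell early in the linear order would cause every later cell to be dropped. The following is a minimal sketch, assuming it is acceptable to discard the empty cells first; the input values and the variable name sets are made up for illustration.

```matlab
% Hypothetical input shaped like the question's array: only the
% upper triangle is filled, every other cell is left empty.
OccursTogether = cell(3);
OccursTogether{1,2} = [4 3];
OccursTogether{1,3} = [1 4 3];
OccursTogether{2,3} = [4 3 7];

% Keep only the non-empty cells, as a 1-by-N row of sets
% (column-major order: {1,2}, {1,3}, {2,3}).
sets = OccursTogether(~cellfun(@isempty, OccursTogether));
sets = sets(:).';

% Same subset test as above, applied to the filtered sets.
[ii, jj] = ndgrid(1:numel(sets));
s = cellfun(@(x,y) all(ismember(x,y)), sets(ii), sets(jj));
s = triu(s,1);
result = sets(~any(s,1));

celldisp(result)
```

Here [1 4 3] and [4 3 7] are both supersets of [4 3], so only [4 3] remains. The positions of the surviving cells in the original 2-D array are lost in this sketch; keeping them would require tracking the indices returned by find.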

Example 2:

OccursTogether{1,1} = [10 20 30];
OccursTogether{1,2} = [10 20 30];

gives

>> celldisp(result)
result{1} =
    10    20    30

OccursTogether{1,2} is removed because it's a superset of OccursTogether{1,1}, but OccursTogether{1,1} is not removed even though it's also a superset of OccursTogether{1,2}. Each set is compared only with earlier sets (third line of code).
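
Because each set is compared only with earlier ones, a superset that appears before its subset is not removed. If the order of the cells should not matter, one possible variant drops a cell whenever it is a proper superset of any other cell, and keeps only the first of several identical sets (as in Example 2). This is a sketch under that assumption, not part of the original answer; the data and the names sets, sub, proper, dupEarlier and remove are made up for illustration.

```matlab
% Hypothetical data: the superset [1 4 3] comes BEFORE its subset [4 3].
sets = {[1 4 3], [4 3], [4 3], [2 5]};

[ii, jj] = ndgrid(1:numel(sets));
% sub(i,j) is true when sets{i} is a subset of sets{j}
sub = cellfun(@(x,y) all(ismember(x,y)), sets(ii), sets(jj));

% sets{i} is a proper subset of sets{j}: subset one way but not the other
proper = sub & ~sub.';

% sets{i} and sets{j} are equal and i comes first (i < j)
dupEarlier = triu(sub & sub.', 1);

% Remove column j if some other set is a proper subset of it,
% or if it duplicates an earlier set.
remove = any(proper | dupEarlier, 1);
result = sets(~remove);

celldisp(result)
```

With this input the first cell is removed as a proper superset of [4 3], the third as a duplicate of the second, and [4 3] and [2 5] are kept.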



Source: https://stackoverflow.com/questions/29099206/how-to-remove-all-cells-which-contain-supersets-of-other-cells
