Strcmp for cell arrays of unequal length in MATLAB

Posted by 余生长醉 on 2019-12-07 02:02:46

Question


Is there an easy way to find a smaller cell array of strings within a larger one? I've got two lists, one with unique elements, and one with repeating elements. I want to find whole occurrences of the specific pattern of the smaller array within the larger. I'm aware that strcmp will compare two cell arrays, but only if they're equal in length. My first thought was to step through subsets of the larger array using a loop, but there's got to be a better solution.

For example, in the following:

smallcellarray={'string1',...
                'string2',...
                'string3'};
largecellarray={'string1',...
                'string2',...
                'string3',...
                'string1',...
                'string2',...
                'string1',...
                'string2',...
                'string3'};

index=myfunction(largecellarray,smallcellarray)

would return

index=[1 1 1 0 0 1 1 1]

Answer 1:


You could actually use the function ISMEMBER to get an index vector for where the cells in largecellarray occur in the smaller array smallcellarray, then use the function STRFIND (which works for both strings and numeric arrays) to find the starting indices of the smaller array within the larger:

>> nSmall = numel(smallcellarray);
>> [~, matchIndex] = ismember(largecellarray,...  %# Find the index of the 
                                smallcellarray);    %#   smallcellarray entry
                                                    %#   that each entry of
                                                    %#   largecellarray matches
>> startIndices = strfind(matchIndex,1:nSmall)  %# Starting indices where the
                                                %#   vector [1 2 3] occurs in
startIndices =                                  %#   matchIndex

     1     6

Then it's a matter of building the vector index from these starting indices. Here's one way you could create this vector:

>> nLarge = numel(largecellarray);
>> endIndices = startIndices+nSmall;  %# Get the indices immediately after
                                      %#   where the vector [1 2 3] ends
>> index = zeros(1,nLarge+1);         %# Initialize index to zero, with one spare
                                      %#   entry for a match that ends at nLarge
>> index(startIndices) = 1;           %# Mark the start index with a 1
>> index(endIndices) = index(endIndices)-1;  %# Subtract 1 one index after each end,
                                      %#   so back-to-back matches cancel out
>> index = cumsum(index(1:nLarge))    %# Take the cumulative sum, dropping the
                                      %#   spare entry
index =

     1     1     1     0     0     1     1     1
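
The steps above can be collected into the myfunction that the question asks for. This is only a minimal sketch, not a definitive implementation: it assumes both inputs are cell arrays of strings, that smallcellarray has no repeated entries (as stated in the question), and that STRFIND accepts numeric row vectors as it does in the transcript above.

```matlab
function index = myfunction(largecellarray, smallcellarray)
%# Sketch only: assumes cell arrays of strings and no repeated
%# entries in smallcellarray
nSmall = numel(smallcellarray);
nLarge = numel(largecellarray);

%# For each entry of the large array, the position of the matching
%# entry in the small array (0 if there is none)
[~, matchIndex] = ismember(largecellarray(:).', smallcellarray);

%# Starting indices where the run 1,2,...,nSmall occurs
startIndices = strfind(matchIndex, 1:nSmall);
endIndices = startIndices + nSmall;       %# One index past each match

index = zeros(1, nLarge+1);               %# Spare entry for a match at the end
index(startIndices) = 1;
index(endIndices) = index(endIndices) - 1;
index = cumsum(index(1:nLarge));          %# 1 inside a match, 0 elsewhere
end
```

Called with the two arrays from the question, this should return the same index vector as shown above. The result is always a row vector, whatever the orientation of largecellarray.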

Another way to create it using the function BSXFUN is given by Amro. Yet another way to create it is:

index = cumsum([startIndices; ones(nSmall-1,numel(startIndices))],1);  %# Each column lists the
                                                                       %#   indices of one match
index = ismember(1:numel(largecellarray),index);  %# Logical vector marking those indices



Answer 2:


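Here's my version (based on the answers of both @yuk and @gnovice), where S is the small cell array and L is the large one. Note that grp2idx requires the Statistics Toolbox:

g = grp2idx([S L])';                               %# Map each distinct string to an integer
idx = strfind(g(numel(S)+1:end),g(1:numel(S)));    %# Starts of S's pattern within L's part of g
idx = bsxfun(@plus,idx',0:numel(S)-1);             %# Expand each start into its full span

index = zeros(size(L));
index(idx(:)) = 1;                                 %# Mark every matched position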
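
If grp2idx is not available, the same idea might be sketched with the third output of UNIQUE, which also numbers the distinct strings. This swaps UNIQUE in for grp2idx and is not part of the original answer. The data below are the example arrays from the question, and S and L are assumed to be row cell arrays so that they can be concatenated.

```matlab
%# Example data from the question; S is the small array, L the large one
S = {'string1','string2','string3'};
L = {'string1','string2','string3','string1','string2', ...
     'string1','string2','string3'};

[~, ~, g] = unique([S L]);              %# Number each distinct string
g = g(:).';                             %# Force a row vector for strfind
nS = numel(S);
idx = strfind(g(nS+1:end), g(1:nS));    %# Start positions within L
idx = bsxfun(@plus, idx(:), 0:nS-1);    %# Expand each start into its full span

index = zeros(size(L));
index(idx(:)) = 1;                      %# Mark every matched position
```

Because both arrays are numbered in a single call, a given string always maps to the same integer in the pattern and in the searched vector.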



Answer 3:


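In @gnovice's answer, the first part can be replaced with the following. It relies on grp2idx giving the same strings the same numbers in both arrays, which should hold in the example because both lists introduce their strings in the same order: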

l = grp2idx(largecellarray)';
s = grp2idx(smallcellarray)';
startIndices = strfind(l,s);



Answer 4:


I got the following solution working, but I'm still wondering if there's a better way to do this:

function [output]=cellstrcmpi(largecell,smallcell)
%# Mark every element of largecell that belongs to a complete,
%# case-insensitive occurrence of smallcell
nSmall=length(smallcell);
output=zeros(size(largecell));
idx=1;
while idx<=length(largecell)-nSmall+1
    if all(strcmpi(largecell(idx:idx+nSmall-1),smallcell))
        output(idx:idx+nSmall-1)=1;  %# Whole pattern matched here
        idx=idx+nSmall;              %# Skip past the match
    else
        idx=idx+1;                   %# No match, slide one step
    end
end

(I know, I know, no error checking - I'm a horrible person.)
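
One possible way to shorten the loop is to iterate over the entries of the small array instead of the positions in the large one, narrowing down the candidate start positions at each step. The following is only a sketch of that idea. The name cellstrcmpi_sketch is hypothetical, and both inputs are assumed to be cell arrays of strings.

```matlab
function output = cellstrcmpi_sketch(largecell, smallcell)
%# Hypothetical variant of cellstrcmpi: loops over smallcell, not largecell
nL = numel(largecell);
nS = numel(smallcell);
output = zeros(size(largecell));
if nS == 0 || nS > nL
    return                              %# Nothing can match
end
isStart = true(1, nL-nS+1);             %# Candidate start positions
for k = 1:nS
    %# Keep only the starts whose k-th element matches smallcell{k}
    window = largecell(k:k+nL-nS);
    isStart = isStart & reshape(strcmpi(window, smallcell{k}), 1, []);
end
starts = find(isStart);
for k = 0:nS-1
    output(starts+k) = 1;               %# Mark the full span of each match
end
end
```

Unlike the while loop above, this marks overlapping occurrences as well, so the two versions can differ when smallcell contains repeated strings (for example, searching for {'a','a'} in {'a','a','a'}).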



Source: https://stackoverflow.com/questions/3152652/strcmp-for-cell-arrays-of-unequal-length-in-matlab
