Question
In the following two strings, the words 'rabbit' and 'tree' are matching:
str1 = ('rabbit is eating grass near a tree');
str2 = ('rabbit is sleeping under tree');
Suppose cmp is a variable declared to compare both. I want the result as:
cmp = 2
or something that shows that two words are matching. How do I do this?
Answer 1:
"Crazy" bsxfun approach, which might be similar to intersect, but not tested -
Function -
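function out = cell2_matchind(split1,split2)
% Returns a logical column with one entry per word in split2,
% true where that word also occurs in split1.
c1 = char(split1)-'0'; % space-padded char matrix as numbers; a space becomes -16
c2 = char(split2)-'0';
% pad the narrower matrix with -16 (a space minus '0') so both have equal width
if size(c1,2)<size(c2,2)
c1 = [c1 -16.*ones(size(c1,1),size(c2,2)-size(c1,2))];
else
c2 = [c2 -16.*ones(size(c2,1),size(c1,2)-size(c2,2))];
end
% compare every row of c2 with every row of c1; words match when all columns agree
out = any(squeeze(sum(bsxfun(@eq,permute(c1,[3 2 1]),c2),2))==size(c2,2),2);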
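To see what the broadcast comparison does, here is a small standalone sketch of the same idea on two made-up word lists. The variable names and the word lists are illustrative assumptions, not part of the answer's function. It lets char() pad every word to a common width, compares all pairs of rows with bsxfun, and the result can be checked against ismember.

```matlab
% Hypothetical word lists, chosen only for illustration.
a = {'rabbit', 'is', 'tree'};
b = {'tree', 'under', 'is'};

% char() pads every row with spaces to the width of the longest word.
c  = char([a, b]);
ca = c(1:numel(a), :);          % rows for the words in a
cb = c(numel(a)+1:end, :);      % rows for the words in b

% Compare every row of cb with every row of ca, character by character.
% The result is numel(b) x width x numel(a).
charEq = bsxfun(@eq, permute(ca, [3 2 1]), cb);

% Two words match when all characters (padding included) agree.
nEq      = reshape(sum(charEq, 2), numel(b), numel(a));
isCommon = any(nEq == size(cb, 2), 2);   % one logical per word in b

commonWords = b(isCommon);
```

Using reshape instead of squeeze keeps the orientation predictable even when one of the lists holds a single word.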
Main MATLAB script -
% Source of stopwords- http://norm.al/2009/04/14/list-of-english-stop-words/
stopwords_cellstring={'a', 'about', 'above', 'above', 'across', 'after', ...
'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', ...
'already', 'also','although','always','am','among', 'amongst', 'amoungst', ...
'amount', 'an', 'and', 'another', 'any','anyhow','anyone','anything','anyway', ...
'anywhere', 'are', 'around', 'as', 'at', 'back','be','became', 'because','become',...
'becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below',...
'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom','but', 'by',...
'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de',...
'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight',...
'either', 'eleven','else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', ...
'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fify',...
'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found',...
'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt',...
'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', ...
'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if',...
'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last',...
'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile',...
'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must',...
'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine',...
'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off',...
'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise',...
'our', 'ours', 'ourselves', 'out', 'over', 'own','part', 'per', 'perhaps', 'please',...
'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious',...
'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so',...
'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', ...
'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them',...
'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', ...
'therein', 'thereupon', 'these', 'they', 'thickv', 'thin', 'third', 'this', 'those',...
'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too',...
'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up',...
'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when',...
'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein',...
'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever',...
'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet',...
'you', 'your', 'yours', 'yourself', 'yourselves', 'the'};
str1 = ('rabbit is eating grass near a tree and will be sleeping inside the tree-hole');
str2 = ('rabbit is sleeping under tree and after waking up will be eating the nuts nearby');
split1 = unique(regexp(str1,'\s','Split'),'stable');
split2 = unique(regexp(str2,'\s','Split'),'stable');
cw_split2 = split2(cell2_matchind(split1,split2))
cw_split2_nostopwd = cw_split2(~cell2_matchind(stopwords_cellstring,cw_split2))
cmp = numel(cw_split2_nostopwd)
Output -
cw_split2 =
'rabbit' 'is' 'sleeping' 'tree' 'and' 'will' 'be' 'eating' 'the'
cw_split2_nostopwd =
'rabbit' 'sleeping' 'tree' 'eating'
cmp =
4
Answer 2:
As in the other answer, split each string into a cell array of words.
str1= ('rabbit is eating grass near a tree');
str2= ('rabbit is sleeping under tree');
% split each string into a cell array of words (duplicates are kept at this stage)
split1 = regexp(str1,'\s','Split');
split2 = regexp(str2,'\s','Split');
Alternatively, later versions of MATLAB (R2013a onward, IIRC) include a strsplit() function, so the split can be reduced to
split1 = strsplit(str1);
split2 = strsplit(str2);
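The two ways of splitting differ when the input contains runs of whitespace. A minimal sketch with a made-up string (the variable names are assumptions for illustration):

```matlab
s = 'rabbit  is   sleeping';               % note the repeated spaces

viaRegexp     = regexp(s, '\s',  'split'); % yields empty strings between adjacent spaces
viaRegexpPlus = regexp(s, '\s+', 'split'); % a run of whitespace acts as one separator
viaStrsplit   = strsplit(s);               % collapses consecutive delimiters by default
```

Empty strings produced by the single-character pattern would also be treated as words by intersect or ismember, so '\s+' or strsplit is usually the safer choice for free-form text.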
Then use the intersect() function to get the elements common to the two cell arrays; it returns each shared word once. Wrap the call in length() to get the integer count.
cmp = length(intersect(split1,split2));
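As a sketch of what intersect returns for the question's sentences, the snippet below also requests the optional index outputs (the names common, i1 and i2 are chosen here for illustration). Note that the count comes out as 3 rather than the 2 asked for, because 'is' is shared as well.

```matlab
str1 = 'rabbit is eating grass near a tree';
str2 = 'rabbit is sleeping under tree';
split1 = strsplit(str1);
split2 = strsplit(str2);

% common holds each shared word once, sorted alphabetically;
% i1 and i2 index into split1 and split2 respectively.
[common, i1, i2] = intersect(split1, split2);
cmp = numel(common);
```

The index outputs are handy when the positions of the shared words in each sentence matter, not just the count.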
Answer 3:
I am assuming there is no restriction on the location or order in which the words match. First split each sentence into individual words and remove any duplicates, then check whether any word in the second sentence matches one in the first.
If ordering does matter, it is not quite as straightforward, but your question gave no indication of such a constraint.
str1= ('rabbit is eating grass near a tree');
str2= ('rabbit is sleeping under tree');
split1 = unique(regexp(str1,'\s','Split'));
split2 = unique(regexp(str2,'\s','Split'));
% Storing all words in the first sentence into a map for quick search/access
dict = containers.Map();
for ii = 1:numel(split1)
dict(split1{ii}) = true;
end
% create temp holding cell array, then loop through, looking to see if
% any word in the second sentence is stored in the dictionary made from
% the first sentence.
matches = {};
for jj = 1:numel(split2)
if dict.isKey(split2{jj})
matches{end+1} = split2{jj}; % not best but length initially unknown
end
end
numMatches = numel(matches) % return the number of matches
The variable matches will contain all of the words that match between the two sentences.
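The same dictionary lookup can probably be written without explicit loops, since containers.Map accepts a cell array of keys in its constructor and isKey accepts a cell array of keys to test. A minimal sketch under that assumption (the stored value 1 is arbitrary, and found is a name introduced here):

```matlab
str1 = 'rabbit is eating grass near a tree';
str2 = 'rabbit is sleeping under tree';
split1 = unique(regexp(str1, '\s', 'split'));
split2 = unique(regexp(str2, '\s', 'split'));

% Build the dictionary in one call: every word of sentence one maps to 1.
dict = containers.Map(split1, ones(1, numel(split1)));

% isKey with a cell array of keys returns one logical per key.
found      = isKey(dict, split2);
matches    = split2(found);
numMatches = numel(matches);
```

This also avoids growing matches inside a loop.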
Answer 4:
With ismember you just need one line.
str1 = ('rabbit is eating grass near a tree');
str2 = ('rabbit is sleeping under a tree');
result = sum( ismember( strsplit(str1), strsplit(str2) ) )
result =
4 %// I included also the article "a"
Be aware that for the following sentences the result is the same:
str1 = ('rabbit is eating grass near a tree, an oak tree');
str2 = ('rabbit is sleeping under a tree and is dreaming about a tree');
result = sum( ismember( strsplit(str1), strsplit(str2) ) )
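The repeated words in str2 make no difference, because ismember only asks whether each word of str1 occurs somewhere in str2. So removing duplicates in advance, as suggested by MZimmerman6, is not necessary for the second string. It still matters for the first one: a word repeated verbatim in str1 is counted every time, and the second 'tree' goes uncounted here only because the first occurrence was split off as 'tree,' (with the comma).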
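One caveat worth checking: ismember tests every element of its first argument, so a word repeated there is counted each time. A small sketch with made-up sentences (a, b, nRaw and nUnique are illustrative names):

```matlab
a = strsplit('tree by a tree');   % 'tree' appears twice, with no punctuation
b = strsplit('a tree');

nRaw    = sum(ismember(a, b));          % counts both occurrences of 'tree'
nUnique = sum(ismember(unique(a), b));  % counts each distinct word once
```

Here nRaw is 3 and nUnique is 2, so applying unique to the first argument (or using intersect) is the safer choice when each word should count only once.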
If you want to filter the result for unwanted strings, you can introduce another cell array of strings with all exceptions:
str3 = {'is','a'}
unwanted = sum( ismember( intersect( strsplit(str1), strsplit(str2) ), str3 ) )
unwanted =
2
Altogether it could look like:
str1 = ('rabbit is eating grass near a tree, an oak tree');
str2 = ('rabbit is sleeping under a tree and is dreaming about a tree');
str3 = {'is','a'}
[x,y,z] = deal( strsplit(str1), strsplit(str2), str3 )
result = sum(ismember(x,y)) - sum(ismember(intersect(x,y),z))
result =
2 %// 4 - 2 = 2
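The comma in 'tree,' shows that splitting on whitespace alone leaves punctuation attached to words. A hedged sketch of one way around this, assuming plain ASCII text: extract runs of letters with regexp, then remove the unwanted words with setdiff. The names ignored, words1, words2 and shared are introduced here for illustration.

```matlab
str1 = 'rabbit is eating grass near a tree, an oak tree';
str2 = 'rabbit is sleeping under a tree and is dreaming about a tree';
ignored = {'is', 'a'};                 % assumed list of words to skip

% Keep runs of letters only, so 'tree,' and 'tree' become the same token.
words1 = regexp(lower(str1), '[a-z]+', 'match');
words2 = regexp(lower(str2), '[a-z]+', 'match');

% intersect removes duplicates; setdiff drops the ignored words.
shared = setdiff(intersect(words1, words2), ignored);
result = numel(shared);
```

Because intersect already returns each word once, no subtraction is needed. The letter-only pattern would also split hyphenated words such as 'tree-hole' into two tokens, which may or may not be what you want.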
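Answer 5:
Use this for a case-insensitive comparison:
CMP = strcmpi(str1,str2)
Use this for a case-sensitive comparison:
CMP = strcmp(str1,str2)
If CMP is 1 the strings are the same; if it is 0 they are not. Note that both functions compare the strings as a whole rather than word by word.
If you do not want leading or trailing whitespace to affect the result, which makes for a more reliable comparison, trim the strings first and then compare.
To trim: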
newString = strtrim(str)
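A short sketch of how these calls behave, using made-up inputs (all variable names are illustrative). The last line applies strcmpi to a cell array of words, which is one way to tie whole-string comparison back to word matching:

```matlab
s1 = '  Rabbit ';
s2 = 'rabbit';

sameRaw        = strcmpi(s1, s2);            % false: surrounding whitespace differs
sameTrimCase   = strcmp(strtrim(s1), s2);    % false: 'R' versus 'r'
sameTrimNoCase = strcmpi(strtrim(s1), s2);   % true

% Against a cell array, strcmpi returns one logical per cell.
hasTree = any(strcmpi('TREE', strsplit('rabbit is sleeping under tree')));
```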
Source: https://stackoverflow.com/questions/22383071/how-to-match-certain-words-between-two-strings-in-matlab