How to match certain words between two strings (in MATLAB)?

Posted by 社会主义新天地 on 2019-12-11 02:08:03

Question


In the following two strings, the words 'rabbit' and 'tree' are matching:

str1 = ('rabbit is eating grass near a tree');
str2 = ('rabbit is sleeping under tree');

Suppose cmp is a variable declared to compare both. I want the result as:

cmp = 2

or something that shows that two words are matching. How do I do this?


Answer 1:


"Crazy" bsxfun approach, which might be similar to intersect, but not tested -

Function -

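function out = cell2_matchind(split1,split2)
% Returns a logical column vector with one entry per element of split2,
% true where that word also occurs in split1.

% Convert both cell arrays to numeric matrices (one word per row).
% char() pads shorter words with spaces, which become -16 after subtracting '0'.
c1 = char(split1)-'0';
c2 = char(split2)-'0';
% Pad the narrower matrix with -16 so both have the same number of columns.
if size(c1,2)<size(c2,2)
    c1 = [c1 -16.*ones(size(c1,1),size(c2,2)-size(c1,2))];
else
    c2 = [c2 -16.*ones(size(c2,1),size(c1,2)-size(c2,2))];
end
% Compare every row of c2 with every row of c1; a word matches when all
% character positions are equal. permute (rather than squeeze) keeps the
% orientation correct when split2 holds a single word.
out = any(permute(sum(bsxfun(@eq,permute(c1,[3 2 1]),c2),2),[1 3 2])==size(c2,2),2);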
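The padding value in the function above comes from how char() builds a character matrix from a cell array: shorter words are right-padded with spaces, and a space minus '0' is -16. A minimal sketch of that detail (the variable names are only illustrative):

```matlab
words = {'tree', 'is'};

% char() stacks the words as rows and pads 'is' with two trailing spaces
c = char(words);

% Subtracting '0' turns the characters into numeric codes relative to '0'
codes = c - '0';

% A space (ASCII 32) minus '0' (ASCII 48) gives the padding value -16
padValue = ' ' - '0';

disp(size(c));
disp(codes(2, :));
disp(padValue);
```

This is why the function pads the narrower matrix with -16: the extra columns then look exactly like the spaces that char() would have added.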

Main MATLAB script -

% Source of stopwords- http://norm.al/2009/04/14/list-of-english-stop-words/
stopwords_cellstring={'a', 'about', 'above', 'above', 'across', 'after', ...
    'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', ...
    'already', 'also','although','always','am','among', 'amongst', 'amoungst', ...
    'amount',  'an', 'and', 'another', 'any','anyhow','anyone','anything','anyway', ...
    'anywhere', 'are', 'around', 'as',  'at', 'back','be','became', 'because','become',...
    'becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below',...
    'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom','but', 'by',...
    'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de',...
    'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight',...
    'either', 'eleven','else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', ...
    'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fifty',...
    'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found',...
    'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt',...
    'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', ...
    'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if',...
    'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last',...
    'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile',...
    'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must',...
    'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine',...
    'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off',...
    'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise',...
    'our', 'ours', 'ourselves', 'out', 'over', 'own','part', 'per', 'perhaps', 'please',...
    'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious',...
    'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so',...
    'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', ...
    'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them',...
    'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', ...
    'therein', 'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those',...
    'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too',...
    'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up',...
    'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when',...
    'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein',...
    'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever',...
    'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet',...
    'you', 'your', 'yours', 'yourself', 'yourselves', 'the'};

str1 = ('rabbit is eating grass near a tree and will be sleeping inside the tree-hole');
str2 = ('rabbit is sleeping under tree and after waking up will be eating the nuts nearby');

split1 = unique(regexp(str1,'\s','Split'),'stable');
split2 = unique(regexp(str2,'\s','Split'),'stable');

cw_split2 = split2(cell2_matchind(split1,split2))
cw_split2_nostopwd = cw_split2(~cell2_matchind(stopwords_cellstring,cw_split2))
cmp = numel(cw_split2_nostopwd)

Output -

cw_split2 = 
    'rabbit'    'is'    'sleeping'    'tree'    'and'    'will'    'be'    'eating'    'the'

cw_split2_nostopwd = 
    'rabbit'    'sleeping'    'tree'    'eating'

cmp =
     4
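As a cross-check, the first selection step can probably be reproduced with the built-in ismember instead of the bsxfun helper. This is a swapped-in technique, not the answer's method, and the names mask and cw are only illustrative:

```matlab
str1 = 'rabbit is eating grass near a tree and will be sleeping inside the tree-hole';
str2 = 'rabbit is sleeping under tree and after waking up will be eating the nuts nearby';

% Split on whitespace and drop repeated words while keeping first-seen order
split1 = unique(regexp(str1, '\s', 'split'), 'stable');
split2 = unique(regexp(str2, '\s', 'split'), 'stable');

% Logical mask over split2: true where the word also appears in split1
mask = ismember(split2, split1);
cw = split2(mask);

disp(cw);
disp(numel(cw));
```

Because the mask indexes split2, the common words come out in the order they appear in the second sentence, which matches the ordering of cw_split2 in the output above.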



Answer 2:


As in the other answer, split each string into a cell array of words. There is no need to remove duplicates first, because intersect() returns each common word only once.

str1= ('rabbit is eating grass near a tree');
str2= ('rabbit is sleeping under tree');

% split each string into a cell array of words
split1 = regexp(str1,'\s','Split');
split2 = regexp(str2,'\s','Split');

Alternatively, later versions of MATLAB (IIRC R2013a onwards) include a strsplit() function, so the split could be reduced to

split1 = strsplit(str1);
split2 = strsplit(str2);

Then use the intersect() function to get the common elements of the two cell arrays, and wrap it in length() to return the integer count.

cmp = length(intersect(split1,split2));
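If the matching words themselves, or their positions, are also of interest, the additional outputs of intersect can be used. A small sketch with the question's strings (common, ia and ib are illustrative names):

```matlab
str1 = 'rabbit is eating grass near a tree';
str2 = 'rabbit is sleeping under tree';

split1 = regexp(str1, '\s', 'split');
split2 = regexp(str2, '\s', 'split');

% common is sorted and free of duplicates;
% ia and ib index into split1 and split2 respectively
[common, ia, ib] = intersect(split1, split2);
cmp = numel(common);

disp(common);
disp(cmp);
```

Note that for these strings the count is 3 rather than 2, because 'is' is shared as well; filtering out such words is a separate step.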



Answer 3:


I am assuming there is no restriction on the location or order in which the words match. First you need to split each sentence into individual words, remove any duplicates, and then check whether any words in the second sentence match ones in the first sentence.

If ordering does matter, it is not quite as straightforward, but your question gave no indication of such constraints.

str1= ('rabbit is eating grass near a tree');
str2= ('rabbit is sleeping under tree');
split1 = unique(regexp(str1,'\s','Split'));
split2 = unique(regexp(str2,'\s','Split'));

% Storing all words in the first sentence into a map for quick search/access
dict = containers.Map();
for ii = 1:numel(split1)
   dict(split1{ii}) = true; 
end

% create temp holding cell array, then loop through, looking to see if 
% any word in the second sentence is stored in the dictionary made from
% the first sentence. 
matches = {};
for jj = 1:numel(split2)
    if dict.isKey(split2{jj})
        matches = [matches,split2{jj}]; % not best but length initially unknown
    end
end

numMatches = numel(matches) % return the number of matches

The variable matches will contain all of the words that match between the two sentences.
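The two loops can likely be avoided, since containers.Map accepts a cell array of keys in its constructor and isKey accepts a cell array of candidate keys. A sketch under that assumption (the stored values are arbitrary placeholders; only the keys matter here):

```matlab
str1 = 'rabbit is eating grass near a tree';
str2 = 'rabbit is sleeping under tree';
split1 = unique(regexp(str1, '\s', 'split'));
split2 = unique(regexp(str2, '\s', 'split'));

% Build the dictionary in one call; the values are dummy ones
dict = containers.Map(split1, ones(1, numel(split1)));

% isKey returns one logical per word of split2
mask = isKey(dict, split2);
matches = split2(mask);
numMatches = nnz(mask);

disp(matches);
disp(numMatches);
```

Indexing with the logical mask also sidesteps growing the matches array inside a loop.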




Answer 4:


With ismember you just need one line.

str1 = ('rabbit is eating grass near a tree');
str2 = ('rabbit is sleeping under a tree');

result = sum( ismember( strsplit(str1), strsplit(str2) ) )

result =

    4               %// I included also the article "a"

Be aware that for the following sentences the result is the same:

str1 = ('rabbit is eating grass near a tree, an oak tree');
str2 = ('rabbit is sleeping under a tree and is dreaming about a tree');

result = sum( ismember( strsplit(str1), strsplit(str2) ) )

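Duplicates in the second string do not change the count, so removing them in advance, as suggested by MZimmerman6, is not necessary there. Duplicates in the first string are counted every time they match, though; in this example 'tree,' (with the comma) is a different token from 'tree', which is why the result is still 4. Apply unique to the first argument if each word should count only once.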
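The effect of repeated words and of attached punctuation can be seen in a small sketch. The sentences and variable names here are made up for illustration, and the regexp tokenization is one possible choice, not part of the answer:

```matlab
a = strsplit('tree and tree');
b = strsplit('a tree');

% Each occurrence in the first argument is tested separately
countRaw = sum(ismember(a, b));

% Removing duplicates from the first argument counts each word once
countUnique = sum(ismember(unique(a), b));

% Extracting runs of letters drops punctuation such as the comma in 'tree,'
tokens = regexp('tree, an oak tree', '[A-Za-z]+', 'match');

disp([countRaw countUnique]);
disp(tokens);
```

Whether a repeated word should count once or several times depends on what the comparison is meant to measure.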


If you want to filter the result for unwanted strings, you can introduce another cell array of strings with all exceptions:

str3 = {'is','a'}
unwanted = sum( ismember( intersect( strsplit(str1), strsplit(str2) ), str3 ) )

unwanted =

     2

Altogether it could look like:

str1 = ('rabbit is eating grass near a tree, an oak tree');
str2 = ('rabbit is sleeping under a tree and is dreaming about a tree');
str3 = {'is','a'}

[x,y,z] = deal( strsplit(str1), strsplit(str2), str3 )
result = sum(ismember(x,y)) - sum(ismember(intersect(x,y),z))
%      =       4            -            2           =        2
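The same filtering could also be wrapped in a reusable anonymous function built on intersect and setdiff. This is an alternative sketch rather than the answer's formula, and countCommon is a hypothetical name:

```matlab
% Count words shared by s1 and s2, excluding the words listed in ignore
countCommon = @(s1, s2, ignore) ...
    numel(setdiff(intersect(strsplit(s1), strsplit(s2)), ignore));

str1 = 'rabbit is eating grass near a tree, an oak tree';
str2 = 'rabbit is sleeping under a tree and is dreaming about a tree';
str3 = {'is', 'a'};

result = countCommon(str1, str2, str3);
disp(result);
```

Because intersect already removes duplicates, this version counts every shared word once regardless of how often it appears in either sentence.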



Answer 5:


Use this for a case-insensitive comparison:

CMP = strcmpi(string1,string2)

Use this for a case-sensitive comparison:

CMP = strcmp(string1,string2)

If CMP is 1 the strings are the same; if it is 0 they are not.

If you do not want leading or trailing whitespace to affect the comparison, trim the strings first and then compare.

For trimming:

newString = strtrim(str)
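A short sketch combining the three functions; the example strings are invented. Note that strcmp and strcmpi compare whole strings, so they would need to be applied per word to count matching words:

```matlab
a = '  Rabbit ';
b = 'rabbit';

% Remove leading and trailing whitespace before comparing
aTrimmed = strtrim(a);

sameIgnoringCase = strcmpi(aTrimmed, b);  % case is ignored
sameExact = strcmp(aTrimmed, b);          % case must match

disp([sameIgnoringCase sameExact]);
```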


Source: https://stackoverflow.com/questions/22383071/how-to-match-certain-words-between-two-strings-in-matlab
