Question
I'm trying to find the most frequent word in a list of words. Here is my code so far:
uniWords = unique(lower(words));
for i = 1:length(words)
    for j = 1:length(uniWords)
        if (uniWords(j) == lower(words(i)))
            freq(j) = freq(j) + 1;
        end
    end
end
When I try to run the script, I get the following error:
Undefined function 'eq' for input arguments of type 'cell'.
Error in Biweekly3 (line 106)
if (uniWords(j) == lower(words(i)))
Any help is appreciated!
Answer 1:
You need to extract the contents of the cell with {}, because indexing with () returns another cell rather than the string inside it:
strcmpi(uniWords{j},words{i})
Also, I suggest comparing strings with strcmp, or in this case strcmpi, which ignores case, so you do not need to call lower.
Be careful when using == on strings, because they must be the same length or you will get an error:
>> s1='first';
>> s2='second';
>> s1==s2
Error using ==
Matrix dimensions must agree.
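Putting these suggestions together, the loop from the question might look like the sketch below. The sample words list is an assumption for illustration, and the zeros preallocation of freq is an addition that the question's snippet does not show (without it, freq(j) + 1 has nothing to increment).

```matlab
% Sample input (assumed for illustration; any cell array of strings works)
words = {'The','hello','one','bye','the','one'};

uniWords = unique(lower(words));    % distinct words, lowercased and sorted
freq = zeros(1, numel(uniWords));   % counters must exist before incrementing

for i = 1:numel(words)
    for j = 1:numel(uniWords)
        % {} extracts the string itself; strcmpi compares ignoring case
        if strcmpi(uniWords{j}, words{i})
            freq(j) = freq(j) + 1;
        end
    end
end

mostFrequent = uniWords(freq == max(freq));  % cell array, keeps ties
```

With this input, freq ends up holding one count per entry of uniWords, and mostFrequent contains every word tied for the highest count.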
Answer 2:
No need for loops. The third output of unique gives you a numeric identifier for each word, and you can then sum the occurrences of each identifier with sparse. From that you can easily find the maximum and the word(s) that attain it:
[uniWords, ~, jj] = unique(lower(words)); % jj(k) is the index of word k in uniWords
freq = full(sparse(ones(1,length(jj)),jj,1)); % number of occurrences of each unique word
m = max(freq);
result = uniWords(freq==m); % return more than one word if there's a tie
For example, with
words = {'The','hello','one','bye','the','one'}
the result is
>> result
result =
'one' 'the'
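The same counting step can also be written with accumarray instead of sparse. This is a swapped-in alternative, not part of the answer above, and the names idx, counts and topWords are chosen here only for illustration:

```matlab
words = {'The','hello','one','bye','the','one'};

% idx(k) is the position of words{k} within the sorted unique list
[uniWords, ~, idx] = unique(lower(words));

% Add 1 into the bin of each identifier; transpose to get a row vector
counts = accumarray(idx(:), 1).';

% Every unique word whose count equals the maximum (ties are kept)
topWords = uniWords(counts == max(counts));
```

Indexing uniWords (rather than words) with the logical mask matters here, because the mask has one entry per unique word, not one per input word.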
Answer 3:
I guess you need to do:
if (uniWords{j} == lower(words{i}))
Also, I suggest not using i and j as variable names in MATLAB, since they also denote the imaginary unit.
Update
As Chappjc points out, it is better to use strcmp (or in your case strcmpi, skipping lower), since you want to ignore case.
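A small sketch of why strcmp is the safer choice inside an if. The strings a, b and c are made up for illustration:

```matlab
a = 'one';
b = 'the';
c = 'hello';

% == compares character by character and returns one logical per position
elementwise = (a == b);

% strcmp always returns a single logical, whatever the lengths are
sameAB = strcmp(a, b);
sameAC = strcmp(a, c);   % a == c would raise a size-mismatch error instead

% strcmpi additionally ignores case, so lower is not needed
sameCase = strcmpi('The', b);
```

Here elementwise is a 1-by-3 logical vector, while the strcmp and strcmpi results are scalars that can go straight into an if condition.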
Source: https://stackoverflow.com/questions/19870258/comparing-strings-in-cell-arrays