Question
Earlier I got some help writing a script that extracts hashtags from a list of tweets and puts them into a cell array. I used this code inside a for loop:
hashtagCell{i} = regexp(textRead{i}, '#[A-z]*', 'match');
This does what it is supposed to do, but now I'm trying to find the average character length of the hashtags, so I need to get the character length of each hashtag returned by the call above and add those lengths together. However, when I use the size() function, it gives me the size of the cell rather than the size of the strings inside it, which is what I want. I can't figure out how to do this.
Answer 1:
This should help (and it gets rid of any loops, other than, perhaps, the one used to create CellOfText):
%# Example cell array of tweets
CellOfText = {'Bah #humbug says #Mr scrooge'; 'No #presents for you'};
%# Get all hash tags
HTC = regexp(CellOfText, '#[A-z]*', 'match');
%# Get the average hash tag length, being careful to unnest HTC
AvgLength1 = mean(cellfun('length', [HTC{:}]));
DISCLAIMER: The inspiration for this method came from this excellent answer to a similar question. Thanks to @Andrey for that.
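If the hashtags have already been collected in a for loop, as in the question, the same unnesting idea can be applied to that result. The following is only a sketch: the sample contents of textRead are made up for illustration, and the names allTags, tagLengths, totalLength and avgLength are not from the original posts. Note that each length counts the leading '#'; subtract 1 per tag if that is not wanted.

```matlab
% Hypothetical sample data standing in for the asker's list of tweets
textRead = {'Bah #humbug says #Mr scrooge'; 'No #presents for you'};

% Build hashtagCell the way the question describes (one cell per tweet)
hashtagCell = cell(size(textRead));
for i = 1:numel(textRead)
    hashtagCell{i} = regexp(textRead{i}, '#[A-z]*', 'match');
end

% Flatten the nested cells into a single 1-by-N cell array of strings
allTags = [hashtagCell{:}];

% Measure each string itself rather than the cell that holds it
tagLengths = cellfun(@numel, allTags);

totalLength = sum(tagLengths);   % summed character count of all hashtags
avgLength   = mean(tagLengths);  % average character count per hashtag
```

Calling size() or numel() on hashtagCell{i} directly reports how many hashtags that tweet has, which is why cellfun is used here to reach the strings one level down.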
Answer 2:
For a single string, it would look like this:
%# example string with hashtags.
MyText = 'this is a #text with #hashtag and also #another hashtag';
%# create the hashtagCell.
hashtagCell = regexp(MyText, '#[A-z]*', 'match');
%# compute the mean.
AverageLength = mean(cellfun(@(x) size(x,2), hashtagCell));
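One caveat about the pattern used throughout this thread: in ASCII, the range A-z also covers the characters [ \ ] ^ _ and the backtick, which sit between Z and a, and the * quantifier allows a bare '#' to match. The sketch below compares it with a stricter pattern; the sample string and the variable names are made up for illustration, and whether underscores or digits should count as part of a hashtag is an assumption left to the reader.

```matlab
% Hypothetical sample text containing non-letter characters inside tags
sample = 'tags: #foo_bar #a^b #ok';

% Original pattern: A-z also spans the punctuation between 'Z' and 'a'
loose = regexp(sample, '#[A-z]*', 'match');

% Stricter pattern: letters only, and at least one letter after '#'
strict = regexp(sample, '#[A-Za-z]+', 'match');

% Average lengths under each pattern (the leading '#' is counted)
looseAvg  = mean(cellfun(@numel, loose));
strictAvg = mean(cellfun(@numel, strict));
```

With this sample, the original pattern keeps '_' and '^' inside the matches, while the stricter one stops at the first non-letter, so the two averages differ.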
Source: https://stackoverflow.com/questions/13871716/character-count-of-regular-expression-in-cells-in-matlab