I have a static 'large' list of words, about 300-500 words, called 'list1'. Given a relatively short string str of about 40 words, what is the fastes
Here's an alternative implementation, for your edification:
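def match_freq( words, str )
  words = words.split(/\s+/)
  # Count occurrences of each word in str. String#scan matches substrings,
  # so "freddie" also counts as a hit for "fred".
  counts = Hash[ words.map{ |w| [w,str.scan(w).length] } ]
  counts.delete_if{ |word,ct| ct==0 } # Keep only the words that occur
  occurring_words = counts.keys
  [
    counts.values.inject(0){ |sum,ct| sum+ct }, # Sum of counts
    occurring_words,
    occurring_words.length
  ]
end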
list1 = "fred sam sandy jack sue bill"
str = "and so sammy went with jack to see fred and freddie"
x = match_freq(list1, str)
p x #=> [4, ["fred", "sam", "jack"], 3]
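Note that the count of 4 includes "sammy" (for "sam") and "freddie" (for "fred"), because String#scan matches substrings. If you only want whole-word hits, here is a minimal sketch under that assumption. The method name match_freq_whole_words is hypothetical, and it assumes that splitting str on whitespace is good enough tokenization (no punctuation handling). It walks the short string once and tests each token against a Set built from the word list, rather than scanning the string once per list word:

```ruby
require 'set'

# Hypothetical whole-word variant of match_freq (not from the original answer).
# Assumes words are separated by whitespace and contain no punctuation.
def match_freq_whole_words(words, str)
  wanted = words.split(/\s+/).to_set   # Set gives fast membership tests
  counts = Hash.new(0)                 # Missing keys default to 0
  str.split(/\s+/).each do |token|
    counts[token] += 1 if wanted.include?(token)
  end
  [
    counts.values.inject(0, :+),       # Sum of counts
    counts.keys,                       # Words that occurred, in order of first appearance
    counts.length
  ]
end

list1 = "fred sam sandy jack sue bill"
str = "and so sammy went with jack to see fred and freddie"
p match_freq_whole_words(list1, str) #=> [2, ["jack", "fred"], 2]
```

Since the word list is static, the Set could also be built once and reused across calls instead of being rebuilt each time.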
Note that if I needed this data, I would probably just return the 'counts' hash from the method and then do whatever analysis I wanted on it. If I were going to return multiple 'values' from an analysis method, I might return a Hash of named values. Returning an array, however, lets you destructure the results with multiple assignment:
hits, words, word_count = match_freq(list1, str)
p hits, words, word_count
#=> 4
#=> ["fred", "sam", "jack"]
#=> 3
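For comparison, here is a rough sketch of the Hash-of-named-values approach. The method name match_stats and the key names are just illustrative choices. It uses the same substring counting as match_freq, and it also exposes the raw counts hash so the caller can do further analysis:

```ruby
# Hypothetical variant of match_freq that returns named values
# instead of a positional array. Same substring-based counting.
def match_stats(words, str)
  counts = {}
  words.split(/\s+/).each do |w|
    ct = str.scan(w).length
    counts[w] = ct if ct > 0           # Keep only the words that occur
  end
  {
    :hits       => counts.values.inject(0, :+), # Sum of counts
    :words      => counts.keys,
    :word_count => counts.length,
    :counts     => counts                       # Raw per-word counts
  }
end

stats = match_stats("fred sam sandy jack sue bill",
                    "and so sammy went with jack to see fred and freddie")
p stats[:hits]           #=> 4
p stats[:words]          #=> ["fred", "sam", "jack"]
p stats[:counts]["fred"] #=> 2
```

The trade-off is that callers look values up by name rather than relying on their position in an array, at the cost of losing the one-line multiple assignment.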