Ruby
When "minified", this implementation becomes 165 characters long. It uses array#inject
to give a starting value (a Hash object with a default of 0) and then loop through the elements, which are then rolled into the hash; the result is then selected from the minimum frequency.
Note that I didn't count the size of the words to skip, that being an external constant. When the constant is counted too, the solution is 244 characters long.
Apostrophes and dashes aren't stripped, but included; their use modifies the word and therefore cannot be stripped simply without removal of all information beyond the symbol.
Implementation
CommonWords = %w(the a an but and is not or as of to in for by be may has can its it's)
def get_keywords(text, minFreq=0, minLen=2)
text.scan(/(?:\b)[a-z'-]{#{minLen},}(?=\b)/i).
inject(Hash.new(0)) do |result,w|
w.downcase!
result[w] += 1 unless CommonWords.include?(w)
result
end.select { |k,n| n >= minFreq }
end
Test Rig
require 'net/http'
keywords = get_keywords(Net::HTTP.get('www.sampsonresume.com','/labs/c.txt'), 3)
keywords.sort.each { |name,count| puts "#{name} x #{count} times" }
Test Results
code x 4 times
declarations x 4 times
each x 3 times
execution x 3 times
expression x 4 times
function x 5 times
keywords x 3 times
language x 3 times
languages x 3 times
new x 3 times
operators x 4 times
programming x 3 times
statement x 7 times
statements x 4 times
such x 3 times
types x 3 times
variables x 3 times
which x 4 times