How to get word frequencies efficiently in Ruby?

伪装坚强ぢ 2021-02-05 17:50

Sample input:

"I was 09809 home -- Yes! yes!  You was"

and output:

{ 'yes' => 2, 'was' => 2, 'i' => 1, 'home


        
7 Answers
  •  南方客 (original poster)
    2021-02-05 18:40

    You can look at my code that splits the text into words. The basic code would look as follows:

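    # Assumes the library providing SRX::Polish::WordSplitter is already loaded.
    sentence = "Ala ma kota za 5zł i 10$."
    splitter = SRX::Polish::WordSplitter.new(sentence)
    # Unseen words start at 0, so += 1 works without a key check.
    histogram = Hash.new(0)
    splitter.each do |word,type|
      # Count only tokens the splitter tags as words (skips numbers, punctuation).
      histogram[word.downcase] += 1 if type == :word
    end
    p histogram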
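
For plain input such as the sample in the question, a similar histogram can be built without a splitter library. This is only a minimal sketch, assuming that a "word" is any run of Unicode letters; the method name word_frequencies is made up for illustration and is not part of the splitter code above.

```ruby
# Count case-insensitive word frequencies in a string.
# Assumption: a word is a run of Unicode letters, so digits such as
# "09809" and punctuation such as "--" or "!" are ignored.
def word_frequencies(text)
  counts = Hash.new(0)
  # String#scan with a block yields each match in order of appearance.
  text.scan(/\p{L}+/) { |word| counts[word.downcase] += 1 }
  counts
end

p word_frequencies("I was 09809 home -- Yes! yes!  You was")
```

For the question's sample input this counts "was" and "yes" twice each and "i", "home" and "you" once each. The string is scanned in a single pass, so the cost grows linearly with the input length.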
    

    Be careful if you work with languages other than English: in Ruby 1.9, String#downcase only converts the ASCII letters A-Z, so letters such as 'Ł' are left unchanged.
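
The downcase caveat depends on the Ruby version: from Ruby 2.4 onward, String#downcase applies Unicode case mapping by default, and the :ascii option restricts it to A-Z, which mirrors the older behaviour. A small sketch to check what your interpreter does (the sample word is just an illustration, and Ruby 2.4 or later is assumed):

```ruby
# Assumes Ruby 2.4+ and a UTF-8 source file (the default since Ruby 2.0).
word = "Łódź"

# Default mode: full Unicode case mapping, so 'Ł' becomes 'ł'.
modern = word.downcase

# :ascii mode: only A-Z are converted, so 'Ł' stays as it is.
legacy = word.downcase(:ascii)

puts modern
puts legacy
```

If both lines print the same text with a capital 'Ł', the interpreter is not mapping non-ASCII letters, and a histogram keyed on downcased words will treat 'Łódź' and 'łódź' as different entries.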
