How to get word frequencies efficiently in Ruby?

伪装坚强ぢ 2021-02-05 17:50

Sample input:

"I was 09809 home -- Yes! yes!  You was"

and output:

{ 'yes' => 2, 'was' => 2, 'i' => 1, 'home' ...


        
7 Answers
  • 2021-02-05 18:40

    You can look at my code that splits the text into words. The basic code would look as follows:

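    # Needs the srx-polish gem; the require path below is an assumption.
    require 'srx/polish/word_splitter'

    sentence = "Ala ma kota za 5zł i 10$."
    splitter = SRX::Polish::WordSplitter.new(sentence)
    # Default value 0 lets us increment counts for unseen words.
    histogram = Hash.new(0)
    splitter.each do |word, type|
      # Count only word tokens, skipping numbers and punctuation.
      histogram[word.downcase] += 1 if type == :word
    end
    p histogram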
    

    Be careful if you work with languages other than English: in Ruby 1.9, downcase only converts ASCII letters, so characters such as 'Ł' are left unchanged.
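
    For plain text like the sample in the question, a dependency-free version is also possible. The following is only a minimal sketch, not the splitter-based approach above: word_frequencies is a hypothetical helper name, and it assumes a "word" is any run of letters. It also assumes Ruby 2.4 or later, where downcase handles non-ASCII letters such as 'Ł'; on Ruby 1.9 the caveat above still applies.

```ruby
# Hypothetical helper (not part of the original answer): counts
# case-insensitive word frequencies using only core Ruby.
def word_frequencies(text)
  # [[:alpha:]]+ matches runs of letters, so digits ("09809") and
  # punctuation ("--", "!") are skipped entirely.
  words = text.downcase.scan(/[[:alpha:]]+/)

  # Hash.new(0) gives unseen words a starting count of zero.
  words.each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

# Sample input from the question.
p word_frequencies("I was 09809 home -- Yes! yes!  You was")
```

    For the sample input this counts 'was' and 'yes' twice and 'i', 'home' and 'you' once each. It makes a single pass over the text, so it should be efficient enough for moderately sized inputs.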
