Find out which words in a large list occur in a small string

暖寄归人 2021-01-19 16:47

I have a static 'large' list of words, about 300-500 words, called 'list1'.

Given a relatively short string str of about 40 words, what is the fastes

3 answers
  •  后悔当初
    2021-01-19 17:06

    Here's an alternative implementation, for your edification:

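    # words: whitespace-separated word list; str: the text to search
    def match_freq( words, str )
      words  = words.split(/\s+/)
      # String#scan with a String pattern counts substring hits,
      # so "fred" is also counted inside "freddie"
      counts = Hash[ words.map{ |w| [w,str.scan(w).length] } ]
      counts.delete_if{ |word,ct| ct==0 } # Drop words that never occur
      occurring_words = counts.keys
      [
        counts.values.inject(0){ |sum,ct| sum+ct }, # Sum of counts
        occurring_words,
        occurring_words.length
      ]
    end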
    
    list1 = "fred sam sandy jack sue bill"
    str   = "and so sammy went with jack to see fred and freddie"
    x     = match_freq(list1, str)
    p x   #=> [4, ["fred", "sam", "jack"], 3]
    

    Note that if I needed this data, I would probably just return the 'counts' hash from the method and then do whatever analysis I wanted on it. If I were going to return multiple 'values' from an analysis method, I might return a Hash of named values. Returning an array, however, lets you destructure the results with multiple assignment:

    hits, words, word_count = match_freq(list1, str)
    p hits, words, word_count  
    #=> 4
    #=> ["fred", "sam", "jack"]
    #=> 3
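
    For comparison, the Hash-of-named-values idea mentioned above could look roughly like the sketch below. The method name match_stats and the keys :counts, :hits and :words are made up for illustration and are not part of the original answer; the matching is the same substring-based String#scan count, and Array#sum assumes Ruby 2.4 or later.

    ```ruby
    # Hypothetical variant of match_freq that returns named values in a Hash.
    # The name match_stats and the keys below are illustrative assumptions.
    def match_stats(words, str)
      # Count substring occurrences of each word in str
      counts = words.split(/\s+/).map { |w| [w, str.scan(w).length] }.to_h
      # Keep only the words that actually occur
      counts = counts.select { |_word, ct| ct > 0 }
      {
        counts: counts,            # per-word occurrence counts
        hits:   counts.values.sum, # total number of occurrences
        words:  counts.keys        # words that occur at least once
      }
    end

    list1 = "fred sam sandy jack sue bill"
    str   = "and so sammy went with jack to see fred and freddie"
    stats = match_stats(list1, str)
    p stats[:hits]   #=> 4
    p stats[:words]  #=> ["fred", "sam", "jack"]
    ```

    With a Hash the caller picks values by name rather than by position, so adding another statistic later does not break existing callers.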
    
