Measure the distance between two strings with Ruby?

-上瘾入骨i 2020-11-29 08:18

Can I measure the distance between two strings with Ruby?

I.e.:

compare('Test', 'est') # Returns 1
compare('Test', 'Tes') #


        
6 Answers
  • 2020-11-29 08:34

    I found this for you:

    def levenshtein_distance(s, t)
      m = s.length
      n = t.length
      return m if n == 0
      return n if m == 0
      d = Array.new(m+1) {Array.new(n+1)}
    
      (0..m).each {|i| d[i][0] = i}
      (0..n).each {|j| d[0][j] = j}
      (1..n).each do |j|
        (1..m).each do |i|
          d[i][j] = if s[i-1] == t[j-1]  # adjust index into string
                      d[i-1][j-1]       # no operation required
                    else
                      [ d[i-1][j]+1,    # deletion
                        d[i][j-1]+1,    # insertion
                        d[i-1][j-1]+1,  # substitution
                      ].min
                    end
        end
      end
      d[m][n]
    end
    
    [ ['fire','water'], ['amazing','horse'], ["bamerindos", "giromba"] ].each do |s,t|
      puts "levenshtein_distance('#{s}', '#{t}') = #{levenshtein_distance(s, t)}"
    end
    

    That gives this output: =)

    levenshtein_distance('fire', 'water') = 4
    levenshtein_distance('amazing', 'horse') = 7
    levenshtein_distance('bamerindos', 'giromba') = 9
    

    Source: http://rosettacode.org/wiki/Levenshtein_distance#Ruby
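The matrix version above only ever reads the previous row while filling the current one, so the full (m+1) x (n+1) table is not strictly needed. Here is a minimal sketch of a two-row variant, assuming you only want the final distance and not the edit path. The name levenshtein_two_rows is my own and does not come from the Rosetta Code source.

```ruby
# Two-row variant of the matrix approach: keeps only the previous and the
# current row, so extra memory is O(n) instead of O(m*n).
# NOTE: levenshtein_two_rows is a hypothetical name used for illustration.
def levenshtein_two_rows(s, t)
  return t.length if s.empty?
  return s.length if t.empty?

  prev = (0..t.length).to_a            # distances from "" to each prefix of t
  s.each_char.with_index(1) do |sc, i|
    curr = [i]                         # distance from s[0, i] to ""
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j] + 1,            # deletion
               curr[j - 1] + 1,        # insertion
               prev[j - 1] + cost      # match or substitution
              ].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein_two_rows('Test', 'est')    # 1
puts levenshtein_two_rows('fire', 'water')  # 4
```

The results should match the full-matrix method for any pair of strings; the only trade-off is that the intermediate table is no longer available afterwards.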

  • 2020-11-29 08:37

    I like DigitalRoss' solution (the short recursive lev method elsewhere on this page). However, as pointed out by dawg, its runtime grows exponentially, on the order of O(3^n), which is no good for longer strings. That solution can be sped up significantly using memoization, a top-down form of dynamic programming:

    def lev(string1, string2, memo={})
      return memo[[string1, string2]] if memo[[string1, string2]]
      return string2.size if string1.empty?
      return string1.size if string2.empty?
      min = [ lev(string1.chop, string2, memo) + 1,
              lev(string1, string2.chop, memo) + 1,
              lev(string1.chop, string2.chop, memo) + (string1[-1] == string2[-1] ? 0 : 1)
           ].min
      memo[[string1, string2]] = min
      min
    end
    

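    We then have much better runtime. With the memo there are at most (m+1)(n+1) distinct subproblems for strings of lengths m and n, so the number of recursive calls is O(m*n) rather than exponential. It is not linear, and each call still pays for copying and hashing the strings.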

    [9] pry(main)> require 'benchmark'
    => true
    [10] pry(main)> @memo = {}
    => {}
    [11] pry(main)> Benchmark.realtime{puts lev("Hello darkness my old friend", "I've come to talk with you again")}
    26
    => 0.007071999832987785
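To get a feel for the difference between the plain recursion and the memoized version, here is a rough sketch that counts recursive calls instead of measuring time. The helpers naive_calls and memo_calls and the counter hash are my own additions for illustration; they are not part of either answer.

```ruby
# Counts recursive calls for the plain recursion vs. the memoized one.
# NOTE: naive_calls, memo_calls and the counter hash are hypothetical
# helpers used only for this illustration.
def naive_calls(s, t, counter)
  counter[:calls] += 1
  return t.size if s.empty?
  return s.size if t.empty?
  [naive_calls(s.chop, t, counter) + 1,
   naive_calls(s, t.chop, counter) + 1,
   naive_calls(s.chop, t.chop, counter) + (s[-1] == t[-1] ? 0 : 1)].min
end

def memo_calls(s, t, counter, memo = {})
  counter[:calls] += 1
  key = [s, t]
  return memo[key] if memo.key?(key)   # reuse an already solved subproblem
  return t.size if s.empty?
  return s.size if t.empty?
  memo[key] = [memo_calls(s.chop, t, counter, memo) + 1,
               memo_calls(s, t.chop, counter, memo) + 1,
               memo_calls(s.chop, t.chop, counter, memo) + (s[-1] == t[-1] ? 0 : 1)].min
end

a = 'kitten'
b = 'sitting'
naive = { calls: 0 }
memo  = { calls: 0 }
puts naive_calls(a, b, naive)   # 3
puts memo_calls(a, b, memo)     # 3
puts "naive calls: #{naive[:calls]}, memoized calls: #{memo[:calls]}"
```

Each non-trivial subproblem is solved once and makes three calls, so the memoized call count stays within 1 + 3*m*n, while the plain recursion's count grows exponentially with the string lengths.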
    
  • 2020-11-29 08:41

    There is a utility method in RubyGems that really should be public but isn't. Anyway:

    require "rubygems/text"
    ld = Class.new.extend(Gem::Text).method(:levenshtein_distance)
    
    p ld.call("asd", "sdf") # => 2
    
  • 2020-11-29 08:45

    Much easier, and fast thanks to the native C binding:

    gem install levenshtein-ffi
    gem install levenshtein
    
    require 'levenshtein'
    
    Levenshtein.normalized_distance string1, string2, threshold
    

    http://rubygems.org/gems/levenshtein http://rubydoc.info/gems/levenshtein/0.2.2/frames
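For readers wondering what a normalized distance looks like, here is a plain-Ruby sketch. It assumes that normalization means dividing the raw edit distance by the length of the longer string, which gives a value between 0.0 and 1.0. That definition is my assumption, and the threshold argument is not modelled at all, so check the gem's documentation for its exact behaviour. The names lev_distance and normalized_lev are hypothetical and are not the gem's API.

```ruby
# Plain-Ruby sketch of a normalized edit distance.
# ASSUMPTION: normalized = raw distance / length of the longer string.
# lev_distance and normalized_lev are hypothetical helpers, not the gem's API.
def lev_distance(s, t)
  prev = (0..t.length).to_a
  s.each_char.with_index(1) do |sc, i|
    curr = [i]
    t.each_char.with_index(1) do |tc, j|
      curr << [prev[j] + 1,                          # deletion
               curr[j - 1] + 1,                      # insertion
               prev[j - 1] + (sc == tc ? 0 : 1)].min # match or substitution
    end
    prev = curr
  end
  prev.last
end

def normalized_lev(s, t)
  longest = [s.length, t.length].max
  return 0.0 if longest.zero?          # two empty strings are identical
  lev_distance(s, t).to_f / longest
end

puts normalized_lev('Test', 'est')  # 0.25
puts normalized_lev('abc', 'xyz')   # 1.0
```

A normalized value is handy when comparing pairs of very different lengths, since one edit in a four-letter word matters more than one edit in a long sentence.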

  • 2020-11-29 08:56

    I made a damerau-levenshtein gem where the algorithms are implemented in C:

    require "damerau-levenshtein"
    dl = DamerauLevenshtein
    dl.distance("Something", "Smoething") #returns 1
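The reason "Something" and "Smoething" are only 1 apart is that Damerau-Levenshtein counts a swap of two adjacent characters as a single edit, where plain Levenshtein would need 2. Below is a pure-Ruby sketch of the restricted form, usually called optimal string alignment distance. It is only meant to show the idea; osa_distance is a hypothetical name and this is not how the gem is implemented.

```ruby
# Optimal string alignment distance (restricted Damerau-Levenshtein):
# Levenshtein plus "swap two adjacent characters" as a single edit.
# NOTE: osa_distance is a hypothetical name, not the gem's API.
def osa_distance(s, t)
  m = s.length
  n = t.length
  d = Array.new(m + 1) { Array.new(n + 1, 0) }
  (0..m).each { |i| d[i][0] = i }
  (0..n).each { |j| d[0][j] = j }

  (1..m).each do |i|
    (1..n).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1,         # deletion
                 d[i][j - 1] + 1,         # insertion
                 d[i - 1][j - 1] + cost   # match or substitution
                ].min
      # adjacent transposition, e.g. "om" <-> "mo"
      if i > 1 && j > 1 && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end
  d[m][n]
end

puts osa_distance('Something', 'Smoething')  # 1
```

The restricted form never edits the same substring twice, so it can differ from unrestricted Damerau-Levenshtein on some inputs.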
    
  • 2020-11-29 08:58

    Much simpler, I'm a Ruby show-off at times...

    # Levenshtein distance, translated from wikipedia pseudocode by ross
    
    def lev s, t
      return t.size if s.empty?
      return s.size if t.empty?
      return [ (lev s.chop, t) + 1,
               (lev s, t.chop) + 1,
               (lev s.chop, t.chop) + (s[-1, 1] == t[-1, 1] ? 0 : 1)
           ].min
    end
    