Efficiently build a graph of words with given Hamming distance

情话喂你 2021-02-07 04:08

I want to build a graph from a list of words with a Hamming distance of (say) 1; to put it differently, two words are connected if they differ by only one letter (lo

4 answers
  •  臣服心动
    2021-02-07 04:58

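    Here is a linear O(N) algorithm, but with a big constant factor (R * L * 2). R is the radix (26 for the Latin alphabet), L is the average word length, and 2 accounts for the two ways of placing a wildcard character: inserting it or substituting it. So abc → aac (substitution) and abc → abca (insertion) are the two operations that lead to a distance of 1. (Strictly speaking, counting insertions makes this an edit, or Levenshtein, distance of 1; Hamming distance is only defined for words of equal length.)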
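
A minimal sketch of the candidate generation described above, using a hypothetical helper `one_edit_candidates` (not part of the answer's code). For each of the R letters it tries L + 1 insertion points and L substitution positions, i.e. R(2L + 1) ≈ 2RL dictionary lookups per word, which is where the R * L * 2 factor comes from.

```ruby
# Hypothetical helper (not from the answer): every string one insertion
# or one substitution away from `word`, over the same a-z alphabet.
ALPHABET = ('a'..'z').to_a

def one_edit_candidates(word)
  candidates = []
  ALPHABET.each do |char|
    (0..word.size).each do |i|
      # Insertion: put `char` before position i (i == word.size appends it)
      candidates << word[0, i] + char + word[i..-1]
      # Substitution: replace the character at position i with `char`
      candidates << word[0, i] + char + word[i + 1..-1] if i < word.size
    end
  end
  candidates
end

cands = one_edit_candidates("abc")
puts cands.size               # 26 * (3 + 1) + 26 * 3 = 182
puts cands.include?("aac")    # true (substitution)
puts cands.include?("abca")   # true (insertion at the end)
```

The list contains duplicates and the word itself (substituting a letter with itself), so whoever consumes it has to skip those before adding edges.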

    It is written in Ruby. For ~240k words it takes ~250 MB of RAM and 136 seconds on average hardware.

    Blueprint of the graph implementation

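    # A vertex: the word itself plus its neighbours, kept as hash keys
    class Node
      attr_reader :val, :edges
    
      def initialize(val)
        @val = val
        @edges = {}
      end
    
      # Record `node` as a neighbour (the hash acts as a set of words)
      def <<(node)
        @edges[node.val] ||= true
      end
    
      def connected?(node)
        @edges[node.val]
      end
    
      def inspect
        "Val: #{@val}, edges: #{@edges.keys.join(', ')}"
      end
    end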
    
    class Graph
      attr_reader :vertices
      def initialize
        @vertices = {}
      end
    
      # Add a word as a vertex with no edges yet
      def <<(val)
        @vertices[val] = Node.new(val)
      end
    
      # Add an undirected edge (each node records the other)
      def connect(node1, node2)
        node1 << node2
        node2 << node1
      end
    
      def each
        @vertices.each do |val, node|
          yield [val, node]
        end
      end
    
      def get(val)
        @vertices[val]
      end
    end
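
The Node/Graph blueprint is essentially an adjacency list keyed by word. As an illustration of that design (a hypothetical alternative, not the answer's code), the same structure can be written as a Hash from each word to a Ruby `Set` of neighbours:

```ruby
require 'set'

# Hypothetical alternative to the Node/Graph classes (not the answer's
# code): an undirected graph as a Hash mapping each word to a Set of
# neighbouring words.
adjacency = Hash.new { |hash, word| hash[word] = Set.new }

# Adds an undirected edge; Set ignores duplicate insertions, which plays
# the role of `@edges[node.val] ||= true` in Node#<<.
def connect(adjacency, a, b)
  adjacency[a] << b
  adjacency[b] << a
end

connect(adjacency, "cat", "cot")
connect(adjacency, "cat", "cut")
connect(adjacency, "cat", "cot") # duplicate edge, no effect

p adjacency["cat"].to_a.sort        # ["cot", "cut"]
p adjacency["cot"].include?("cat")  # true
```

One difference to keep in mind: the Hash default block creates an empty entry whenever a missing word is read, and a word with no neighbours only appears once something touches it, whereas `Graph` registers every word as a vertex up front.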
    

    The algorithm itself

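    CHARACTERS = ('a'..'z').to_a
    graph = Graph.new
    
    # ~240,000 words; lowercased so that only a-z needs to be tried
    File.foreach("/usr/share/dict/words") do |line|
      graph << line.chomp.downcase
    end
    
    # Deletions need no separate pass: the shorter word finds the longer
    # one by insertion, and connect adds the edge in both directions.
    graph.each do |val, node|
      CHARACTERS.each do |char|
        i = 0
        while i <= val.size
          # Insertion: put `char` before position i (i == val.size appends)
          node2 = graph.get(val[0, i] + char + val[i..-1])
          graph.connect(node, node2) if node2
          # Substitution: replace val[i] with `char`, skipping char == val[i],
          # which would rebuild the word itself and add a self-loop
          if i < val.size && char != val[i]
            node2 = graph.get(val[0, i] + char + val[i+1..-1])
            graph.connect(node, node2) if node2
          end
          i += 1
        end
      end
    end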
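
If only true Hamming distance 1 is wanted (same length, exactly one substituted letter), a different technique, bucketing words by wildcard patterns, avoids the R factor entirely. This is a sketch of that alternative, not the answer's method; `hamming_one_graph` is a hypothetical name, and it assumes `*` never occurs in the words.

```ruby
# Hypothetical alternative (not the answer's method): connect words at
# Hamming distance exactly 1 by bucketing them under wildcard patterns.
# "cat" is filed under "*at", "c*t" and "ca*"; two distinct words that
# share a bucket have equal length and differ only at the starred position.
def hamming_one_graph(words)
  buckets = Hash.new { |hash, pattern| hash[pattern] = [] }
  words.uniq.each do |word|
    word.size.times do |i|
      pattern = word.dup
      pattern[i] = '*'   # assumes '*' never appears in a real word
      buckets[pattern] << word
    end
  end

  graph = Hash.new { |hash, word| hash[word] = [] }
  buckets.each_value do |group|
    # Every pair in a bucket is one edge; two distinct words can share
    # at most one bucket, so no edge is added twice.
    group.combination(2) do |a, b|
      graph[a] << b
      graph[b] << a
    end
  end
  graph
end

g = hamming_one_graph(%w[cat cot cut dog dot])
p g["cat"].sort   # ["cot", "cut"]
p g["dot"].sort   # ["cot", "dog"]
```

Building the buckets produces one pattern per letter, about N * L patterns in total; the pairing step then costs time proportional to the number of edges it emits.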
    
