Creating an md5 hash of a number, string, array, or hash in Ruby

僤鯓⒐⒋嵵緔 提交于 2019-12-02 21:04:03

I coding up the following pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.

This should properly flatten out and sort the arrays and hashes, and you'd need to have to some pretty strange looking strings for there to be any collisions.

def createsig(body)
  Digest::MD5.hexdigest( sigflat body )
end

def sigflat(body)
  if body.class == Hash
    arr = []
    body.each do |key, value|
      arr << "#{sigflat key}=>#{sigflat value}"
    end
    body = arr
  end
  if body.class == Array
    str = ''
    body.map! do |value|
      sigflat value
    end.sort!.each do |value|
      str << value
    end
  end
  if body.class != String
    body = body.to_s << body.class.to_s
  end
  body
end

> sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
=> true

If you could only get a string representation of body and not have the Ruby 1.8 hash come back with different orders from one time to the other, you could reliably hash that string representation. Let's get our hands dirty with some monkey patches:

require 'digest/md5'

class Object
  def md5key
    to_s
  end
end

class Array
  def md5key
    map(&:md5key).join
  end
end

class Hash
  def md5key
    sort.map(&:md5key).join
  end
end

Now any object (of the types mentioned in the question) respond to md5key by returning a reliable key to use for creating a checksum, so:

def createsig(o)
  Digest::MD5.hexdigest(o.md5key)
end

Example:

body = [
  {
    'bar' => [
      345,
      "baz",
    ],
    'qux' => 7,
  },
  "foo",
  123,
]
p body.md5key        # => "bar345bazqux7foo123"
p createsig(body)    # => "3a92036374de88118faf19483fe2572e"

Note: This hash representation does not encode the structure, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"].

Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. In order to ensure that the class types seen affect the hash, I inject a single unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum)

I did this as a refinement on Object, it's easily ported to a monkey patch. To use it just require the file and run "using ObjSum".

module ObjSum
  refine Object do
    def objsum
      parts = []
      queue = [self]

      while queue.size > 0
        item = queue.shift

        if item.kind_of?(Hash)
          parts << "\\000"
          item.keys.sort.each do |k| 
            queue << k
            queue << item[k]
          end
        elsif item.kind_of?(Set)
          parts << "\\001"
          item.to_a.sort.each { |i| queue << i }
        elsif item.kind_of?(Enumerable)
          parts << "\\002"
          item.each { |i| queue << i }
        elsif item.kind_of?(Fixnum)
          parts << "\\003"
          parts << item.to_s
        elsif item.kind_of?(Float)
          parts << "\\004"
          parts << item.to_s
        else
          parts << item.to_s
        end
      end

      Digest::MD5.hexdigest(parts.join)
    end
  end
end

Just my 2 cents:

module Ext
  module Hash
    module InstanceMethods
      # Return a string suitable for generating content signature.
      # Signature image does not depend on order of keys.
      #
      #   {:a => 1, :b => 2}.signature_image == {:b => 2, :a => 1}.signature_image                  # => true
      #   {{:a => 1, :b => 2} => 3}.signature_image == {{:b => 2, :a => 1} => 3}.signature_image    # => true
      #   etc.
      #
      # NOTE: Signature images of identical content generated under different versions of Ruby are NOT GUARANTEED to be identical.
      def signature_image
        # Store normalized key-value pairs here.
        ar = []

        each do |k, v|
          ar << [
            k.is_a?(::Hash) ? k.signature_image : [k.class.to_s, k.inspect].join(":"),
            v.is_a?(::Hash) ? v.signature_image : [v.class.to_s, v.inspect].join(":"),
          ]
        end

        ar.sort.inspect
      end
    end
  end
end

class Hash    #:nodoc:
  include Ext::Hash::InstanceMethods
end

Depending on your needs, you could call ary.inspect or ary.to_yaml, even.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!