If GetHashCode() for string or integer is not guaranteed to be unique, why use it?

Posted by 流过昼夜 on 2020-02-04 03:25:07

Question


As I wrote in the title.

If it's not safe to use GetHashCode() in your application, why use it (for string and integer)? I want to use it with the Intersect and Except methods in LINQ, or to create my own IEqualityComparer class. It feels like taking a chance if it's not 100% safe.

Or have I missed something?

As quoted from the String.GetHashCode Method documentation at https://docs.microsoft.com/:

Important

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.

The hash code itself is not guaranteed to be stable. Hash codes for identical strings can differ across .NET implementations, across .NET versions, and across .NET platforms (such as 32-bit and 64-bit) for a single version of .NET. In some cases, they can even differ by application domain. This implies that two subsequent runs of the same program may return different hash codes.

As a result, hash codes should never be used outside of the application domain in which they were created, they should never be used as key fields in a collection, and they should never be persisted.

Finally, don't use the hash code instead of a value returned by a cryptographic hashing function if you need a cryptographically strong hash. For cryptographic hashes, use a class derived from the System.Security.Cryptography.HashAlgorithm or System.Security.Cryptography.KeyedHashAlgorithm class.

For more information about hash codes, see Object.GetHashCode.


Answer 1:


I think what confuses you is the assumption that a hash code maps to the address of a single value, but it's not exactly like that.

Think of bookshelves, where the hash code maps to the address of a shelf. Two items with the same hash code are placed on the same shelf. Given the address of a shelf with three books on it, the dictionary checks only those three books rather than all the books. So the more unique the hash codes are, the faster the dictionary lookup is.

When you create an IEqualityComparer, if you can make GetHashCode() return unique values, the Dictionary or HashSet using it will perform faster than when there are many duplicates.
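To make the shelf analogy concrete, here is a minimal sketch of the worst case: a comparer that puts every string on the same shelf. The names ConstantHashComparer and CollisionDemo are made up for this illustration. The result is still correct because Equals makes the final decision; only the speed suffers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for illustration: every string gets the same hash code,
// so all items collide, yet results stay correct because Equals decides equality.
public sealed class ConstantHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.Ordinal);

    // Worst case: one "shelf" for everything. Valid, but lookups become a linear scan.
    public int GetHashCode(string obj) => 0;
}

public static class CollisionDemo
{
    public static string[] Run()
    {
        var left = new[] { "apple", "pear", "plum" };
        var right = new[] { "plum", "fig", "apple" };

        // Intersect narrows candidates by hash code and confirms them with Equals.
        return left.Intersect(right, new ConstantHashComparer()).ToArray();
    }
}
```

Calling CollisionDemo.Run() returns "apple" and "plum", the same result the default string comparer would give. Colliding hash codes cost time, not correctness.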

Check this example:

// Valid but weak: all strings of the same length share a hash code.
public int GetHashCode(string obj)
{
    return obj.Length;
}

Computing this is much faster than looping through the whole string, but the result is not very unique (although it is valid).

This example is also valid but an even worse choice:

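// Valid but weaker still: only the first character counts.
public int GetHashCode(string obj)
{
    return obj.Length == 0 ? 0 : obj[0]; // guard: obj[0] throws on an empty string
}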

If you know something about the content of the strings, you can write a much better hash code. For example, if you know each string is a social security number in the format "XXX-XX-XXXX", where each X represents a digit, this is a great choice:

// Assumes obj is always "XXX-XX-XXXX"; the nine digits fit in an int.
public int GetHashCode(string obj)
{
    return int.Parse(obj.Replace("-", ""));
}
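As a rough sketch of how that last method might sit inside a complete comparer and be used with Except: the class names SsnComparer and SsnDemo and the sample numbers are invented for illustration, and the code assumes every input is well formed (int.Parse throws a FormatException otherwise).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer; assumes every input is a well-formed "XXX-XX-XXXX" string.
public sealed class SsnComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.Ordinal);

    // Nine digits always fit in an int, so distinct well-formed inputs never collide.
    public int GetHashCode(string obj) => int.Parse(obj.Replace("-", ""));
}

public static class SsnDemo
{
    public static string[] Run()
    {
        var all = new[] { "123-45-6789", "987-65-4321", "000-00-0001" };
        var known = new[] { "987-65-4321" };

        // Except keeps the items of 'all' that do not appear in 'known'.
        return all.Except(known, new SsnComparer()).ToArray();
    }
}
```

SsnDemo.Run() returns "123-45-6789" and "000-00-0001". Note that Equals and GetHashCode agree: strings that compare equal always produce the same hash code, which is the one rule a comparer must never break.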



Answer 2:


If it's not safe to use GetHashCode() in your application, why use it?

GetHashCode has a different purpose. If you need an equality test for strings, you should use String.Equals or the == operator; these are guaranteed to work correctly.

A hash code isn't meant to be a unique number for each possible string. That is impossible, because there are far more possible strings than 32-bit integers. Here's the definition of a hash function:

A hash function is any function that can be used to map data of arbitrary size to fixed-size values.

It just maps a nearly infinite set of strings to a (comparatively) very limited set of integers. You might want to use a hash code if you need to spread a large number of strings uniformly across a smaller number of "buckets". Hash codes are used extensively in hash-based collections, e.g. HashSet.
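The bucket idea can be sketched roughly as follows. This is a simplified illustration, not how HashSet or Dictionary is actually implemented; BucketDemo, BucketOf and Distribute are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BucketDemo
{
    // Maps a string to one of bucketCount buckets. The mask clears the sign bit
    // so the index is never negative.
    public static int BucketOf(string s, int bucketCount) =>
        (s.GetHashCode() & 0x7FFFFFFF) % bucketCount;

    // Places every item into the bucket chosen by its hash code.
    public static List<string>[] Distribute(IEnumerable<string> items, int bucketCount)
    {
        var buckets = new List<string>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
        {
            buckets[i] = new List<string>();
        }

        foreach (var item in items)
        {
            buckets[BucketOf(item, bucketCount)].Add(item);
        }

        return buckets;
    }
}
```

To look up a string later, you compute BucketOf again and compare it with String.Equals only against the items in that one bucket. Within a single run of the program, equal strings always land in the same bucket, which is all a hash-based collection needs.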

The documentation for GetHashCode mentions different issues with this method:

  • The method can generate a different result for the same string in different application domains, on different machines, or on different versions of .NET. This means that it's not a good idea to store the hash externally as some sort of unique identifier for later use;
  • The result is not cryptographically strong, so you shouldn't use it where you need a cryptographic hash, for example for hashing passwords.
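If you do need a value that stays the same across runs and machines, one option is a hash from System.Security.Cryptography, as the quoted documentation suggests. A minimal sketch, where StableHash and Sha256Hex are hypothetical names and UTF-8 is an assumed encoding choice:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableHash
{
    // Returns the SHA-256 digest of the UTF-8 bytes of the input as lowercase hex.
    // The same input gives the same output on every run, machine, and .NET version.
    public static string Sha256Hex(string input)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(input));

            var sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
            {
                sb.Append(b.ToString("x2"));
            }

            return sb.ToString();
        }
    }
}
```

This is much slower than GetHashCode, so it suits persisted fingerprints rather than in-memory lookups. For storing passwords, a plain digest like this is not enough on its own; a dedicated password-hashing scheme is the usual choice.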

This may look scary, but GetHashCode is still good enough for in-memory collections such as HashSet or Dictionary.

Also, see this question: Why is it important to override GetHashCode when Equals method is overridden?



Source: https://stackoverflow.com/questions/59226000/if-gethashcode-for-string-or-integer-is-not-guaranteed-to-be-unique-why-use-it
