Is it possible for different strings to have the same hash code using Java's hashCode() function? And if so, how likely is it, as a percentage?
Yes, it is possible for two Strings to have the same hash code. If you take a look at the Wikipedia article, you will see that both "FB" and "Ea" have the same hash code. Nothing in the method contract says hashCode() should be used to compare for equality; use equals() for that.
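As a quick check of the "FB" / "Ea" claim, here is a minimal sketch. The class name and the helper sameHashButNotEqual are made up for illustration; only String.hashCode() and String.equals() come from the JDK.

```java
class SameHashDifferentStrings {
    // Hypothetical helper: true when two strings collide on hashCode() but are not equal.
    static boolean sameHashButNotEqual(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        String a = "FB";
        String b = "Ea";
        // 'F'*31 + 'B' = 70*31 + 66 = 2236 and 'E'*31 + 'a' = 69*31 + 97 = 2236
        System.out.println(a + " -> " + a.hashCode());
        System.out.println(b + " -> " + b.hashCode());
        System.out.println("equals: " + a.equals(b));
        System.out.println("collision: " + sameHashButNotEqual(a, b));
    }
}
```

Both lines print 2236, while equals() reports false, which is why equality checks should go through equals().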
Since Java 1.2, String implements hashCode() by using a product sum algorithm over the entire text of the string.
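The product sum can be sketched by hand. This follows the formula given in the String.hashCode() Javadoc, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated with int arithmetic (so it wraps on overflow). The class and method names are illustrative, not part of the JDK.

```java
class ManualStringHash {
    // Re-implements the documented formula using Horner's rule; int overflow wraps around.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "a", "FB", "Ea", "hello world"}) {
            System.out.println("'" + s + "': manual=" + manualHash(s)
                    + " builtin=" + s.hashCode());
        }
    }
}
```

For every input the two values should match, since the loop is just the Javadoc formula written out.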
Yes. By the pigeonhole principle, two different strings can produce the same hash code: hashCode() returns an int, so there are only 2^32 possible values but far more possible strings. Code should always be written to cater for such collisions (typically, by not breaking).
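To put a rough number on the "percentage" part of the question, here is a sketch of the usual birthday-problem approximation. It assumes hash codes behave like uniformly random 32-bit values, which real strings only approximate, so treat the output as an estimate. The class and method names are made up for this example.

```java
class CollisionEstimate {
    // Approximate probability that n uniformly random 32-bit hash codes
    // contain at least one collision: 1 - exp(-n(n-1) / (2 * 2^32)).
    static double collisionProbability(long n) {
        double values = 4294967296.0; // 2^32 possible int hash codes
        return 1.0 - Math.exp(-(double) n * (n - 1) / (2.0 * values));
    }

    public static void main(String[] args) {
        for (long n : new long[] {1_000, 10_000, 77_163, 100_000, 1_000_000}) {
            System.out.printf("n=%d -> %.4f%n", n, collisionProbability(n));
        }
    }
}
```

Under that assumption, the chance of at least one collision reaches about 50% at roughly 77,000 strings, even though any single pair collides with probability of only about 1 in 4.3 billion.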
The percentage of collisions for random strings should be minimal. However, if you hash strings from external sources, an attacker could easily create hundreds of thousands of strings having the same hash code. In a Java HashMap these would all map to the same bucket and effectively turn the map into a linked list. Access times to the map would then be proportional to the map size instead of constant, leading to a denial of service attack. (Since Java 8, HashMap converts heavily populated buckets into balanced trees, which softens this worst case but does not remove the collisions.)
See this page on "Effective DoS attacks against Web Application Platforms" for further information and links to the presentation.
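How cheaply such strings can be produced is easy to sketch. Because "FB" and "Ea" have the same length and the same hash code, any concatenation of those two blocks collides with every other concatenation of the same number of blocks. The class and method names below are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollidingStrings {
    // Builds all 2^blocks strings made of "FB"/"Ea" blocks; they all share one hash code.
    static List<String> collidingStrings(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(result.size() * 2);
            for (String prefix : result) {
                next.add(prefix + "FB");
                next.add(prefix + "Ea");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> strings = collidingStrings(10); // 2^10 = 1024 strings
        Set<Integer> hashes = new HashSet<>();
        for (String s : strings) {
            hashes.add(s.hashCode());
        }
        System.out.println(strings.size() + " distinct strings, "
                + hashes.size() + " distinct hash code(s)");
    }
}
```

With 10 blocks this prints "1024 distinct strings, 1 distinct hash code(s)"; each extra block doubles the number of colliding strings.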
Yes (and not just in Java; this applies to any language), different strings can produce the same hash code. I recall a rule taught by my professor that might be useful here:
Two equal strings/values must have the same hash code, but the converse is not true.
Example in Python:
>>> hash('same-string')
-5833666992484370527
>>> hash('same-string')
-5833666992484370527
There might be another string with the same hash code, so we can't recover the key from the hash code alone.
When two different strings end up with the same hash code, that is called a collision.
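A small Java sketch of why a collision is harmless for correctness: HashMap uses the hash code only to pick a bucket and then falls back to equals() to tell keys apart. The class name and buildMap helper are invented for this example; the colliding keys "FB" and "Ea" are the pair mentioned earlier in the thread.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysInMap {
    // Hypothetical helper: stores two different keys that share the hash code 2236.
    static Map<String, Integer> buildMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("FB", 1);
        map.put("Ea", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = buildMap();
        // Both entries survive because the keys are compared with equals().
        System.out.println("size = " + map.size());
        System.out.println("FB -> " + map.get("FB"));
        System.out.println("Ea -> " + map.get("Ea"));
    }
}
```

The map keeps two entries and returns the right value for each key, so the hash code narrows the search but never identifies the key by itself.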
// You can run the code below with -Xmx2100m; it finds many collisions, enough to fill your console.
`
import java.util.HashMap;

public class TestHashCollision {
    public static void main(String[] args) {
        final String TEXT = "was stored earlier had the same hash as";
        // Maps each hash code seen so far to the first string that produced it.
        HashMap<Integer, String> hs = new HashMap<>();
        long t1 = System.currentTimeMillis();
        long t2 = System.currentTimeMillis();
        for (long l = 0; l < Long.MAX_VALUE; l++) {
            String key = "d" + l;
            int hash = key.hashCode();
            if (hs.containsKey(hash)) {
                // A different string with the same hash code was stored earlier.
                System.out.println("'" + hs.get(hash) + "' " + TEXT + " '" + key + "'"); // System.exit(0);
            } else {
                hs.put(hash, key);
            }
            // Report progress roughly every 10 seconds.
            t2 = System.currentTimeMillis();
            if (t2 - t1 > 10000) {
                t1 = System.currentTimeMillis();
                System.out.println("10 seconds gone! size is:" + hs.size());
            }
        }
        System.out.println("Done");
    }
}
`
YES. A lot.
Look at the following pair
They can return the same hash code even though the characters in them are not the same.
Basically, the hash is the sum of the characters in the string, each multiplied by a power of 31: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].