An existing system written in Java uses the hashcode of a string as its routing strategy for load balancing.
Now, I cannot modify the system but ne
String s = "Some String"
for (int i = 0; i < SOME_VERY_BIG_NUMBER; ++i) {
String copy = new String(s);
// Do something with copy.
}
Will this work for you? It just creates a lot of copies of the same String literal that you can then use in your testing.
You can instrument the java.lang.String class so its method hashCode() will always return the same number.
I suppose Javassist is the easiest way to do such an instrumentation.
In short:
The code will look like (roughly):
ClassPool classPool = new ClassPool(true);
CtClass stringClass = classPool.get("java.lang.String");
CtMethod hashCodeMethod = stringClass.getDeclaredMethod("hashCode", null);
hashCodeMethod.setBody("{return 0;}");
byte[] bytes = stringClass.toBytecode();
ClassDefinition[] classDefinitions = new ClassDefinition[] {new ClassDefinition(String.class, bytes);
instrumentation.redefineClasses(classDefinitions);// this instrumentation can be obtained via Java-agent
Also don't forget that agent manifest file must specify Can-Redefine-Classes: true
to be able to use redefineClasses(ClassDefinition[]) method.
I was wondering if there was a "universal" solution; e.g. some constant string XYZ
, such that
s.hashCode() == (s + XYZ).hashCode()
for any string s
. Finding such a string involves solving a fairly complicated equation ... which was beyond my rusty mathematical ability. But then it dawned on me that h == 31*h + ch
is always true
when h
and ch
are both zero!
Based on that insight, the following method should create a different String with the same hashcode as its argument:
public String collider(String s) {
return "\0" + s;
}
If NUL characters are problematic for you, prepending any string whose hashcode is zero would work too ... albeit that the colliding strings would be longer than if you used zero.
see a test method, basically, so long as you match, a1*31+b1 = a2*31 +b2, which means (a1-a2)*31=b2-b1
public void testHash()
{
System.out.println("A:" + ((int)'A'));
System.out.println("B:" + ((int)'B'));
System.out.println("a:" + ((int)'a'));
System.out.println(hash("Aa".hashCode()));
System.out.println(hash("BB".hashCode()));
System.out.println(hash("Aa".hashCode()));
System.out.println(hash("BB".hashCode()));
System.out.println(hash("AaAa".hashCode()));
System.out.println(hash("BBBB".hashCode()));
System.out.println(hash("AaBB".hashCode()));
System.out.println(hash("BBAa".hashCode()));
}
you will get
A:65
B:66
a:97
2260
2260
2260
2260
2019172
2019172
2019172
2019172
edit: someone said this is not straightforward enough. I added below part
@Test
public void testN() throws Exception {
List<String> l = HashCUtil.generateN(3);
for(int i = 0; i < l.size(); ++i){
System.out.println(l.get(i) + "---" + l.get(i).hashCode());
}
}
AaAaAa---1952508096
AaAaBB---1952508096
AaBBAa---1952508096
AaBBBB---1952508096
BBAaAa---1952508096
BBAaBB---1952508096
BBBBAa---1952508096
BBBBBB---1952508096
below is the source code, it might be not efficient, but it work:
public class HashCUtil {
private static String[] base = new String[] {"Aa", "BB"};
public static List<String> generateN(int n)
{
if(n <= 0)
{
return null;
}
List<String> list = generateOne(null);
for(int i = 1; i < n; ++i)
{
list = generateOne(list);
}
return list;
}
public static List<String> generateOne(List<String> strList)
{
if((null == strList) || (0 == strList.size()))
{
strList = new ArrayList<String>();
for(int i = 0; i < base.length; ++i)
{
strList.add(base[i]);
}
return strList;
}
List<String> result = new ArrayList<String>();
for(int i = 0; i < base.length; ++i)
{
for(String str: strList)
{
result.add(base[i] + str);
}
}
return result;
}
}
look at String.hashCode()
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
Given String X, then String Y = "\u0096\0\0ɪ\0ˬ" + X will share same hashcode with X.
Explanation:
String.hashcode() returns Integer, and every Integer X in java has property that X = X + 2 * (Integer.MAX_VALUE + 1). Here, Integer.MAX_VALUE = 2 ^ 31 - 1;
So we only need to find String M, which has the property that M's hashcode % (2 * (Integer.MAX_VALUE + 1)) = 0;
I find "\u0096\0\0ɪ\0ˬ" : \u0096 's ascii code is 150,\0 's ascii code is 0, ɪ's ascii code is 618, ˬ's ascii code is 748, so its hashcode is 150 * 31 ^ 5 + 618 * 31 ^ 2 + 748 = 2 ^ 32 = 0;
It is up to you which string you would like, and I pick this one.
I think find a equal-hash string from a long string is too hard, it's easy when find equal-hash string of an short string (2 or 3). Look at the equation below. (sorry I cant post image cause me new member)
Notice that, "FB" and "Ea" have the same hashcode, and any two strings like s1+"FB"+s2 and s1+"Ea"+s2 will have the same hashcode. So, the easy solution is finding any 2-char substring of existing string and replace with a 2-char substring with the same hashcode
Exmaple, we have the string "helloworld" get 2-char substring "he", hashcode("he") = 'h'*31 + 'e' = ('h'*31 + 31) + ('e' - 31) = ('h'+1)*31 + 'F' = 'i' + 'F' = hashcode("iF") so the desire string is "iFlloworld" we have increased 'h' by 1, we can increase by 2, or 3 etc (but will be wrong if it overflow the char value)
The below code run well with small level, it will wrong if the level is big, make the char value overflow, I will fix it later if you want (this code change 2 first chars, but I will edit code to 2 last chars because 2 first chars are calc with largest value)
public static String samehash(String s, int level) {
if (s.length() < 2)
return s;
String sub2 = s.substring(0, 2);
char c0 = sub2.charAt(0);
char c1 = sub2.charAt(1);
c0 = (char) (c0 + level);
c1 = (char) (c1 - 31 * level);
String newsub2 = new String(new char[] { c0, c1 });
String re = newsub2 + s.substring(2);
return re;
}