I have come across this line of legacy code, which I am trying to figure out:
String newString = new String(oldString.getBytes(\"UTF-8\"), \"UTF-8\"));
How is this different from the following?
This line of code here:
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8"));
constructs a new String object (i.e. a copy of oldString
), while this line of code:
String newString = oldString;
declares a new variable of type java.lang.String
and initializes it to refer to the same String object as the variable oldString
.
Is there any scenario in which the two lines will have different outputs?
Absolutely:
String newString = oldString;
boolean isSameInstance = newString == oldString; // isSameInstance == true
vs.
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8"));
// isSameInstance == false (in most cases)
boolean isSameInstance = newString == oldString;
a_horse_with_no_name (see comment) is right of course. The equivalent of
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8"));
is
String newString = new String(oldString);
minus the subtle difference wrt the encoding that Peter Lawrey explains in his answer.
This could be complicated way of doing
String newString = new String(oldString);
This shortens the String is the underlying char[] used is much longer.
However more specifically it will be checking that every character can be UTF-8 encoded.
There are some "characters" you can have in a String which cannot be encoded and these would be turned into ?
Any character between \uD800 and \uDFFF cannot be encoded and will be turned into '?'
String oldString = "\uD800";
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
System.out.println(newString.equals(oldString));
prints
false