Is the following a correct way to convert a ByteBuffer to a String?
String k = "abcd";
ByteBuffer b = ByteBuffer.wrap(k.getBytes());
String v = new Str
Convert a String to a ByteBuffer, then convert the ByteBuffer back to a String in Java:
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
String babel = "obufscate thdé alphebat and yolo!!";
System.out.println(babel);
// Convert the String to a ByteBuffer:
ByteBuffer babb = Charset.forName("UTF-8").encode(babel);
try {
    // Convert the ByteBuffer to a String. Pass offset and length so that
    // unused bytes in the backing array are not decoded:
    System.out.println(new String(babb.array(), babb.arrayOffset() + babb.position(), babb.remaining(), "UTF-8"));
}
catch (Exception e) {
    e.printStackTrace();
}
This prints the original string first, and then the string decoded from the ByteBuffer's backing array():
obufscate thdé alphebat and yolo!!
obufscate thdé alphebat and yolo!!
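One thing to watch with array(): the backing array can be larger than the readable content, and the content does not have to start at index 0. Here is a minimal sketch of decoding only the readable region. The class and method names (BackingArrayDemo, decodeReadable) are made up for illustration, and it assumes the buffer is array-backed:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BackingArrayDemo {
    // Decodes only the bytes between position and limit of an array-backed buffer.
    // Assumes buf.hasArray() is true; array() throws otherwise.
    static String decodeReadable(ByteBuffer buf) {
        return new String(buf.array(),
                buf.arrayOffset() + buf.position(),
                buf.remaining(),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode("thd\u00e9"); // "thdé"
        // capacity() may exceed remaining(); bytes past the limit are not content
        System.out.println("remaining=" + buf.remaining() + " capacity=" + buf.capacity());
        System.out.println(decodeReadable(buf));
    }
}
```

If the capacity printed is larger than the remaining count, decoding the whole array would append junk characters to the result.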
It also helped me to reduce the string to raw bytes, which makes it easier to inspect what is going on:
String text = "こんにちは";
//convert utf8 text to a byte array
byte[] array = text.getBytes("UTF-8");
//convert the byte array back to a string as UTF-8
String s = new String(array, Charset.forName("UTF-8"));
System.out.println(s);
//decoding UTF-8 bytes with the wrong charset, such as ISO-8859-1,
//does not throw; it silently produces garbled text
String sISO = new String(array, Charset.forName("ISO-8859-1"));
System.out.println(sISO);
Prints your string interpreted as UTF-8, and then again as ISO-8859-1:
こんにちは
ããã«ã¡ã¯
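To actually look at the raw bytes, a small hex dump helps. This is only a sketch; ByteInspector and toHex are hypothetical names, not part of any library:

```java
import java.nio.charset.StandardCharsets;

class ByteInspector {
    // Formats each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // "é"
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toHex(utf8));   // C3 A9
        System.out.println(toHex(latin1)); // E9
    }
}
```

The same character is two bytes in UTF-8 and one byte in ISO-8859-1, which is why decoding with the wrong charset yields a different number of characters.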
The root of this question is how to decode bytes into a String.
This can be done with the Java NIO Charset class:
public final CharBuffer decode(ByteBuffer bb)
FileChannel channel = FileChannel.open(
        Paths.get("files/text-latin1.txt"), StandardOpenOption.READ);
ByteBuffer buffer = ByteBuffer.allocate(1024);
channel.read(buffer);
// Switch the buffer from writing to reading before decoding
buffer.flip();
Charset latin1 = StandardCharsets.ISO_8859_1;
CharBuffer latin1Buffer = latin1.decode(buffer);
// toString() returns only the decoded characters, not the whole backing array
String result = latin1Buffer.toString();
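After bytes are written into a buffer (by channel.read or by put), the position sits at the end of the data, so the buffer needs a flip() before it is decoded. A minimal sketch without file I/O, where put stands in for the channel read and the names FlipDemo and fillAndDecode are made up:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FlipDemo {
    // Writes data into a fresh buffer, flips it, and decodes it as ISO-8859-1.
    // Assumes data fits into the 1024-byte buffer.
    static String fillAndDecode(byte[] data) {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        buffer.put(data); // position advances, as it would after channel.read(buffer)
        buffer.flip();    // limit = old position, position = 0
        return StandardCharsets.ISO_8859_1.decode(buffer).toString();
    }

    public static void main(String[] args) {
        byte[] latin1Bytes = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        System.out.println(fillAndDecode(latin1Bytes).length()); // 4
    }
}
```

Without the flip(), decode would start at the end of the written data and run to the capacity, returning the unused zero bytes instead of the content.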
There is a simpler approach that decodes a ByteBuffer into a String without any problems, as mentioned by Andy Thomas:
String s = StandardCharsets.UTF_8.decode(byteBuffer).toString();
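Keep in mind that decode reads from the buffer's current position, so the buffer is consumed by the call. If the buffer is needed again afterwards, one option is to decode a duplicate, which shares the content but has its own position and limit. A sketch, with DecodeDemo and decodeKeepingPosition as made-up names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes the readable bytes without moving the caller's position or limit.
    static String decodeKeepingPosition(ByteBuffer buf, Charset cs) {
        return cs.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("abcd".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeKeepingPosition(buf, StandardCharsets.UTF_8)); // abcd
        System.out.println(buf.remaining()); // 4
    }
}
```

This does not rely on array(), so it should also work for direct and read-only buffers.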
Just wanted to point out, it's not safe to assume ByteBuffer.array() will always work.
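String v;
if (buffer.hasArray()) {
    // Respect the buffer's offset, position and limit within the backing array
    v = new String(buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining(), charset);
} else {
    byte[] bytes = new byte[buffer.remaining()];
    buffer.get(bytes);
    v = new String(bytes, charset);
}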
For a given use case, buffer.hasArray() is usually either always true or always false. In practice, unless the code really has to work under all circumstances, it's safe to optimize away the branch you don't need. But the answers that rely on array() may not work with a ByteBuffer that's been created through ByteBuffer.allocateDirect().
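To see which kinds of buffer expose a backing array, here is a small sketch. HasArrayDemo and arrayAccessible are hypothetical names. Read-only buffers report hasArray() as false; for direct buffers this is what I see on the JDKs I have used:

```java
import java.nio.ByteBuffer;

class HasArrayDemo {
    // Returns true if array() can be called on this buffer without throwing.
    static boolean arrayAccessible(ByteBuffer buffer) {
        try {
            buffer.array();
            return true;
        } catch (UnsupportedOperationException e) {
            // Also covers ReadOnlyBufferException, which is a subclass
            return false;
        }
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.wrap(new byte[] {1, 2, 3});
        ByteBuffer readOnly = heap.asReadOnlyBuffer();
        ByteBuffer direct = ByteBuffer.allocateDirect(3);

        System.out.println("heap:      " + heap.hasArray() + " / " + arrayAccessible(heap));
        System.out.println("read-only: " + readOnly.hasArray() + " / " + arrayAccessible(readOnly));
        System.out.println("direct:    " + direct.hasArray() + " / " + arrayAccessible(direct));
    }
}
```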
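private String convertFrom(String lines, String from, String to) {
    // Note: getBytes() with no argument uses the platform's default charset
    ByteBuffer bb = ByteBuffer.wrap(lines.getBytes());
    CharBuffer cb = Charset.forName(to).decode(bb);
    ByteBuffer encoded = Charset.forName(from).encode(cb);
    // Copy only the encoded bytes; the backing array may be larger than the content
    byte[] bytes = new byte[encoded.remaining()];
    encoded.get(bytes);
    return new String(bytes);
}
public Doit() {
    // concatenatedLines is assumed to be a field that was initialized elsewhere
    concatenatedLines = convertFrom(concatenatedLines, "CP1252", "UTF-8");
}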
EDIT (2018): The edited sibling answer by @xinyongCheng is a simpler approach, and should be the accepted answer.
Your approach would be reasonable if you knew the bytes are in the platform's default charset. In your example, this is true because k.getBytes() returns the bytes in the platform's default charset.
More frequently, you'll want to specify the encoding. However, there's a simpler way to do that than the one in the question you linked. The String API provides methods that convert between a String and a byte[] array in a particular encoding. The documentation for these methods suggests using CharsetEncoder/CharsetDecoder "when more control over the decoding [encoding] process is required."
To get the bytes from a String in a particular encoding, you can use a sibling getBytes() method:
byte[] bytes = k.getBytes( StandardCharsets.UTF_8 );
To put bytes with a particular encoding into a String, you can use a different String constructor:
String v = new String( bytes, StandardCharsets.UTF_8 );
Note that ByteBuffer.array() is an optional operation. If you've constructed your ByteBuffer with an array, you can use that array directly. Otherwise, if you want to be safe, use ByteBuffer.get(byte[] dst, int offset, int length) to get bytes from the buffer into a byte array.
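A minimal sketch of the safe route using the three-argument get. BulkGetDemo and copyRemaining are made-up names, and the direct buffer is only there to show a case where array() is not an option:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BulkGetDemo {
    // Copies the readable bytes into a new array.
    // Note: this advances the buffer's position to its limit.
    static byte[] copyRemaining(ByteBuffer buffer) {
        byte[] dst = new byte[buffer.remaining()];
        buffer.get(dst, 0, dst.length);
        return dst;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        direct.put("abcd".getBytes(StandardCharsets.UTF_8));
        direct.flip();
        String v = new String(copyRemaining(direct), StandardCharsets.UTF_8);
        System.out.println(v); // abcd
    }
}
```

If the buffer has to be read again afterwards, call rewind() on it or copy from buffer.duplicate() instead.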