I created the following to truncate a string in Java to a new string with a given number of bytes.
String truncatedValue = "";
String curren
I've improved upon Peter Lawrey's solution to handle surrogate pairs accurately. In addition, I optimized it based on the fact that a Java char never needs more than 3 bytes in UTF-8 (a surrogate pair takes 4 bytes for its 2 chars).
public static String substring(String text, int maxBytes) {
    // Once the remaining chars fit even at 3 bytes each, no further checks are needed.
    for (int i = 0, len = text.length(); (len - i) * 3 > maxBytes;) {
        // Advance by one code point so a surrogate pair is never split.
        int j = text.offsetByCodePoints(i, 1);
        if ((maxBytes -= text.substring(i, j).getBytes(StandardCharsets.UTF_8).length) < 0)
            return text.substring(0, i);
        i = j;
    }
    return text;
}
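The early exit above relies on the 3-bytes-per-char bound. As a small illustration (the class and method names here are hypothetical, not part of the answer), this sketch compares the real UTF-8 size of a few strings with that worst-case bound:

```java
import java.nio.charset.StandardCharsets;

final class Utf8CharBound {
    // Worst-case UTF-8 size: 3 bytes for every Java char (UTF-16 code unit).
    static int upperBound(String s) {
        return s.length() * 3;
    }

    // Actual UTF-8 size of the string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII letter, e-acute, euro sign, and an emoji stored as a surrogate pair.
        String[] samples = {"a", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(s.length() + " char(s), "
                    + utf8Length(s) + " byte(s), bound " + upperBound(s));
        }
    }
}
```

The emoji needs 4 bytes but occupies 2 chars, so it still stays under the bound of 6.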
Use the UTF-8 CharsetEncoder, and encode until the output ByteBuffer contains as many bytes as you are willing to take, by looking for CoderResult.OVERFLOW.
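A minimal sketch of that idea, assuming a non-negative byte limit; the class and method names are made up for illustration. The output buffer is sized to the limit, so the encoder stops by itself when the next character would not fit, and the input position shows how many chars were consumed:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Truncate {
    static String truncate(String text, int maxBytes) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                // Assumption: replace unpaired surrogates instead of stopping at them.
                .onMalformedInput(CodingErrorAction.REPLACE);
        ByteBuffer out = ByteBuffer.allocate(maxBytes); // room for exactly maxBytes bytes
        CharBuffer in = CharBuffer.wrap(text);
        // Stops with CoderResult.OVERFLOW as soon as the next character
        // does not fit into the remaining space of the output buffer.
        encoder.encode(in, out, true);
        // in.position() is the number of chars that were fully encoded.
        return text.substring(0, in.position());
    }
}
```

For example, `Utf8Truncate.truncate("h\u00e9llo", 2)` keeps only `"h"`, because the 2-byte `é` no longer fits after the first byte is used.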
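String s = "FOOBAR";
int limit = 3;
// Caveat: this cuts at a byte boundary, so it is only safe when the cut
// cannot land inside a multi-byte character (for example, ASCII-only text).
s = new String(s.getBytes(StandardCharsets.UTF_8), 0, limit, StandardCharsets.UTF_8);
Resulting value of s: FOO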
The second approach described here works well: http://www.jroller.com/holy/entry/truncating_utf_string_to_the
This is mine:
private static final int FIELD_MAX = 2000;
private static final Charset CHARSET = Charset.forName("UTF-8");
public String trancStatus(String status) {
    if (status == null || status.getBytes(CHARSET).length <= FIELD_MAX) {
        return status;
    }
    // Binary search for the longest prefix whose encoding fits in FIELD_MAX bytes.
    int left = 0, right = status.length();
    while (left < right) {
        int index = left + (right - left + 1) / 2;
        if (status.substring(0, index).getBytes(CHARSET).length <= FIELD_MAX) {
            left = index;
        } else {
            right = index - 1;
        }
    }
    // Do not cut a surrogate pair in half.
    if (left > 0 && Character.isHighSurrogate(status.charAt(left - 1))) {
        left--;
    }
    return status.substring(0, left);
}
You could convert the string to bytes and convert just those bytes back to a string.
public static String substring(String text, int maxBytes) {
    StringBuilder ret = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
        // Work out how many bytes this char takes in UTF-8
        // and subtract them from the remaining allowance.
        // Note: going char by char undercounts surrogate pairs.
        if ((maxBytes -= text.substring(i, i + 1).getBytes(StandardCharsets.UTF_8).length) < 0) break;
        ret.append(text.charAt(i));
    }
    return ret.toString();
}
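The byte round trip mentioned above (convert to bytes, then convert only a prefix back) could look roughly like the following. This is a sketch under assumptions, not the answer's original code: it assumes UTF-8 and a non-negative limit, the class name is hypothetical, and it uses a CharsetDecoder set to ignore malformed input so that a character cut in half at the end is dropped rather than turned into a replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ByteRoundTrip {
    static String truncate(String text, int maxBytes) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return text; // already fits, nothing to cut
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // Silently skip the incomplete byte sequence left at the cut point.
                .onMalformedInput(CodingErrorAction.IGNORE);
        try {
            // Decode only the first maxBytes bytes.
            return decoder.decode(ByteBuffer.wrap(bytes, 0, maxBytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike the char-by-char loop, this encodes the whole string once, at the cost of allocating the full byte array even when only a short prefix is kept.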