Java - Fastest way to check the size of String

无人共我 2021-02-08 22:55

I have the following code inside a loop statement.
In the loop, strings are appended to sb (a StringBuilder), and after each append the code checks whether the size of sb has reached 5 MB.

3 answers

小蘑菇 (original poster)

2021-02-08 23:41
You can calculate the UTF-8 length quickly using
```
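// Number of bytes needed to encode cs in UTF-8, computed without allocating a byte array.
public static int utf8Length(CharSequence cs) {
    return cs.codePoints()
        // 1 byte up to U+007F, 2 up to U+07FF, 3 up to U+FFFF, 4 beyond
        .map(cp -> cp<=0x7ff? cp<=0x7f? 1: 2: cp<=0xffff? 3: 4)
        .sum();
}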
```
If ASCII characters dominate the contents, it might be slightly faster to use
```
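public static int utf8Length(CharSequence cs) {
    // Every char counts as one byte up front; only non-ASCII code points add extra bytes:
    // +1 for U+0080..U+07FF (1 char, 2 bytes), +2 for U+0800..U+FFFF (1 char, 3 bytes)
    // and +2 for supplementary code points (2 chars, 4 bytes).
    return cs.length()
         + cs.codePoints().filter(cp -> cp>0x7f).map(cp -> cp<=0x7ff? 1: 2).sum();
}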
```
instead.
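
To convince yourself that the counting matches what the encoder produces, you could compare it against String.getBytes for a few samples. This is only a rough sketch: the class name Utf8LengthCheck and the helper encodedLength are made up for illustration, and the samples are written as Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthCheck {
    // Same computation as the first variant above.
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    // Reference value: actually encode the string and count the bytes.
    // This allocates a byte array, which is what utf8Length avoids.
    static int encodedLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "",               // empty
            "abc",            // ASCII only
            "\u00e9",         // U+00E9, two bytes
            "\u20ac",         // U+20AC, three bytes
            "\uD83D\uDE00",   // U+1F600 as a surrogate pair, four bytes
            "a\u20ac\uD83D\uDE00"
        };
        for (String s : samples) {
            System.out.println(utf8Length(s) + " vs " + encodedLength(s));
        }
    }
}
```

Both numbers should agree for every well-formed string; they can differ for unpaired surrogates, which the encoder replaces.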

But you may also consider not recalculating the entire size on every iteration and instead adding only the size of the new fragment you’re appending to the StringBuilder, something like
```
    StringBuilder sb = new StringBuilder();
    int length = 0;
    for(…; …; …) {
        String s = … //calculateNextString();
        sb.append(s);
        length += utf8Length(s);
        if(length >= 5 * 1024 * 1024) { // 5 MiB = 5242880 bytes
            // Do something

            // in case you're flushing the data:
            sb.setLength(0);
            length = 0;
        }
    }
```
This assumes that if you’re appending fragments containing surrogate pairs, they are always complete and not split into their halves. For ordinary applications, this should always be the case.
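
To illustrate why that assumption matters, here is a small sketch (the class name SplitSurrogateDemo is made up for the example). When a surrogate pair is counted in one piece it contributes four bytes, but when the two halves arrive in separate fragments, codePoints() reports each unpaired surrogate as its own value below U+FFFF, so each half is counted as three bytes.

```java
class SplitSurrogateDemo {
    // Same computation as the first utf8Length variant.
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    public static void main(String[] args) {
        String pair = "\uD83D\uDE00";      // one code point (U+1F600), two chars
        String high = pair.substring(0, 1); // unpaired high surrogate
        String low  = pair.substring(1);    // unpaired low surrogate

        // Counted as a whole: one supplementary code point.
        System.out.println(utf8Length(pair));                    // prints 4
        // Counted fragment by fragment: two unpaired surrogates.
        System.out.println(utf8Length(high) + utf8Length(low));  // prints 6
    }
}
```

So the incremental total would be too high by two bytes per split pair, which only matters if your fragments can actually end in the middle of a pair.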

An additional possibility, suggested by Didier-L, is to postpone the calculation until the StringBuilder’s length (in chars) reaches the threshold divided by three. Before that point the UTF-8 length cannot have reached the threshold, because each char encodes to at most three bytes. However, this only pays off if some executions never reach threshold / 3.
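
A rough sketch of how the postponed calculation might be combined with the incremental counting. The class DeferredUtf8Counter and its methods append and reset are invented for this illustration and are not part of any library; the same assumption about complete surrogate pairs applies.

```java
class DeferredUtf8Counter {
    private final StringBuilder sb = new StringBuilder();
    private final int threshold;   // limit in UTF-8 bytes
    private int utf8Bytes = -1;    // -1 means "not counted yet"

    DeferredUtf8Counter(int threshold) {
        this.threshold = threshold;
    }

    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    /** Appends s and returns true once the UTF-8 size has reached the threshold. */
    boolean append(String s) {
        sb.append(s);
        // Each char encodes to at most 3 bytes, so with fewer than
        // threshold / 3 chars the threshold cannot have been reached.
        if ((long) sb.length() * 3 < threshold) {
            return false;
        }
        if (utf8Bytes < 0) {
            utf8Bytes = utf8Length(sb);   // first count: everything so far, including s
        } else {
            utf8Bytes += utf8Length(s);   // afterwards: only the new fragment
        }
        return utf8Bytes >= threshold;
    }

    /** Call after flushing the data. */
    void reset() {
        sb.setLength(0);
        utf8Bytes = -1;
    }
}
```

In the loop you would call append(s) instead of sb.append(s) and flush followed by reset() whenever it returns true. Whether the extra bookkeeping is worth it depends on how often the builder stays below threshold / 3.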