Truncating Strings by Bytes

醉酒成梦 2021-02-06 04:21

I wrote the following to truncate a string in Java to a new string that fits within a given number of bytes.

        String truncatedValue = "";
        String curren         


        
13 Answers
  •  春和景丽
    2021-02-06 04:52

    Why not convert to bytes and walk forward, obeying UTF-8 character boundaries as you go, until you've reached the maximum number of bytes, then convert those bytes back into a string?
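
    A minimal sketch of that first idea, for illustration only: encode to UTF-8, walk forward one whole character at a time, stop before the character that would exceed the budget, and decode the kept bytes. The class name Utf8ByteTruncator and the method truncate are made up for this example, and it assumes the bytes come from String.getBytes(StandardCharsets.UTF_8), so they are well-formed UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original post.
final class Utf8ByteTruncator {

    /** Returns the longest prefix of s whose UTF-8 encoding is at most maxBytes long. */
    static String truncate(String s, int maxBytes) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (maxBytes >= utf8.length) return s;  // already fits
        if (maxBytes <= 0) return "";           // no room for anything

        int end = 0;  // number of bytes kept so far, always on a character boundary
        while (end < maxBytes) {
            int lead = utf8[end] & 0xFF;
            int len;  // length of the sequence starting at this lead byte
            if (lead < 0x80) len = 1;                 // 0xxxxxxx
            else if ((lead & 0xE0) == 0xC0) len = 2;  // 110xxxxx
            else if ((lead & 0xF0) == 0xE0) len = 3;  // 1110xxxx
            else len = 4;                             // 11110xxx
            if (end + len > maxBytes) break;          // next character would not fit
            end += len;
        }
        // Decode only the bytes that were kept.
        return new String(utf8, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" is 'h' (1 byte) + e-acute (2 bytes) + "llo"; only 'h' fits in 2 bytes.
        System.out.println(truncate("h\u00e9llo", 2)); // prints "h"
    }
}
```

    Decoding the kept bytes costs one extra allocation compared with taking a substring of the original, but there is no need to track UTF-16 indices separately.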

    Or you could just cut the original string if you keep track of where the cut should occur:

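    import java.nio.charset.StandardCharsets;

    // Assumes Java always produces valid UTF-8 from a String, so there is no error checking.
    // (getBytes replaces malformed input, such as an unpaired surrogate, with '?'.)
    public class UTF8Cutter {
      // Returns the longest prefix of s whose UTF-8 encoding is at most n bytes.
      public static String cut(String s, int n) {
        // Encode explicitly as UTF-8; the no-argument getBytes() uses the platform default charset.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length < n) n = utf8.length;
        int n16 = 0;  // number of UTF-16 chars to keep
        int i = 0;    // byte index into utf8
        while (i < n) {
          int advance = 1;  // UTF-16 chars used by the current code point
          if ((utf8[i] & 0x80) == 0) i += 1;          // 1-byte sequence (ASCII)
          else if ((utf8[i] & 0xE0) == 0xC0) i += 2;  // 2-byte sequence
          else if ((utf8[i] & 0xF0) == 0xE0) i += 3;  // 3-byte sequence
          else { i += 4; advance = 2; }               // 4-byte sequence, a surrogate pair in UTF-16
          if (i <= n) n16 += advance;                 // count it only if it fits entirely
        }
        return s.substring(0, n16);
      }
    }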
    

    Note: edited to fix bugs on 2014-08-25
