XSSFCell in Apache POI encodes certain character sequences as unicode character

最后都变了- 提交于 2019-12-11 07:35:37

问题


XSSFCell seems to encode certain character sequences as unicode characters. How can I prevent this? Do I need to apply some kind of character escaping?

e.g.

cell.setCellValue("LUS_BO_WP_x24B8_AI"); // The cell value now is „LUS_BO_WPⒸAI"

In Unicode is U+24B8

I've already tried setting an ANSI font and setting the cell type to string.


回答1:


This character conversion is done in XSSFRichTextString.utfDecode()

I have now written a function that basicaly does the same thing in reverse.

private static final Pattern utfPtrn = Pattern.compile("_(x[0-9A-F]{4}_)");

private static final String UNICODE_CHARACTER_LOW_LINE = "_x005F_";

public static String escape(final String value) {
    if(value == null) return null;

    StringBuffer buf = new StringBuffer();
    Matcher m = utfPtrn.matcher(value);
    int idx = 0;
    while(m.find()) {
        int pos = m.start();
        if( pos > idx) {
            buf.append(value.substring(idx, pos));
        }

        buf.append(UNICODE_CHARACTER_LOW_LINE + m.group(1));

        idx = m.end();
    }
    buf.append(value.substring(idx));
    return buf.toString();
}


来源:https://stackoverflow.com/questions/48222502/xssfcell-in-apache-poi-encodes-certain-character-sequences-as-unicode-character

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!