How to parse UTF-8 characters in Excel files using POI

Posted by ↘锁芯ラ on 2019-12-05 05:08:50

I had the same problem while extracting Persian text from an Excel file. I was using Eclipse, and simply going to Project -> Properties and changing the "text file encoding" to UTF-8 solved the problem.
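A setting like this matters because Java falls back to a default encoding whenever none is given explicitly. As a minimal sketch that does not rely on that default, the text can be written through a writer with an explicit UTF-8 charset. The class and method names here are hypothetical, and the \u escapes are used only so the example does not depend on the encoding of the source file itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI or Eclipse.
final class ExplicitEncodingOutput {

    // Encodes the text through a writer that always uses UTF-8,
    // independent of the platform or IDE default encoding.
    static byte[] encodeAsUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(buffer, StandardCharsets.UTF_8));
        writer.print(text);
        writer.flush(); // push the encoded bytes into the buffer
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // The Persian word used in the POI example, written as Unicode escapes.
        String persian = "\u0633\u0644\u0627\u0645";

        // Wrap System.out so that console output is encoded as UTF-8 as well.
        PrintWriter console = new PrintWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
        console.println(persian);
        console.println(encodeAsUtf8(persian).length + " bytes");
    }
}
```

Whether the console then displays the characters correctly still depends on the terminal or IDE console being set to UTF-8.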

In POI, you can set the font charset like this:

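// Imports assume an older POI release in which FontCharset (with getValue())
// lives in org.apache.poi.ss.usermodel.
import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.*;

Workbook wb = new HSSFWorkbook();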
Sheet sheet = wb.createSheet("new sheet");

// Create a row and put some cells in it. Rows are 0 based.
Row row = sheet.createRow(1);

// Create a new font and alter it.
Font font = wb.createFont();
font.setCharSet(FontCharset.ARABIC.getValue());
font.setFontHeightInPoints((short)24);
font.setFontName("B Nazanin");
font.setItalic(true);
font.setStrikeout(true);

// Fonts are set into a style so create a new one to use.
CellStyle style = wb.createCellStyle();
style.setFont(font);

// Create a cell and put a value in it.
Cell cell = row.createCell(1);
cell.setCellValue("سلام");
cell.setCellStyle(style);

// Write the output to a file; try-with-resources closes the stream even if write() fails.
try (FileOutputStream fileOut = new FileOutputStream("workbook.xls")) {
    wb.write(fileOut);
}

You can also use any other charset defined in FontCharset.

Get the bytes of the cell value in UTF-8 as follows:

cell.getStringCellValue().getBytes(StandardCharsets.UTF_8);
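To illustrate what this call produces, here is a small sketch using only the JDK. The string stands in for the value returned by cell.getStringCellValue(), and the class and method names are hypothetical. It encodes the text to UTF-8 bytes and decodes it back.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the input string stands in for a cell value.
final class Utf8Bytes {

    // Encodes a string to its UTF-8 byte representation.
    static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a string.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cellValue = "\u0633\u0644\u0627\u0645"; // the Persian word from the example above
        byte[] utf8 = encode(cellValue);

        // Each of these four Arabic-script letters takes two bytes in UTF-8.
        System.out.println(utf8.length);                       // prints 8
        System.out.println(decode(utf8).equals(cellValue));    // prints true
    }
}
```

The bytes only become readable text again if whoever consumes them also decodes them as UTF-8.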

The solution is simple: to read cell string values that contain non-English characters, use the following call:

sValue = cell.getRichStringCellValue().getString();

instead of:

sValue = cell.getStringCellValue();

This applies to non-ASCII characters such as Chinese, Arabic, or Japanese.

P.S. If anybody is using the command-line utility nullpunkt/excel-to-json, which uses the Apache POI library, modify the file converter/ExcelToJsonConverter.java by replacing the occurrences of "getStringCellValue()" with "getRichStringCellValue().getString()" to avoid reading non-English characters as "???".
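One common way question marks like these arise in Java is when text passes through a charset that cannot represent it. Whether that is the exact mechanism inside excel-to-json is an assumption. The sketch below, with a hypothetical class name, shows the effect using only the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing lossy versus lossless encoding.
final class QuestionMarkDemo {

    // Encodes the text with the given charset and decodes it again.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String persian = "\u0633\u0644\u0627\u0645"; // four Arabic-script letters

        // US-ASCII has no mapping for these letters, so each becomes '?'.
        System.out.println(roundTrip(persian, StandardCharsets.US_ASCII)); // prints ????

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(persian, StandardCharsets.UTF_8).equals(persian)); // prints true
    }
}
```

Once a character has been replaced by '?' the original is gone, so the fix has to happen where the text is encoded, not afterwards.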
