We have a new requirement in our Java application where users would upload an Excel file. One of the columns in the Excel file will be formatted with bold, italics, bullet point
You are right in that the toString() method will just return the unformatted String contents of the HSSFRichTextString.
Here is a method of extracting all the other important data from the HSSFRichTextString to be stored with the string value.
Very similar to my answer to this question, extract the rich text formatting information from the HSSFRichTextString, and store that data in a class you'll create, FormattingRun.
public class FormattingRun {
    private int beginIdx;   // index of the first character in the run
    private int length;     // number of characters the run covers
    private short fontIdx;  // index into the workbook's font table

    public FormattingRun(int beginIdx, int length, short fontIdx) {
        this.beginIdx = beginIdx;
        this.length = length;
        this.fontIdx = fontIdx;
    }

    public int getBegin() { return beginIdx; }
    public int getLength() { return length; }
    public short getFontIndex() { return fontIdx; }
}
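Then, call Apache POI methods on the HSSFRichTextString to extract that data: numFormattingRuns(), getIndexOfFormattingRun(int), getFontOfFormattingRun(int), and getFontAtIndex(int), which returns the short font index present at the specified position in the string.
Now, the actual extraction of the data: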
import java.util.ArrayList;
import java.util.List;

List<FormattingRun> formattingRuns = new ArrayList<FormattingRun>();
int numFormattingRuns = richTextString.numFormattingRuns();
for (int fmtIdx = 0; fmtIdx < numFormattingRuns; fmtIdx++)
{
    int begin = richTextString.getIndexOfFormattingRun(fmtIdx);
    short fontIndex = richTextString.getFontOfFormattingRun(fmtIdx);
    // Walk the string to determine the length of the formatting run.
    int length = 0;
    for (int j = begin; j < richTextString.length(); j++)
    {
        short currFontIndex = richTextString.getFontAtIndex(j);
        if (currFontIndex == fontIndex)
            length++;
        else
            break;
    }
    formattingRuns.add(new FormattingRun(begin, length, fontIndex));
}
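As an alternative to walking the string character by character, the run lengths can be derived from the run start positions alone. The sketch below is plain Java with no POI dependency; it assumes each formatting run extends until the next run begins (or until the end of the string), and that the start positions come back in ascending order. RunLengthSketch and toBeginAndLength are hypothetical names made up for this illustration, not part of Apache POI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache POI.
class RunLengthSketch {

    /**
     * @param runStarts  start index of each formatting run, ascending
     *                   (the values getIndexOfFormattingRun would return)
     * @param textLength total length of the string
     * @return one {begin, length} pair per run
     */
    static List<int[]> toBeginAndLength(int[] runStarts, int textLength) {
        List<int[]> result = new ArrayList<int[]>();
        for (int i = 0; i < runStarts.length; i++) {
            int begin = runStarts[i];
            // Assumption: a run ends where the next one begins,
            // or at the end of the text for the last run.
            int end = (i + 1 < runStarts.length) ? runStarts[i + 1] : textLength;
            result.add(new int[] { begin, end - begin });
        }
        return result;
    }

    public static void main(String[] args) {
        // "Hello bold world" has 16 characters; runs start at 6 and 10.
        for (int[] run : toBeginAndLength(new int[] { 6, 10 }, 16)) {
            System.out.println("begin=" + run[0] + " length=" + run[1]);
        }
    }
}
```

One difference from the loop above: if two adjacent runs happen to use the same font index, the character walk counts past the boundary between them, while this version stops at the next run's start.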
To store this data in the database, first recognize that there is a one-to-many relationship between one HSSFRichTextString and its FormattingRun objects. So in whatever Oracle table you're planning on storing the rich text string data, you will need to create a foreign key relationship to another new table that stores the formatting run data. Something like this:
Table: rich_text_string
    rts_id      NUMBER
    contents    VARCHAR2(4000)
with rts_id being the primary key, and:
Table: rts_formatting_runs
    rts_id      NUMBER
    run_id      NUMBER
    run_pos     NUMBER
    run_len     NUMBER
    font_index  NUMBER
with (rts_id, run_id) being the primary key, and rts_id referring back to the rich_text_string table.
Using your favorite Java-to-database framework (JDBC, Hibernate, etc.), store the String value into contents in rich_text_string, and the associated FormattingRun object data into rts_formatting_runs.
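With plain JDBC, the two inserts might look roughly like the sketch below. It is only an illustration under some assumptions: RichTextDaoSketch and save are hypothetical names, each run is passed as an int[] of {begin, length, fontIndex} so the block stays self-contained, run_id is simply the run's position in the list, and the caller supplies rts_id and manages the transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical DAO sketch; table and column names follow the layout above.
class RichTextDaoSketch {
    static final String INSERT_STRING_SQL =
        "INSERT INTO rich_text_string (rts_id, contents) VALUES (?, ?)";
    static final String INSERT_RUN_SQL =
        "INSERT INTO rts_formatting_runs "
        + "(rts_id, run_id, run_pos, run_len, font_index) VALUES (?, ?, ?, ?, ?)";

    /** Each element of runs is assumed to be {begin, length, fontIndex}. */
    static void save(Connection conn, long rtsId, String contents, List<int[]> runs)
            throws SQLException {
        // Parent row first, so the foreign key on rts_id is satisfied.
        try (PreparedStatement ps = conn.prepareStatement(INSERT_STRING_SQL)) {
            ps.setLong(1, rtsId);
            ps.setString(2, contents);
            ps.executeUpdate();
        }
        // One child row per formatting run, sent as a single batch.
        try (PreparedStatement ps = conn.prepareStatement(INSERT_RUN_SQL)) {
            for (int runId = 0; runId < runs.size(); runId++) {
                int[] run = runs.get(runId);
                ps.setLong(1, rtsId);
                ps.setInt(2, runId);   // assumption: run_id = position in the list
                ps.setInt(3, run[0]);  // run_pos
                ps.setInt(4, run[1]);  // run_len
                ps.setInt(5, run[2]);  // font_index
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

Reading the data back is the reverse: select the contents by rts_id, then select the runs ordered by run_pos.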
Just be careful - the font index is only valid within the workbook. You'll need to store the font information from the HSSFWorkbook also, to give the font_index meaning.
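One way to picture this: keep a small lookup from font index to the font attributes you care about, and resolve each run through it. The sketch below is plain Java; FontLookupSketch, FontInfo, and describe are hypothetical names, and the choice of attributes (name, bold, italic) is an assumption. In a real application the map would be filled from the workbook's font table (for example via HSSFWorkbook's getFontAt), whose exact accessor names vary between POI versions, so that part is left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: gives a stored font_index its meaning.
class FontLookupSketch {

    /** Hypothetical value class for the font attributes worth persisting. */
    static final class FontInfo {
        final String name;
        final boolean bold;
        final boolean italic;

        FontInfo(String name, boolean bold, boolean italic) {
            this.name = name;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Resolves a font index to a readable description, if the font is known. */
    static String describe(Map<Short, FontInfo> fonts, short fontIndex) {
        FontInfo font = fonts.get(fontIndex);
        if (font == null) {
            return "unknown font " + fontIndex;
        }
        return font.name + (font.bold ? " bold" : "") + (font.italic ? " italic" : "");
    }

    public static void main(String[] args) {
        Map<Short, FontInfo> fonts = new HashMap<Short, FontInfo>();
        fonts.put((short) 5, new FontInfo("Arial", true, false));
        System.out.println(describe(fonts, (short) 5));
        System.out.println(describe(fonts, (short) 9));
    }
}
```

Persisting these attributes in their own table, keyed by workbook and font index, would let the stored runs be interpreted without the original file.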
It's not stored as a CLOB, but the data are arguably more meaningful stored this way.