Apache POI - Read and store Rich Text content in DB

我寻月下人不归 2021-01-24 06:59

We have a new requirement in our Java application where users would upload an Excel file. One of the columns in the Excel file will be formatted with bold, italics, bullet points

1 Answer
  •  醉梦人生
    2021-01-24 07:46

    You are right in that the toString() method will just return the unformatted String contents of the HSSFRichTextString.

    Here is a way to extract the formatting data from the HSSFRichTextString so that it can be stored along with the string value.

    As in my answer to this question, extract the rich text formatting information from the HSSFRichTextString and store it in a class you'll create, FormattingRun.

    // One formatting run: `length` characters starting at `beginIdx`, drawn with workbook font `fontIdx`.
    public class FormattingRun {
        private final int beginIdx;
        private final int length;
        private final short fontIdx;
        public FormattingRun(int beginIdx, int length, short fontIdx) {
            this.beginIdx = beginIdx;
            this.length = length;
            this.fontIdx = fontIdx;
        }
        public int getBegin() { return beginIdx; }
        public int getLength() { return length; }
        public short getFontIndex() { return fontIdx; }
    }
    

    Then, call Apache POI methods to extract that data.

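    • numFormattingRuns() - Returns the number of formatting runs in the HSSFRichTextString.
    • getIndexOfFormattingRun(int) - Returns the position in the string at which the specified formatting run starts.
    • getFontOfFormattingRun(int) - Returns the short font index used by the specified formatting run.
    • getFontAtIndex(int) - Returns the short font index in use at the specified character position in the string.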

    Now, the actual extraction of the data:

    import java.util.ArrayList;
    import java.util.List;

    // richTextString is the HSSFRichTextString from the cell,
    // e.g. obtained with HSSFCell.getRichStringCellValue()
    List<FormattingRun> formattingRuns = new ArrayList<>();
    int numFormattingRuns = richTextString.numFormattingRuns();
    for (int fmtIdx = 0; fmtIdx < numFormattingRuns; fmtIdx++)
    {
        int begin = richTextString.getIndexOfFormattingRun(fmtIdx);
        short fontIndex = richTextString.getFontOfFormattingRun(fmtIdx);
    
        // Walk the string to determine the length of the formatting run:
        // count characters until the font in use changes.
        int length = 0;
        for (int j = begin; j < richTextString.length(); j++)
        {
            short currFontIndex = richTextString.getFontAtIndex(j);
            if (currFontIndex == fontIndex)
                length++;
            else
                break;
        }
        formattingRuns.add(new FormattingRun(begin, length, fontIndex));
    }
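    The per-character walk can also be avoided. A minimal sketch, assuming each run ends where the next run begins and the last run ends at the end of the text: the int[] and short[] arrays stand in for values you would read with getIndexOfFormattingRun(i) and getFontOfFormattingRun(i), and RunLengths and its nested Run class are hypothetical names, not part of POI. If the first run does not start at position 0, the characters before it are not covered by any run; as far as I know they are shown with the font of the cell's style.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes run lengths from consecutive run start positions
// instead of scanning characters. The arrays stand in for values read from
// HSSFRichTextString.getIndexOfFormattingRun(i) and getFontOfFormattingRun(i).
final class RunLengths {

    // Mirrors the FormattingRun class above (hypothetical name).
    static final class Run {
        final int begin;
        final int length;
        final short fontIdx;

        Run(int begin, int length, short fontIdx) {
            this.begin = begin;
            this.length = length;
            this.fontIdx = fontIdx;
        }
    }

    static List<Run> fromRunStarts(int[] starts, short[] fonts, int textLength) {
        List<Run> runs = new ArrayList<>();
        for (int i = 0; i < starts.length; i++) {
            // A run ends where the next run starts; the last run ends at the end of the text.
            int end = (i + 1 < starts.length) ? starts[i + 1] : textLength;
            runs.add(new Run(starts[i], end - starts[i], fonts[i]));
        }
        return runs;
    }

    public static void main(String[] args) {
        // "Bold and italic" (15 chars): "Bold" in font 5, " and " in font 0,
        // "italic" in font 6. The font indices are made up for illustration.
        List<Run> runs = fromRunStarts(new int[] {0, 4, 9}, new short[] {5, 0, 6}, 15);
        for (Run r : runs) {
            System.out.println(r.begin + " " + r.length + " " + r.fontIdx);
        }
    }
}
```

    Unlike the character walk, this gives each run its own length even when two adjacent runs happen to use the same font index.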
    

    To store this data in the database, first recognize that there is a one-to-many relationship between an HSSFRichTextString and its FormattingRun objects. So alongside whatever Oracle table you're planning to store the rich text string data in, you will need a new table for the formatting run data, with a foreign key back to the string table. Something like this:

    Table: rich_text_string
    rts_id     NUMBER
    contents   VARCHAR2(4000)
    

    with rts_id being the primary key, and:

    Table: rts_formatting_runs
    rts_id     NUMBER
    run_id     NUMBER
    run_pos    NUMBER
    run_len    NUMBER
    font_index NUMBER
    

    with (rts_id, run_id) being the primary key, and rts_id referring back to the rich_text_string table.

    Using your favorite Java-to-database framework (JDBC, Hibernate, etc.), store the String value in the contents column of rich_text_string, and the associated FormattingRun data in rts_formatting_runs.
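    A minimal plain-JDBC sketch of that step, assuming the two tables above already exist and that rts_id values are generated elsewhere (for example from an Oracle sequence). RichTextDao and its nested Run class are hypothetical names; Run mirrors FormattingRun. Both inserts run in one transaction so a string is never saved without its runs, and the run rows are sent with addBatch/executeBatch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical DAO for the rich_text_string / rts_formatting_runs tables described above.
final class RichTextDao {

    // Mirrors the FormattingRun class above (hypothetical name).
    static final class Run {
        final int begin;
        final int length;
        final short fontIdx;

        Run(int begin, int length, short fontIdx) {
            this.begin = begin;
            this.length = length;
            this.fontIdx = fontIdx;
        }
    }

    static final String INSERT_STRING =
            "INSERT INTO rich_text_string (rts_id, contents) VALUES (?, ?)";
    static final String INSERT_RUN =
            "INSERT INTO rts_formatting_runs (rts_id, run_id, run_pos, run_len, font_index) "
            + "VALUES (?, ?, ?, ?, ?)";

    // Saves the string and its runs in one transaction; rtsId is assumed to be
    // generated by the caller (for example from an Oracle sequence).
    static void save(Connection conn, long rtsId, String contents, List<Run> runs) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement parent = conn.prepareStatement(INSERT_STRING);
             PreparedStatement child = conn.prepareStatement(INSERT_RUN)) {
            parent.setLong(1, rtsId);
            parent.setString(2, contents);
            parent.executeUpdate();

            for (int i = 0; i < runs.size(); i++) {
                Run run = runs.get(i);
                child.setLong(1, rtsId);
                child.setInt(2, i);              // run_id: ordinal of the run within the string
                child.setInt(3, run.begin);      // run_pos
                child.setInt(4, run.length);     // run_len
                child.setShort(5, run.fontIdx);  // font_index (only meaningful per workbook)
                child.addBatch();
            }
            child.executeBatch();
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            conn.rollback(); // undo the parent row if any run insert failed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```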

    Just be careful: a font index is only valid within the workbook it came from. You'll also need to store the font information from the HSSFWorkbook to give font_index meaning.
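    One way to do that (an assumption on my part, not something POI does for you) is to translate each workbook-local font index into a workbook-independent description of the font and keep those descriptions in their own table. FontCatalog, FontInfo and the chosen attributes are hypothetical; in real code the values would come from the HSSFFont the workbook returns for each index (getFontAt), using getters such as getFontName(), getFontHeightInPoints(), getBold() and getItalic().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical catalog that assigns a stable id to each distinct font description,
// so stored runs can reference a font row instead of a workbook-local index.
final class FontCatalog {

    // Workbook-independent font description; the attribute set is an assumption.
    static final class FontInfo {
        final String name;
        final short heightPoints;
        final boolean bold;
        final boolean italic;

        FontInfo(String name, short heightPoints, boolean bold, boolean italic) {
            this.name = name;
            this.heightPoints = heightPoints;
            this.bold = bold;
            this.italic = italic;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FontInfo)) return false;
            FontInfo f = (FontInfo) o;
            return heightPoints == f.heightPoints && bold == f.bold && italic == f.italic
                    && Objects.equals(name, f.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, heightPoints, bold, italic);
        }
    }

    private final Map<FontInfo, Integer> ids = new LinkedHashMap<>();

    // Returns the existing id for an equal description, or assigns the next one (1, 2, ...).
    int idFor(FontInfo info) {
        return ids.computeIfAbsent(info, key -> ids.size() + 1);
    }

    public static void main(String[] args) {
        FontCatalog catalog = new FontCatalog();
        int first = catalog.idFor(new FontInfo("Arial", (short) 10, true, false));
        int second = catalog.idFor(new FontInfo("Arial", (short) 10, true, false));
        System.out.println(first == second); // prints true: equal descriptions share one id
    }
}
```

    When reading the data back, each stored description would then be turned into a font in the target workbook, since its indices need not match the original one.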

    This doesn't store the rich text as a CLOB, but the data are arguably more meaningful stored this way.
