Question
I have a Word document and need to match a particular table section or heading section of it using GATE. Is there a way to first check the font size or font style of a heading and then match the rest of the content until the next heading pattern repeats?
Answer 1:
GATE has only limited support for MS Word documents, provided by the Apache Tika and Apache POI libraries. I do not know of any free alternative... We have developed our own plugin (gate.DocumentFormat) for this purpose at my company, but it is not publicly available yet.
You can try converting your Word documents to HTML with some other tool (e.g. MS Word itself, OpenOffice, docx4j, or others; searching for "docx to html" turns up many results) and then process the HTML documents in GATE instead. All the formatting will then be available in the "Original markups" annotation set.
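To make the "heading, then everything up to the next heading" idea concrete, here is a minimal sketch in plain Python, outside GATE, that splits an already-converted HTML document into sections. It assumes the converter maps Word heading styles to h1-h6 tags, which may not hold for every tool (some emit styled paragraphs instead); the names HeadingSectionParser, split_by_headings and sample_html are made up for this illustration.

```python
from html.parser import HTMLParser

# Assumption: the converter emitted real heading tags for Word headings.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSectionParser(HTMLParser):
    """Collect [heading_text, body_text] pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.sections = []        # one [heading, body] entry per heading seen
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            # A new heading ends the previous section and starts a new one.
            self.sections.append(["", ""])
            self._in_heading = True
        elif self.sections:
            # Keep text from adjacent elements (paragraphs, cells) apart.
            self.sections[-1][1] += " "

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # text before the first heading is ignored
        index = 0 if self._in_heading else 1
        self.sections[-1][index] += data


def split_by_headings(html):
    """Return a list of (heading, body) tuples with whitespace normalized."""
    parser = HeadingSectionParser()
    parser.feed(html)
    parser.close()
    return [(h.strip(), " ".join(b.split())) for h, b in parser.sections]


# Hypothetical converter output, for illustration only.
sample_html = """
<html><body>
<h1>Introduction</h1><p>First paragraph.</p><p>Second paragraph.</p>
<h1>Results</h1><table><tr><td>A</td><td>1</td></tr></table>
</body></html>
"""

sections = split_by_headings(sample_html)
for heading, body in sections:
    print(heading, "->", body)
```

Inside GATE the same segmentation would be done on the heading annotations in the "Original markups" set rather than on raw tags; this sketch only shows the grouping logic.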
Source: https://stackoverflow.com/questions/33255580/parsing-either-font-style-or-block-of-paragraph-in-gate