How to find the location of an image or picture while parsing an MS Word doc in Java using Apache POI

暗喜 2021-01-16 23:57
HWPFDocument wordDoc = new HWPFDocument(new FileInputStream(fileName));
List<Picture> picturesList = wordDoc.getPicturesTable().getAllPictures();
2 Answers
  •  鱼传尺愫
    2021-01-17 00:27

    You're fetching the pictures the wrong way, which is why you're not finding any positions: getAllPictures() returns the pictures without saying where in the document they sit.

    What you need to do is process each CharacterRun of the document in turn. Pass each run to the PicturesTable and check whether it contains a picture. If it does, fetch the picture back from the table. You then know where in the document it belongs, because you have the run it came from.

    At the simplest, it'd be something like:

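    PicturesSource pictures = new PicturesSource(document); // helper class from Tika's Word parser, not part of POI
    PicturesTable pictureTable = document.getPicturesTable();

    Range r = document.getRange();
    for (int i = 0; i < r.numCharacterRuns(); i++) {
        CharacterRun cr = r.getCharacterRun(i);
        if (pictureTable.hasPicture(cr)) {
            Picture picture = pictureTable.extractPicture(cr, true); // located at cr.getStartOffset()
        }
    }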

    You can find a good example of this in the Apache Tika parser for Microsoft Word .doc files, which is powered by Apache POI.
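For reference, here is a minimal self-contained sketch of the run-by-run approach described above, using only POI classes. The `PictureLocator` and `LocatedPicture` names are hypothetical and not part of POI or Tika. It records the start and end offsets of the character run that anchors each picture, assuming a POI version in which `HWPFDocument` is closeable. As far as I know, `hasPicture` only reports inline pictures, so floating shapes may need separate handling.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

public class PictureLocator {

    /** Hypothetical holder: a picture plus the offsets of the run that anchors it. */
    static final class LocatedPicture {
        final int startOffset;
        final int endOffset;
        final Picture picture;

        LocatedPicture(int startOffset, int endOffset, Picture picture) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.picture = picture;
        }

        String describe() {
            String name = (picture == null) ? "<none>" : picture.suggestFullFileName();
            return "offsets " + startOffset + "-" + endOffset + ": " + name;
        }
    }

    /** Walks every character run and collects the runs that hold a picture. */
    static List<LocatedPicture> locate(HWPFDocument document) {
        PicturesTable pictureTable = document.getPicturesTable();
        Range range = document.getRange();
        List<LocatedPicture> found = new ArrayList<>();
        for (int i = 0; i < range.numCharacterRuns(); i++) {
            CharacterRun run = range.getCharacterRun(i);
            if (pictureTable.hasPicture(run)) {
                // true = also load the image bytes, not just the metadata
                Picture picture = pictureTable.extractPicture(run, true);
                found.add(new LocatedPicture(run.getStartOffset(), run.getEndOffset(), picture));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a binary .doc file
        try (InputStream in = new FileInputStream(args[0]);
             HWPFDocument document = new HWPFDocument(in)) {
            for (LocatedPicture located : locate(document)) {
                System.out.println(located.describe());
            }
        }
    }
}
```

The offsets are character positions within the document's text range, so they can be compared with the offsets of paragraphs or other runs to work out what surrounds each picture.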
