HWPFDocument wordDoc = new HWPFDocument(new FileInputStream(fileName));
List
You should add a PicturesSource class:
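import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

public class PicturesSource {
    private PicturesTable picturesTable;
    private Set<Picture> output = new HashSet<Picture>();
    private Map<Integer, Picture> lookup;  // picture start offset -> picture
    private List<Picture> nonU1based;      // pictures no character run refers to
    private List<Picture> all;
    private int pn = 0;                    // cursor into nonU1based

    public PicturesSource(HWPFDocument doc) {
        picturesTable = doc.getPicturesTable();
        all = picturesTable.getAllPictures();

        // Index every picture by its start offset so a run can find its picture.
        lookup = new HashMap<Integer, Picture>();
        for (Picture p : all) {
            lookup.put(p.getStartOffset(), p);
        }

        // Start with every picture, then null out the ones claimed by a run.
        nonU1based = new ArrayList<Picture>();
        nonU1based.addAll(all);
        Range r = doc.getRange();
        for (int i = 0; i < r.numCharacterRuns(); i++) {
            CharacterRun cr = r.getCharacterRun(i);
            if (picturesTable.hasPicture(cr)) {
                Picture p = getFor(cr);
                int at = nonU1based.indexOf(p);
                // Guard: the run may point at an offset with no matching
                // picture, or at a picture that was already claimed.
                if (p != null && at >= 0) {
                    nonU1based.set(at, null);
                }
            }
        }
    }

    public boolean hasPicture(CharacterRun cr) {
        return picturesTable.hasPicture(cr);
    }

    // Remember that a picture has already been written out.
    public void recordOutput(Picture picture) {
        output.add(picture);
    }

    public boolean hasOutput(Picture picture) {
        return output.contains(picture);
    }

    // 1-based index of the picture within the document's picture list.
    public int pictureNumber(Picture picture) {
        return all.indexOf(picture) + 1;
    }

    // Picture referenced by the given run, or null if the offset is unknown.
    public Picture getFor(CharacterRun cr) {
        return lookup.get(cr.getPicOffset());
    }

    // Next picture that no character run claimed, or null when none are left.
    public Picture nextUnclaimed() {
        Picture p = null;
        while (pn < nonU1based.size()) {
            p = nonU1based.get(pn);
            pn++;
            if (p != null) return p;
        }
        return null;
    }
}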
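To make the bookkeeping in that class easier to follow, here is a minimal POI-free sketch of the same idea: index pictures by start offset, let each run look its picture up, and treat whatever no run refers to as unclaimed. The types Pic and Run and the class PictureClaimSketch are hypothetical stand-ins invented for illustration, not POI classes, and -1 as the "no picture" marker is an assumption of this sketch only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT POI classes.
final class PictureClaimSketch {

    // A picture identified by the offset at which its data starts.
    static final class Pic {
        final int startOffset;
        final String name;
        Pic(int startOffset, String name) {
            this.startOffset = startOffset;
            this.name = name;
        }
    }

    // A character run: its position in the text and the picture offset it
    // points at (-1 means "no picture" in this sketch).
    static final class Run {
        final int textPosition;
        final int picOffset;
        Run(int textPosition, int picOffset) {
            this.textPosition = textPosition;
            this.picOffset = picOffset;
        }
    }

    // Text position -> picture name, for every run that points at a known picture.
    static Map<Integer, String> positioned(List<Pic> all, List<Run> runs) {
        Map<Integer, Pic> lookup = new HashMap<>();
        for (Pic p : all) {
            lookup.put(p.startOffset, p);
        }
        Map<Integer, String> result = new LinkedHashMap<>();
        for (Run r : runs) {
            Pic p = lookup.get(r.picOffset);
            if (p != null) {
                result.put(r.textPosition, p.name);
            }
        }
        return result;
    }

    // Names of pictures that no run refers to, in their original order.
    static List<String> unclaimed(List<Pic> all, List<Run> runs) {
        Set<Integer> claimedOffsets = new HashSet<>();
        for (Run r : runs) {
            claimedOffsets.add(r.picOffset);
        }
        List<String> out = new ArrayList<>();
        for (Pic p : all) {
            if (!claimedOffsets.contains(p.startOffset)) {
                out.add(p.name);
            }
        }
        return out;
    }
}
```

In the real class, the run-to-picture lookup corresponds to getFor(cr), and the leftover list corresponds to what nextUnclaimed() walks through.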
You're getting at the pictures the wrong way, which is why you're not finding any positions!
What you need to do is process each CharacterRun of the document in turn. Pass each run to the PicturesTable and check whether it contains a picture. If it does, fetch the picture back from the table; because you have the run it came from, you know where in the document it belongs.
At the simplest, it'd be something like:
// "document" is the HWPFDocument loaded from the .doc file
PicturesSource pictures = new PicturesSource(document);
PicturesTable pictureTable = document.getPicturesTable();
Range r = document.getRange();
for (int i = 0; i < r.numParagraphs(); i++) {
    Paragraph p = r.getParagraph(i);
    for (int j = 0; j < p.numCharacterRuns(); j++) {
        CharacterRun cr = p.getCharacterRun(j);
        if (pictureTable.hasPicture(cr)) {
            Picture picture = pictures.getFor(cr);
            // Do something useful with the picture;
            // cr.getStartOffset() tells you where the run sits in the text
        }
    }
}
You can find a good example of doing this in the Apache Tika parser for Microsoft Word .doc files, which is powered by Apache POI.