I'm trying to extract the color of a rectangle in a PDF with iText. The following is all that the PDF page has:
And this is the page content extracted with iText
Your code shows it; this is how you create the rectangle and add it:
PdfTemplate template = writer.getDirectContent().createTemplate(120, 80);
template.setColorFill(BaseColor.RED);
template.rectangle(0, 0, 120, 80);
template.fill();
writer.releaseTemplate(template);
table.addCell(Image.getInstance(template));
An iText PdfTemplate generates a PDF form XObject. A form XObject in turn is "a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images)" (section 8.10.1 of ISO 32000-1), i.e. a separate stream of drawing instructions whose content can be referenced from any other content stream.
In the case of your page content stream, this is the line where the form XObject is included:
q 1.13 0 0 1.13 229.77 695.69 cm /Xf1 Do Q
(The current graphics state is saved, the transformation matrix is changed to scale by 1.13 and translate by (229.77, 695.69), then the XObject Xf1 is drawn, then the saved graphics state, including the former transformation matrix, is restored.)
The content stream of that XObject Xf1 is this:
1 0 0 rg
0 0 120 80 re
f
I.e. it sets the non-stroking color to RGB red, defines a 120x80 rectangle at the origin, and fills it.
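To see where that rectangle ends up on the page, the matrix from the cm instruction can be applied to its corners by hand. The following is only a sketch in plain Java without iText; the class and method names (CtmDemo, apply, corners) are made up for this illustration, and the numbers are the ones from the two content streams above.

```java
import java.util.Locale;

/** Applies a PDF "a b c d e f cm" matrix to the corners of an "x y w h re" rectangle. */
class CtmDemo {
    // Maps (x, y) to (a*x + c*y + e, b*x + d*y + f), the PDF coordinate transformation.
    static double[] apply(double[] m, double x, double y) {
        return new double[] { m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5] };
    }

    // Corners of the rectangle, counter-clockwise, starting at the lower left.
    static double[][] corners(double[] m, double x, double y, double w, double h) {
        return new double[][] {
            apply(m, x, y), apply(m, x + w, y), apply(m, x + w, y + h), apply(m, x, y + h)
        };
    }

    public static void main(String[] args) {
        double[] cm = { 1.13, 0, 0, 1.13, 229.77, 695.69 }; // from the page content stream
        for (double[] p : corners(cm, 0, 0, 120, 80)) {     // rectangle from the XObject stream
            System.out.printf(Locale.ROOT, "%.2f %.2f%n", p[0], p[1]);
        }
    }
}
```

The lower left corner stays at the translation offset (229.77, 695.69), and the 120x80 rectangle grows to 135.6x90.4 units because of the 1.13 scale factor.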
This is the line of code I'm using to get the page content:
String pageContent = new String(reader.getPageContent(1));
That line is not adequate for getting all the content details:
It only returns the immediate page content but not the detailed instructions from the form XObjects and patterns used in the immediate content. Quite often one finds PDFs whose immediate page contents only reference one or more form XObjects.
In spite of appearances, the page content is binary in nature, not textual. As soon as fonts with non-standard encodings are used, PDF string contents are meaningless in your Java String or (depending on your default charset) even broken.
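The "even broken" part can be illustrated without any PDF library. This is merely a sketch: the byte sequence is a made-up stand-in for a string operand containing a non-ASCII glyph code, and BinaryContentDemo and roundTrip are hypothetical names. It checks whether bytes survive being turned into a Java String and back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryContentDemo {
    // Decodes the bytes into a String with the given charset and encodes them back.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // "(<0xE9>) Tj": a text-showing instruction with one non-ASCII byte (invented sample)
        byte[] raw = { '(', (byte) 0xE9, ')', ' ', 'T', 'j' };
        // In UTF-8 the lone 0xE9 is malformed and gets replaced, so the bytes change.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8)));
        // ISO-8859-1 maps every byte to one char, so the bytes survive.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1)));
    }
}
```

Even where the bytes survive, as with ISO-8859-1 here, the resulting characters are only meaningful if the font's encoding happens to agree with that charset.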
Instead one should use the iText parser framework, e.g. like this:
// Imports used by this fragment (iText 5.5.x package names)
import java.io.InputStream;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ExtRenderListener;
import com.itextpdf.text.pdf.parser.GraphicsState;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.Matrix;
import com.itextpdf.text.pdf.parser.Path;
import com.itextpdf.text.pdf.parser.PathConstructionRenderInfo;
import com.itextpdf.text.pdf.parser.PathPaintingRenderInfo;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.TextRenderInfo;
import com.itextpdf.text.pdf.parser.Vector;

ExtRenderListener extRenderListener = new ExtRenderListener()
{
    @Override
    public void beginTextBlock() { }

    @Override
    public void renderText(TextRenderInfo renderInfo) { }

    @Override
    public void endTextBlock() { }

    @Override
    public void renderImage(ImageRenderInfo renderInfo) { }

    // Collect the path construction instructions until the path is painted
    @Override
    public void modifyPath(PathConstructionRenderInfo renderInfo)
    {
        pathInfos.add(renderInfo);
    }

    // Print the collected path together with the colors used for painting it
    @Override
    public Path renderPath(PathPaintingRenderInfo renderInfo)
    {
        GraphicsState graphicsState;
        try
        {
            graphicsState = getGraphicsState(renderInfo);
        }
        catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e)
        {
            e.printStackTrace();
            return null;
        }

        Matrix ctm = graphicsState.getCtm();

        if ((renderInfo.getOperation() & PathPaintingRenderInfo.FILL) != 0)
        {
            System.out.printf("FILL (%s) ", toString(graphicsState.getFillColor()));
            if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
                System.out.print("and ");
        }
        if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
        {
            System.out.printf("STROKE (%s) ", toString(graphicsState.getStrokeColor()));
        }

        System.out.print("the path ");

        for (PathConstructionRenderInfo pathConstructionRenderInfo : pathInfos)
        {
            switch (pathConstructionRenderInfo.getOperation())
            {
            case PathConstructionRenderInfo.MOVETO:
                System.out.printf("move to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CLOSE:
                System.out.printf("close %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CURVE_123:
                System.out.printf("curve123 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CURVE_13:
                System.out.printf("curve13 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CURVE_23:
                System.out.printf("curve23 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.LINETO:
                System.out.printf("line to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.RECT:
                System.out.printf("rectangle %s ", transform(ctm, expandRectangleCoordinates(pathConstructionRenderInfo.getSegmentData())));
                break;
            }
        }
        System.out.println();

        pathInfos.clear();
        return null;
    }

    @Override
    public void clipPath(int rule)
    {
    }

    // Apply the current transformation matrix to a flat list of x, y coordinate pairs
    List<Float> transform(Matrix ctm, List<Float> coordinates)
    {
        List<Float> result = new ArrayList<>();
        for (int i = 0; i + 1 < coordinates.size(); i += 2)
        {
            Vector vector = new Vector(coordinates.get(i), coordinates.get(i + 1), 1);
            vector = vector.cross(ctm);
            result.add(vector.get(Vector.I1));
            result.add(vector.get(Vector.I2));
        }
        return result;
    }

    // Turn the x, y, width, height of a rectangle into its four corner points
    List<Float> expandRectangleCoordinates(List<Float> rectangle)
    {
        if (rectangle.size() < 4)
            return Collections.emptyList();
        return Arrays.asList(
                rectangle.get(0), rectangle.get(1),
                rectangle.get(0) + rectangle.get(2), rectangle.get(1),
                rectangle.get(0) + rectangle.get(2), rectangle.get(1) + rectangle.get(3),
                rectangle.get(0), rectangle.get(1) + rectangle.get(3)
                );
    }

    String toString(BaseColor baseColor)
    {
        if (baseColor == null)
            return "DEFAULT";
        return String.format("%s,%s,%s", baseColor.getRed(), baseColor.getGreen(), baseColor.getBlue());
    }

    // The graphics state is held in the non-public field "gs", so it is read via reflection
    GraphicsState getGraphicsState(PathPaintingRenderInfo renderInfo) throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException
    {
        Field gsField = PathPaintingRenderInfo.class.getDeclaredField("gs");
        gsField.setAccessible(true);
        return (GraphicsState) gsField.get(renderInfo);
    }

    final List<PathConstructionRenderInfo> pathInfos = new ArrayList<>();
};
try (InputStream resource = [RETRIEVE FILE TO PARSE AS INPUT STREAM])
{
    PdfReader pdfReader = new PdfReader(resource);
    for (int page = 1; page <= pdfReader.getNumberOfPages(); page++)
    {
        System.out.printf("\nPage %s\n====\n", page);
        PdfReaderContentParser parser = new PdfReaderContentParser(pdfReader);
        parser.processContent(page, extRenderListener);
    }
}
(ExtractPaths test method testExtractFromTestCreation)
For your sample file this results in the output
Page 1
====
STROKE (0,0,0) the path rectangle [88.3, 693.69, 227.77, 693.69, 227.77, 788.0, 88.3, 788.0]
STROKE (0,0,0) the path rectangle [227.77, 693.69, 367.24, 693.69, 367.24, 788.0, 227.77, 788.0]
STROKE (0,0,0) the path rectangle [367.23, 693.69, 506.7, 693.69, 506.7, 788.0, 367.23, 788.0]
FILL (255,0,0) the path rectangle [229.77, 695.69, 365.37, 695.69, 365.37, 786.09, 229.77, 786.09]
STROKE (DEFAULT) the path move to [228.0, 810.0] line to [338.0, 810.0]
iText represents color values as bytes (0-255) instead of as the unit range (0.0 - 1.0) the PDF uses. Thus, you see '(255,0,0)' where the PDF selected '1 0 0 rg'.
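The mapping between the two ranges can be sketched as follows. This is plain Java and not iText's own code; ColorRangeDemo, toByte and rgb are names invented for the illustration, and rounding to the nearest integer is an assumption about how the scaling is done.

```java
class ColorRangeDemo {
    // Scales a PDF color component from the unit range to 0-255,
    // clamping out-of-range input and rounding to nearest (assumed rounding rule).
    static int toByte(double component) {
        double clamped = Math.max(0.0, Math.min(1.0, component));
        return (int) Math.round(clamped * 255);
    }

    // Formats the operands of an "r g b rg" instruction the way the output above shows colors.
    static String rgb(double r, double g, double b) {
        return toByte(r) + "," + toByte(g) + "," + toByte(b);
    }

    public static void main(String[] args) {
        System.out.println(rgb(1, 0, 0)); // operands of "1 0 0 rg"
    }
}
```

For the operands of '1 0 0 rg' this yields 255,0,0, matching the FILL line in the output.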