How to extract the color of a rectangle in a PDF, with iText

甜味超标 2021-01-21 09:53

I'm trying to extract the color of a rectangle in a PDF with iText. The following is all that the PDF page contains:

And this is the page content extracted with iT

2 Answers
  • 2021-01-21 10:27

    Your code shows it, this is how you create the rectangle and add it:

    PdfTemplate template = writer.getDirectContent().createTemplate(120, 80);
    template.setColorFill(BaseColor.RED);
    template.rectangle(0, 0, 120, 80);
    template.fill();
    writer.releaseTemplate(template);
    table.addCell(Image.getInstance(template));
    

    An iText PdfTemplate generates a PDF form XObject. A form XObject in turn is "a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images)" (section 8.10.1 of ISO 32000-1). In other words, it is a separate stream of drawing instructions that any other content stream can reference.

    In the case of your page content stream, this is the line where the form XObject is included:

    q 1.13 0 0 1.13 229.77 695.69 cm /Xf1 Do Q
    

    (The graphics state is saved with q, the transformation matrix is changed to scale by 1.13 and translate by (229.77, 695.69), the XObject Xf1 is drawn, and the graphics state, including the matrix, is restored with Q.)

    The content stream of that XObject Xf1 is this:

    1 0 0 rg
    0 0 120 80 re
    f
    

    I.e. it sets the non-stroking color to RGB red, defines a 120x80 rectangle at the origin, and fills it.
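    To make the interplay of the two streams concrete, here is a minimal sketch in plain Java (no iText) that interprets only the operators appearing in this example: q, Q, cm, Do, rg, re and f. The class ToyContentInterpreter and everything in it are hypothetical names made up for illustration; the sketch assumes whitespace-separated tokens, an identity /Matrix on the form XObject, and rectangle-only paths, so it is nowhere near a real content stream parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy interpreter for the few operators used in this example (hypothetical, not iText). */
public class ToyContentInterpreter {
    private final Map<String, String> formXObjects;          // XObject name -> content stream
    private final Deque<double[][]> saved = new ArrayDeque<>();
    private double[] ctm = {1, 0, 0, 1, 0, 0};                // a b c d e f, initially identity
    private double[] fillRgb = {0, 0, 0};                     // non-stroking color, default black
    private double[] pendingRect;                             // x y w h of the current path
    public final List<String> fills = new ArrayList<>();

    public ToyContentInterpreter(Map<String, String> formXObjects) {
        this.formXObjects = formXObjects;
    }

    public void process(String content) {
        List<String> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            switch (token) {
                case "q":  save(); break;
                case "Q":  restore(); break;
                case "cm": ctm = multiply(numbers(operands), ctm); break;
                case "rg": fillRgb = numbers(operands); break;
                case "re": pendingRect = numbers(operands); break;
                case "f":  reportFill(); break;
                case "Do": {
                    // A form XObject is painted with the current graphics state;
                    // the state is saved before and restored afterwards.
                    String xObject = formXObjects.get(operands.get(0).substring(1));
                    if (xObject != null) { save(); process(xObject); restore(); }
                    break;
                }
                default:   operands.add(token); continue;   // not an operator: keep as operand
            }
            operands.clear();
        }
    }

    private void save() { saved.push(new double[][] {ctm.clone(), fillRgb.clone()}); }

    private void restore() { double[][] s = saved.pop(); ctm = s[0]; fillRgb = s[1]; }

    private static double[] numbers(List<String> operands) {
        return operands.stream().mapToDouble(Double::parseDouble).toArray();
    }

    /** Returns m x n for affine matrices given as [a b c d e f]. */
    private static double[] multiply(double[] m, double[] n) {
        return new double[] {
            m[0] * n[0] + m[1] * n[2],        m[0] * n[1] + m[1] * n[3],
            m[2] * n[0] + m[3] * n[2],        m[2] * n[1] + m[3] * n[3],
            m[4] * n[0] + m[5] * n[2] + n[4], m[4] * n[1] + m[5] * n[3] + n[5]
        };
    }

    private void reportFill() {
        if (pendingRect == null) return;
        double x = pendingRect[0], y = pendingRect[1], w = pendingRect[2], h = pendingRect[3];
        double[][] corners = {{x, y}, {x + w, y}, {x + w, y + h}, {x, y + h}};
        StringBuilder sb = new StringBuilder(String.format(Locale.ROOT,
                "FILL rgb(%s, %s, %s) rectangle", fillRgb[0], fillRgb[1], fillRgb[2]));
        for (double[] p : corners) {
            // Map the corner from XObject space to page space: [x y 1] x CTM
            sb.append(String.format(Locale.ROOT, " %.2f,%.2f",
                    ctm[0] * p[0] + ctm[2] * p[1] + ctm[4],
                    ctm[1] * p[0] + ctm[3] * p[1] + ctm[5]));
        }
        fills.add(sb.toString());
        pendingRect = null;
    }

    public static void main(String[] args) {
        ToyContentInterpreter interpreter = new ToyContentInterpreter(
                Collections.singletonMap("Xf1", "1 0 0 rg\n0 0 120 80 re\nf"));
        interpreter.process("q 1.13 0 0 1.13 229.77 695.69 cm /Xf1 Do Q");
        interpreter.fills.forEach(System.out::println);
        // prints: FILL rgb(1.0, 0.0, 0.0) rectangle 229.77,695.69 365.37,695.69 365.37,786.09 229.77,786.09
    }
}
```

    The point of the sketch is that the color is not a property of the rectangle: it is part of the graphics state in effect when the fill operator runs, and the rectangle's page coordinates only emerge after applying the matrix set in the page content stream.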


    You wrote: "This is the line of code I'm using to get the page content:"

    String pageContent = new String(reader.getPageContent(1));
    

    That line is not adequate for getting all the content details:

    1. It only returns the immediate page content but not the detailed instructions from the form XObjects and patterns used in the immediate content. Quite often one finds PDFs whose immediate page contents only reference one or more form XObjects.

    2. In spite of appearances, the page content is binary in nature, not textual. As soon as fonts with non-standard encodings are used, PDF string contents become meaningless in your Java String or (depending on your default platform encoding) even corrupted.
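    Point 2 can be illustrated without iText at all. The following sketch uses a made-up byte sequence (an assumption, standing in for a string operand in some custom font encoding) and shows that decoding it to a Java String and encoding it again does not necessarily give back the same bytes. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ContentBytesDemo {
    /** Decodes the bytes with the given charset and encodes the resulting String again. */
    public static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Hypothetical content stream fragment: "(" + two non-ASCII string bytes + ") Tj"
        byte[] raw = {'(', (byte) 0x80, (byte) 0xFF, ')', ' ', 'T', 'j'};

        // 0x80 and 0xFF are not valid UTF-8 on their own, so they are replaced during decoding.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8)));      // false

        // ISO-8859-1 maps every byte to exactly one char, so the bytes survive the round trip.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1))); // true
    }
}
```

    Even when the bytes survive, the resulting String still says nothing about which glyphs they stand for, and the round trip does nothing about point 1.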

    Instead one should use the iText parser framework, e.g. like this:

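    // Requires iText 5. Imports used by this snippet:
    //   com.itextpdf.text.BaseColor, com.itextpdf.text.pdf.PdfReader,
    //   com.itextpdf.text.pdf.parser.* (ExtRenderListener, GraphicsState, Matrix, Path, Vector,
    //   PathConstructionRenderInfo, PathPaintingRenderInfo, PdfReaderContentParser, ...),
    //   java.io.InputStream, java.lang.reflect.Field, java.util.*
    ExtRenderListener extRenderListener = new ExtRenderListener()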
    {
        @Override
        public void beginTextBlock()                        {   }
        @Override
        public void renderText(TextRenderInfo renderInfo)   {   }
        @Override
        public void endTextBlock()                          {   }
        @Override
        public void renderImage(ImageRenderInfo renderInfo) {   }
    
        @Override
        public void modifyPath(PathConstructionRenderInfo renderInfo)
        {
            pathInfos.add(renderInfo);
        }
    
        @Override
        public Path renderPath(PathPaintingRenderInfo renderInfo)
        {
            GraphicsState graphicsState;
            try
            {
                graphicsState = getGraphicsState(renderInfo);
            }
            catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e)
            {
                e.printStackTrace();
                return null;
            }
    
            Matrix ctm = graphicsState.getCtm();
    
            if ((renderInfo.getOperation() & PathPaintingRenderInfo.FILL) != 0)
            {
                System.out.printf("FILL (%s) ", toString(graphicsState.getFillColor()));
                if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
                    System.out.print("and ");
            }
            if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
            {
                System.out.printf("STROKE (%s) ", toString(graphicsState.getStrokeColor()));
            }
    
            System.out.print("the path ");
    
            for (PathConstructionRenderInfo pathConstructionRenderInfo : pathInfos)
            {
                switch (pathConstructionRenderInfo.getOperation())
                {
                case PathConstructionRenderInfo.MOVETO:
                    System.out.printf("move to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.CLOSE:
                    System.out.printf("close %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.CURVE_123:
                    System.out.printf("curve123 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.CURVE_13:
                    System.out.printf("curve13 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.CURVE_23:
                    System.out.printf("curve23 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.LINETO:
                    System.out.printf("line to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.RECT:
                    System.out.printf("rectangle %s ", transform(ctm, expandRectangleCoordinates(pathConstructionRenderInfo.getSegmentData())));
                    break;
                }
            }
            System.out.println();
    
            pathInfos.clear();
            return null;
        }
    
        @Override
        public void clipPath(int rule)
        {
        }
    
        List<Float> transform(Matrix ctm, List<Float> coordinates)
        {
            List<Float> result = new ArrayList<>();
            for (int i = 0; i + 1 < coordinates.size(); i += 2)
            {
                Vector vector = new Vector(coordinates.get(i), coordinates.get(i + 1), 1);
                vector = vector.cross(ctm);
                result.add(vector.get(Vector.I1));
                result.add(vector.get(Vector.I2));
            }
            return result;
        }
    
        List<Float> expandRectangleCoordinates(List<Float> rectangle)
        {
            if (rectangle.size() < 4)
                return Collections.emptyList();
            return Arrays.asList(
                    rectangle.get(0), rectangle.get(1),
                    rectangle.get(0) + rectangle.get(2), rectangle.get(1),
                    rectangle.get(0) + rectangle.get(2), rectangle.get(1) + rectangle.get(3),
                    rectangle.get(0), rectangle.get(1) + rectangle.get(3)
                    );
        }
    
        String toString(BaseColor baseColor)
        {
            if (baseColor == null)
                return "DEFAULT";
            return String.format("%s,%s,%s", baseColor.getRed(), baseColor.getGreen(), baseColor.getBlue());
        }
    
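        // PathPaintingRenderInfo keeps its GraphicsState in the private field "gs",
        // so this helper reads it via reflection.
        GraphicsState getGraphicsState(PathPaintingRenderInfo renderInfo) throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException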
        {
            Field gsField = PathPaintingRenderInfo.class.getDeclaredField("gs");
            gsField.setAccessible(true);
            return (GraphicsState) gsField.get(renderInfo);
        }
        
        final List<PathConstructionRenderInfo> pathInfos = new ArrayList<>();
    };
    
    try (   InputStream resource = [RETRIEVE FILE TO PARSE AS INPUT STREAM])
    {
        PdfReader pdfReader = new PdfReader(resource);
    
        for (int page = 1; page <= pdfReader.getNumberOfPages(); page++)
        {
            System.out.printf("\nPage %s\n====\n", page);
    
            PdfReaderContentParser parser = new PdfReaderContentParser(pdfReader);
            parser.processContent(page, extRenderListener);
    
        }
    }
    

    (ExtractPaths test method testExtractFromTestCreation)

    For your sample file this results in the output

    Page 1
    ====
    STROKE (0,0,0) the path rectangle [88.3, 693.69, 227.77, 693.69, 227.77, 788.0, 88.3, 788.0] 
    STROKE (0,0,0) the path rectangle [227.77, 693.69, 367.24, 693.69, 367.24, 788.0, 227.77, 788.0] 
    STROKE (0,0,0) the path rectangle [367.23, 693.69, 506.7, 693.69, 506.7, 788.0, 367.23, 788.0] 
    FILL (255,0,0) the path rectangle [229.77, 695.69, 365.37, 695.69, 365.37, 786.09, 229.77, 786.09] 
    STROKE (DEFAULT) the path move to [228.0, 810.0] line to [338.0, 810.0] 
    

    iText reports color components as integers (0-255), whereas PDF uses the unit range (0.0-1.0). Thus you see '(255,0,0)' where the PDF content stream says '1 0 0 rg'.
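    As a rough illustration of that mapping, here is a small sketch in plain Java. The class UnitToByteColor is hypothetical, and the round-to-nearest rule is an assumption chosen for the example; iText's own conversion may differ in edge cases.

```java
public class UnitToByteColor {
    /** Maps a PDF color component in [0, 1] to 0..255, clamping and rounding to nearest. */
    public static int toByte(double component) {
        double clamped = Math.max(0.0, Math.min(1.0, component));
        return (int) Math.round(clamped * 255);
    }

    /** Parses an instruction of the form "r g b rg" into three 0..255 components. */
    public static int[] parseRg(String instruction) {
        String[] tokens = instruction.trim().split("\\s+");
        if (tokens.length != 4 || !tokens[3].equals("rg")) {
            throw new IllegalArgumentException("expected 'r g b rg' but got: " + instruction);
        }
        int[] rgb = new int[3];
        for (int i = 0; i < 3; i++) {
            rgb[i] = toByte(Double.parseDouble(tokens[i]));
        }
        return rgb;
    }

    public static void main(String[] args) {
        int[] rgb = parseRg("1 0 0 rg");
        System.out.printf("%d,%d,%d%n", rgb[0], rgb[1], rgb[2]);   // prints 255,0,0
    }
}
```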

  • 2021-01-21 10:46

    To find the color of your rectangle, you may need to browse through the /Annots section of the PDF stream. Here, you are only exploring the /Contents, which doesn't include information such as color for the Rect entities.

    I hope it will help :)
