iText: how to check if a giant string is present on a PDF page

Posted by 感情迁移 on 2019-12-11 23:15:37

Question


  • I am using the iText library to create and read PDFs in my Java project.
  • I am reading multiple text files of any type (pdf, doc, word, etc.) and writing their contents into one new PDF (the contents of all the files joined together).
  • To separate the files inside the giant PDF, I always start a new page, write the exact path of the file in red at the top of that page, and then write the file's content.

The problem:

  • I want to record in this PDF how many pages each file takes up.
  • How do I check whether a string is present on a PDF page? I have all the file paths, so I would like to check whether any of them is written on the page.
  • I was following this tutorial to extract the text of each page: http://www.quicklyjava.com/read-pdf-file-in-java-using-itext/
  • But when I extract the whole page and check whether one of my file paths is present on it (with string.contains(...)), the path is not found. I looked into why, and when I printed the text of one page, it looked like this:

    1. PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/ src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java package br.ufrn.pairg.pdfgenerator;

    import java.io.BufferedReader; import java.io.File; import java.io.FileReader; import java.io.IOException; import java.util.Scanner;

    public...

When I checked whether the file path "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/ src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java" was present in this giant string, it was not found. Can you see the problem? My path is so long that it occupies two lines, so the extracted text has a break in the middle of it. That's the problem!

So, my question is: is there a way to check whether a long string is present in the text of a PDF page using iText?
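One way to sidestep the line wrap described above is to compare the two strings with all whitespace removed. The sketch below is plain Java and does not call iText; `WrappedTextMatcher` and its methods are hypothetical names, and `pageText` is assumed to be whatever string the text extraction step returned for one page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of iText): whitespace-insensitive search. */
final class WrappedTextMatcher {

    /** Removes every whitespace character (spaces, tabs, line breaks). */
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    /** True if needle occurs in pageText once whitespace is ignored on both sides. */
    static boolean containsIgnoringWhitespace(String pageText, String needle) {
        String compactNeedle = stripWhitespace(needle);
        return !compactNeedle.isEmpty()
                && stripWhitespace(pageText).contains(compactNeedle);
    }

    /** Returns the paths from the list that appear on the page, in list order. */
    static List<String> pathsOnPage(String pageText, List<String> paths) {
        String compactPage = stripWhitespace(pageText);
        List<String> found = new ArrayList<>();
        for (String path : paths) {
            String compactPath = stripWhitespace(path);
            if (!compactPath.isEmpty() && compactPage.contains(compactPath)) {
                found.add(path);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Extracted page text in which the long path was wrapped onto two lines.
        String pageText =
                "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/\n"
              + "src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java package br.ufrn.pairg.pdfgenerator;";
        String path =
                "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/"
              + "src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java";

        System.out.println(pageText.contains(path));                    // false
        System.out.println(containsIgnoringWhitespace(pageText, path)); // true
    }
}
```

Because whitespace is dropped, a path that is a prefix of another path (or of the text that follows it) can also match, so this check is best combined with the knowledge that the path sits at the top of the page.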


Answer 1:


Pages in a PDF file are organized using a page tree. Each leaf of the page tree is a page dictionary with keys and values. You could add a custom entry to the page dictionary like this:

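import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfString;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;

public void createPdf(String dest) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
    document.open();
    document.add(new Paragraph("Page 1"));
    document.newPage();
    document.add(new Paragraph("Page 2"));
    document.newPage();
    document.add(new Paragraph("Page 3"));
    document.newPage();
    document.add(new Paragraph("Page 4"));
    // Custom entry for the current page (page 4): a PDF string
    writer.addPageDictEntry(new PdfName("ITXT_PageMarker"), new PdfString("Marker for page 4"));
    document.newPage();
    document.add(new Paragraph("Page 5"));
    document.newPage();
    document.add(new Paragraph("Page 6"));
    // Custom entry for page 6: a PDF name
    writer.addPageDictEntry(new PdfName("ITXT_PageMarker"), new PdfName("PageMarker"));
    document.newPage();
    document.add(new Paragraph("Page 7"));
    // Custom entry for page 7: a PDF number
    writer.addPageDictEntry(new PdfName("ITXT_PageMarker"), new PdfNumber(7));
    document.newPage();
    document.add(new Paragraph("Page 8"));
    document.close();
}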

If you look inside the PDF, the page dictionaries of pages 4, 6 and 7 now contain an /ITXT_PageMarker entry.

For the sake of this example, I added a PDF string for page 4, a PDF name for page 6 and a PDF number for page 7.

You can check for the presence of this custom key like this:

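import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import java.io.IOException;

public void check(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    PdfDictionary pagedict;
    // Page numbers are 1-based, so use <= to include the last page
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        pagedict = reader.getPageN(i);
        // Prints null for pages that have no custom entry
        System.out.println(pagedict.get(new PdfName("ITXT_PageMarker")));
    }
    reader.close();
}

The output of this check() is like this:

null
null
null
Marker for page 4
null
/PageMarker
7
null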

Important: You can't just invent new keys for the PDF syntax apart from those defined in ISO 32000. However, you can create your own custom keys if you register a four-character prefix with ISO. For instance, Adobe registered ADBE and iText registered ITXT. If you introduce new custom keys, you should use your registered code as a prefix. For instance, at iText we can use ITXT_PageMarker, ITXT_custom, or ITXT_Whatever. This rule prevents two different companies from introducing the same key with different meanings.
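With one marker value per page (null where there is none), the number of pages each file occupies follows from counting the pages up to the next marker. The following is a minimal plain-Java sketch of that bookkeeping. `PageCounter` and `pagesPerMarker` are hypothetical names, and it assumes, as in the question, that every file starts on a new page and is marked only on its first page.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of iText): counts pages per marked section. */
final class PageCounter {

    /**
     * markers.get(i) is the marker found on page i + 1, or null if that page
     * has none. A page without a marker belongs to the most recent marker.
     * Pages before the first marker are ignored.
     */
    static Map<String, Integer> pagesPerMarker(List<String> markers) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String current = null;
        for (String marker : markers) {
            if (marker != null) {
                current = marker;                       // a new file starts here
            }
            if (current != null) {
                counts.merge(current, 1, Integer::sum); // count this page
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative input: three files spread over six pages.
        List<String> markers =
                Arrays.asList("a.java", null, null, "b.java", null, "c.java");
        System.out.println(pagesPerMarker(markers));
        // prints {a.java=3, b.java=2, c.java=1}
    }
}
```

The list passed in would be filled from the values read in a loop like check(), for example by storing the file path (or an index into the list of paths) as the marker value.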




Answer 2:


It's not the best solution, but I solved it by writing a magic id (like "#%&#id_0#%&#") above every path name in my first PDF. Then I read the PDF once again and check each page for an id. If one is there, I associate it with the corresponding file path.

Problem solved: I am getting the page numbers using the approach from http://www.quicklyjava.com/read-pdf-file-in-java-using-itext/

Remaining problem: if any file in the project contains #%&#id_0#%&#, #%&#id_1#%&#, ... in its own text, my program will not work.
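The collision risk can be reduced, though not removed entirely, by putting a random token that changes on every run into the marker. The sketch below is plain Java with hypothetical names (`RunMarker`, `markerFor`, `findFileIndex`); the marker format is an assumption chosen for illustration, not the one used in the answer.

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of iText): per-run marker ids. */
final class RunMarker {
    private final String token;
    private final Pattern pattern;

    /** Uses a fresh random token, so input files are very unlikely to contain it. */
    RunMarker() {
        this(UUID.randomUUID().toString().replace("-", ""));
    }

    /** Fixed token, useful for tests or for re-reading a PDF written earlier. */
    RunMarker(String token) {
        this.token = token;
        // Pattern.quote keeps the token literal even if it holds regex characters.
        this.pattern = Pattern.compile(
                Pattern.quote("#id_" + token + "_") + "(\\d{1,9})#");
    }

    /** Text to write above the path of the file with the given index. */
    String markerFor(int fileIndex) {
        return "#id_" + token + "_" + fileIndex + "#";
    }

    /**
     * Returns the file index of the first marker in the page text, or -1 if
     * there is none. Whitespace is removed first, so a marker that was wrapped
     * across two lines is still recognised.
     */
    int findFileIndex(String pageText) {
        Matcher m = pattern.matcher(pageText.replaceAll("\\s+", ""));
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        RunMarker marker = new RunMarker();
        String page = marker.markerFor(4) + "\nsome/path/File.java\nfile content";
        System.out.println(marker.findFileIndex(page));              // 4
        System.out.println(marker.findFileIndex("#%&#id_0#%&# ...")); // -1
    }
}
```

The same `RunMarker` instance (or at least the same token) has to be used for writing and for reading back, since a new instance generates a different token.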



Source: https://stackoverflow.com/questions/32527211/itext-how-to-check-if-giant-string-is-present-on-the-pdf-page
