pypdf2 | 易学教程

PyPDF2: can it update data stream?

I need to get a polygon comment into a PDF and revise its shape. I can do so now by merging the PDF with a blank PDF that contains just the polygon, after which I can update the vertices and the rect. However, the polygon still shows the old shape when the new PDF is opened, although it refreshes after a few clicks on the shape. I need to fix this, and I found that it is probably caused by the data stream in the annotation object, which still seems to contain the old polygon shape. But I cannot figure out how to overwrite that before saving the new PDF. I used code similar below to
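One possible reading of the symptom, offered as an assumption rather than a confirmed diagnosis: an annotation can carry a cached appearance stream under its `/AP` key, and a viewer may keep drawing that cached shape until the annotation is touched. The sketch below shows the idea on a plain Python dict standing in for the annotation dictionary. The helper name `reshape_polygon` is hypothetical, and with PyPDF2 the values would have to be wrapped in the library's own PDF object types.

```python
def reshape_polygon(annot, points):
    """Update a polygon annotation in place and drop its cached appearance.

    annot  -- dict-like stand-in for the annotation dictionary (assumption)
    points -- list of (x, y) tuples in PDF user-space units
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # /Vertices is a flat array: x1 y1 x2 y2 ...
    annot["/Vertices"] = [coord for point in points for coord in point]
    # /Rect is the bounding box: lower-left x, y followed by upper-right x, y
    annot["/Rect"] = [min(xs), min(ys), max(xs), max(ys)]
    # Remove the cached appearance so it can no longer show the old shape
    annot.pop("/AP", None)
    return annot


# Stand-in annotation with a stale appearance entry
annot = {"/Subtype": "/Polygon", "/AP": {"/N": "old appearance stream"}}
reshape_polygon(annot, [(10, 10), (60, 10), (35, 50)])
print(annot)
```

Whether a given viewer rebuilds the appearance once `/AP` is gone is viewer-dependent, so this is worth checking in the target application before relying on it.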

Read all bookmarks from a PDF document and create a dictionary with PageNumber and Title of the bookmark

Question: I am trying to read a PDF document using Python with the PyPDF2 package. The objective is to read all the bookmarks in the PDF and construct a dictionary with the page numbers of the bookmarks as keys and the titles of the bookmarks as values. There is not much support on the internet on how to achieve this except for this article. The code posted in it doesn't work, and I am not enough of a Python expert to correct it. PyPDF2's reader object has a property named outlines which gives you a list of all bookmark objects
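A minimal sketch of the flattening step, assuming the outline arrives as a nested list in which a sub-list holds the children of the bookmark before it. The function name `bookmarks_by_page` and the `page_number_of` callback are hypothetical. With PyPDF2 1.x the callback could wrap the reader's `getDestinationPageNumber`, but here plain dicts stand in for bookmark objects so the example runs on its own.

```python
def bookmarks_by_page(outline, page_number_of):
    """Map page number -> list of bookmark titles found on that page.

    outline        -- nested list of bookmark items (sub-lists are children)
    page_number_of -- callable returning the page number for one item
    """
    result = {}
    for item in outline:
        if isinstance(item, list):
            # A nested list holds child bookmarks: recurse and merge
            for page, titles in bookmarks_by_page(item, page_number_of).items():
                result.setdefault(page, []).extend(titles)
        else:
            result.setdefault(page_number_of(item), []).append(item["/Title"])
    return result


# Stand-in outline; the "page" key is an assumption made for this example
outline = [
    {"/Title": "Chapter 1", "page": 0},
    [{"/Title": "Section 1.1", "page": 0}, {"/Title": "Section 1.2", "page": 3}],
    {"/Title": "Chapter 2", "page": 7},
]
pages = bookmarks_by_page(outline, lambda item: item["page"])
print(pages)
```

The values are lists rather than single titles because several bookmarks can point at the same page, and a plain page-to-title dictionary would silently keep only the last one.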

How to append content to a PDF using PyPDF2 and preserve the past versions

Question: PDF supports document versions. That means the current document can be kept intact while the content and presentation of the document are changed just by adding information. That feature is especially useful for verifying the look and integrity of the document at the time of past digital signatures. For a better understanding of what I mean, see Figure 5 of the document Digital Signatures in a PDF - Adobe. I have seen a lot of documentation and samples from PyPDF2 and other Python libraries that add
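The property described here can be checked at the byte level: an incremental update only appends to the file, so the earlier revision survives as an exact prefix of the new one. The sketch below illustrates that check. The function names are hypothetical and the byte strings are placeholders rather than valid PDFs.

```python
def is_incremental_update(original: bytes, updated: bytes) -> bool:
    """True if `updated` keeps every byte of `original` and only appends."""
    return len(updated) > len(original) and updated.startswith(original)


def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers as a rough hint of how many revisions exist."""
    return data.count(b"%%EOF")


# Placeholder bytes standing in for a real file and an appended revision
original = b"%PDF-1.4\n% ...objects, xref, trailer...\n%%EOF\n"
appended = b"% ...new objects, new xref, new trailer...\n%%EOF\n"
updated = original + appended

print(is_incremental_update(original, updated))
print(count_eof_markers(updated))
```

The marker count is only a hint, since the same byte sequence can appear for other reasons. The prefix check is the part that matters for signatures: a writer that re-serializes the whole file fails it even when the visible content is unchanged.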

How to write table structure data in PDF file in python?

Question:

+----+-----------------------------+
| id | name                        |
+====+=============================+
| 47 | Some textjogjwojgopwgpowmok |
+----+-----------------------------+
| 47 | Some textjogjwojgopwgpowmokg|
+----+-----------------------------+
| 47 | Some textjogjwojgopwgpowmokg|
+----+-----------------------------+
| 47 | Some textjogjwojgopwgpowmokg|
+----+-----------------------------+
| 47 | Some textjogjwojgopwgpowmokg|
+----+-----------------------------+

I want to write the above table in PDF

How to extract text from a Specific Area in a PDF using Python?

I'm trying to extract text from a PDF using Python, and I have successfully done so using PyPDF2 like this:

    import PyPDF2
    pdfFileObj = open('path', 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    pageObj = pdfReader.getPage(0)
    pageObj.extractText()

This extracts all the text from the page, but I want to extract the text only from a rectangular region of 3'x4' at the top-left part of the page. I basically want to do something like "How-to extract text from a pdf doc within a specific rectangular region?" but in Python. Can this be done with PyPDF2 or with any other Python library? This is a
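The geometry behind such a region filter can be sketched independently of any library. The example assumes that "3'x4'" means 3 by 4 inches, that the page uses the default PDF coordinate system (origin at the bottom-left, 72 units per inch, no rotation), and that positioned text fragments are available from a layout-aware extractor. The function names and the `(x, y, text)` fragment shape are hypothetical.

```python
POINTS_PER_INCH = 72  # default PDF user space: 1 unit = 1/72 inch


def top_left_box(page_height, width_in, height_in):
    """Return (x0, y0, x1, y1) in points for a box at the page's top-left corner."""
    x0 = 0
    x1 = width_in * POINTS_PER_INCH
    # The origin is at the bottom-left, so the top edge sits at page_height
    y1 = page_height
    y0 = page_height - height_in * POINTS_PER_INCH
    return (x0, y0, x1, y1)


def text_in_box(fragments, box):
    """Keep the text of fragments whose anchor point lies inside the box."""
    x0, y0, x1, y1 = box
    return [text for x, y, text in fragments if x0 <= x <= x1 and y0 <= y <= y1]


# US Letter is 612 x 792 points; the fragments below are made-up sample data
box = top_left_box(792, 3, 4)
fragments = [(72, 750, "Invoice"), (400, 750, "Page 1"), (72, 100, "Footer")]
print(text_in_box(fragments, box))
```

The `extractText()` call shown in the question returns a single string without coordinates, so a filter like this needs an extraction step that reports positions.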

Change metadata of pdf file with pypdf2

I want to add a metadata key-value pair to the metadata of a PDF file. I found an answer that is several years old, but I think it is way too complicated. I guess there is an easier way today: https://stackoverflow.com/a/3257340/633961 I am not married to PyPDF2; if there is an easier way, I will go that way.

Tarun Lalwani

You can do that using pdfrw:

    pip install pdfrw

Then run:

    from pdfrw import PdfReader, PdfWriter
    trailer = PdfReader("myfile.pdf")
    trailer.Info.WhoAmI = "Tarun Lalwani"
    PdfWriter("edited.pdf", trailer=trailer).write()

And then check the PDF Custom Properties.

Cyril N.

I was surprised

How to use PDFminer.six with python 3?

Question: I want to use pdfminer.six, which is for Python 3, to extract text from a PDF. The problem is that there is no good documentation at all and no source code example showing how to use it. I have already tried some code from Stack Overflow, but it did not work. My code is below.

    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfpage import PDFPage
    from io import StringIO

    def convert_pdf_to_txt(path):
        rsrcmgr = PDFResourceManager()
        retstr = StringIO()
        codec = 'utf-8'

pdf form filled with PyPDF2 does not show in print

Question: I need to fill PDF forms in batch, so I tried to write Python code to do it for me from a CSV file. I used the second answer in this question and it fills the forms fine; however, when I open the filled forms, the answers do not show unless the corresponding field is selected. The answers also do not show when the form is printed. I looked into the PyPDF2 documentation to see if I can flatten the generated forms, but this feature has not been implemented yet, even though it has been asked for about a year