pypdf

How to add page number to a pdf file?

余生长醉 submitted on 2019-12-11 00:14:08
Question: I've been trying all morning to add page numbers to a PDF document, but I can't figure it out. I'd like to use Python, with pyPdf or reportlab. Does anyone have any ideas? Answer 1: Here is my Python code to add page numbers to a PDF file. I have used both PyPDF2 and reportlab. #!/usr/bin/env python3 # -*- coding: utf-8 -*- helpDoc = ''' Add Page Number to PDF file with Python Python 给 PDF 添加 页码 usage: python addPageNumberToPDF.py [PDF path] require: pip install reportlab pypdf2 Support both Python2

PdfFileReader: PdfReadError: Could not find xref table at specified location

空扰寡人 submitted on 2019-12-10 20:13:00
Question: I am trying to read a PDF file in Python with: from PyPDF2 import PdfFileReader, PdfFileWriter test_reader = PdfFileReader(file("test.pdf", "rb")) The line above throws this error: PyPDF2.utils.PdfReadError: Could not find xref table at specified location Any help will be highly appreciated. Answer 1: It's fixed. Actually, there wasn't any problem. It seems the PDF I was using for testing was corrupted (even though the content was there when I opened it, which is why I couldn't figure it out at first). I
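A file can display correctly in a viewer and still have a damaged cross-reference section, since many viewers repair files silently. The sketch below is a rough standard-library check, not what PyPDF2 does internally: it looks for the `%PDF-` header, reads the offset after the last `startxref` keyword, and tests whether a cross-reference section appears to start at that offset. The function name `check_pdf_structure` and the tiny sample byte strings are made up for illustration.

```python
import re


def check_pdf_structure(data: bytes) -> dict:
    """Report whether basic PDF markers are present and consistent."""
    report = {
        "has_header": data.startswith(b"%PDF-"),
        "startxref_offset": None,
        "xref_at_offset": False,
    }
    # The last 'startxref <offset> %%EOF' near the end of the file
    # points at the cross-reference section.
    matches = list(re.finditer(rb"startxref\s+(\d+)\s+%%EOF", data[-1024:]))
    if matches:
        offset = int(matches[-1].group(1))
        report["startxref_offset"] = offset
        chunk = data[offset:offset + 32]
        # A classic table starts with 'xref'; a cross-reference stream
        # starts with an object header such as '12 0 obj'.
        report["xref_at_offset"] = (
            chunk.startswith(b"xref")
            or re.match(rb"\d+\s+\d+\s+obj", chunk) is not None
        )
    return report


# Illustrative sample data (not a complete, viewable PDF).
HEADER = b"%PDF-1.4\n"
BODY = b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n"
good = HEADER + BODY + b"startxref\n" + str(len(HEADER)).encode() + b"\n%%EOF\n"
bad = HEADER + BODY + b"startxref\n5\n%%EOF\n"  # offset points into the header

print(check_pdf_structure(good))
print(check_pdf_structure(bad))
```

For a real file, the bytes would come from `open(path, "rb").read()`. A report with `xref_at_offset` set to `False` suggests the file needs repair or re-export before a strict parser will accept it.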

Problem with closing files when writing with Python pyPdf: getting a ValueError: I/O operation on closed file

落花浮王杯 submitted on 2019-12-10 15:59:27
Question: I can't figure this out. This function (part of a class for scraping a website into a PDF) is supposed to merge the PDF files generated from web pages using pyPdf. This is the method code: def mergePdf(self,mainname,inputlist=0): """merging the pdf pages getting an inputlist to merge or defaults to the class instance self.pdftomerge list""" from pyPdf import PdfFileWriter, PdfFileReader self._mergelist = inputlist or self.pdftomerge self.pdfoutput = PdfFileWriter() for name in self._mergelist:
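The excerpt is cut off before the failing lines, so the cause can only be guessed. One common way to get this ValueError, assuming the reader object keeps a reference to its input stream and reads page data only when the output is written, is closing the input files inside the loop. The sketch below reproduces that pattern with the standard library only; `LazyPageSource` and `merge` are hypothetical stand-ins, not pyPdf classes.

```python
import io


class LazyPageSource:
    """Hypothetical stand-in for a reader that defers reading until output time."""

    def __init__(self, stream):
        self.stream = stream  # only a reference is stored; nothing is read yet

    def page_bytes(self):
        self.stream.seek(0)
        return self.stream.read()


def merge(sources):
    """Concatenate the page data of all sources (reads happen here)."""
    return b"".join(source.page_bytes() for source in sources)


# Problem pattern: the input is closed before the merged output is produced.
stream = io.BytesIO(b"page-1")
source = LazyPageSource(stream)
stream.close()
try:
    merge([source])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print("error:", error_message)

# Safer pattern: keep every input open until the output has been written.
open_streams = [io.BytesIO(b"page-1"), io.BytesIO(b"page-2")]
merged = merge([LazyPageSource(s) for s in open_streams])
for s in open_streams:
    s.close()
print(merged)
```

If that assumption holds for the method in the question, collecting the open file objects in a list and closing them after the writer's output call would avoid the error.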

Highlight text in a PDF with Python [closed]

倖福魔咒の submitted on 2019-12-09 04:53:18
Question: I'm working on a custom search engine for my PDF data corpus. I have a transformation layer that can dump PDF content to text (using Apache Tika and GROBID). I have finished the search layers and the view that returns the search results listing. Now, I'd like to add a highlighting feature on the original PDF for the lines

Are PDF box coordinates relative or absolute?

守給你的承諾、 submitted on 2019-12-08 13:26:49
Question: I want to programmatically edit a PDF using pyPDF. Currently, I'm struggling with interpreting the various PDF boxes' (TrimBox, MediaBox etc.) dimensions. Each box has four dimensions stored as a four-tuple, e.g.: TrimBox: 56.69 56.69 1040.31 751.18 According to the PDF specification, these are supposed to describe a rectangle, and certainly (56.69, 56.69) determines the upper left corner of this rectangle. However, is (1040.31, 751.18) to be interpreted as the lower right corner of this
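For reference, the PDF specification writes a rectangle as [llx lly urx ury]: the lower-left and upper-right corners, both as absolute coordinates in default user space, whose origin is the bottom-left of the page and whose unit is 1/72 inch. Under that reading, the TrimBox from the question has (56.69, 56.69) as its lower-left corner and (1040.31, 751.18) as its upper-right corner, so width and height are differences between the two. A minimal sketch, with `PdfBox` and `points_to_mm` as illustrative names rather than pyPdf API:

```python
from typing import NamedTuple

POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


class PdfBox(NamedTuple):
    """A PDF rectangle: lower-left corner and upper-right corner, in points."""

    llx: float
    lly: float
    urx: float
    ury: float

    @property
    def width(self) -> float:
        return self.urx - self.llx

    @property
    def height(self) -> float:
        return self.ury - self.lly


def points_to_mm(points: float) -> float:
    """Convert PDF points (1/72 inch) to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH


trim_box = PdfBox(56.69, 56.69, 1040.31, 751.18)
print("size in points:", trim_box.width, trim_box.height)
print("size in mm:", points_to_mm(trim_box.width), points_to_mm(trim_box.height))
```

Treating the second pair as a width and height (a relative reading) would give a much larger page than the difference-based calculation, which is a quick way to sanity-check the interpretation against the known paper size.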

PyPdf Merge error

人走茶凉 submitted on 2019-12-08 07:37:56
Question: When I merge several PDF pages into one single page with PyPdf, using mergeTranslatedPage, I get some unknown characters. These unknown squares are the characters not included in the last merged page. After some research, I think the method _merge_ressources is not working very well, because a later page can overwrite the resources of the earlier pages. I tried page1.compressContentStreams() after each merge, but without result. In this link you will see an example of the PDF that has

Getting TypeError: ord() expected string of length 1, but int found error

こ雲淡風輕ζ submitted on 2019-12-08 03:49:39
Question: The code is: from PyPDF2 import PdfFileReader with open('HTTP_Book.pdf','rb') as file: pdf=PdfFileReader(file) pagedd=pdf.getPage(0) print(pagedd.extractText()) This code raises the error shown below: TypeError: ord() expected string of length 1, but int found I searched the internet and found this: Troubleshooting "TypeError: ord() expected string of length 1, but int found", but it doesn't help much. I understand the background of this error but am not sure how it is related here. Tried
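The error message itself is easy to reproduce without any PDF library. In Python 3, indexing or iterating a `bytes` object yields integers, so code written for Python 2 that calls `ord()` on each item fails with exactly this TypeError. Whether that is what happens inside the library call in the question is an assumption; the sketch below only shows the mechanism and a tolerant helper, `byte_value`, which is a made-up name.

```python
data = b"PDF"

# In Python 3, an item of a bytes object is already an int.
first = data[0]
print(type(first).__name__, first)

# Calling ord() on that int raises the TypeError quoted in the question.
try:
    ord(first)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print("error:", error_message)


def byte_value(item):
    """Return the integer value of a byte, whether given as int or 1-char string."""
    return item if isinstance(item, int) else ord(item)


values = [byte_value(item) for item in data]
print(values)
```

When the failing `ord()` call lives inside a third-party package, upgrading to a release that supports Python 3 is usually a cleaner fix than patching the call site.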

Dynamically generated PDF files working in most readers except Adobe Reader

馋奶兔 submitted on 2019-12-08 03:31:00
Question: I'm trying to dynamically generate PDFs from user input, where I basically print the user input and overlay it on an existing PDF that I did not create. It works, with one major exception. Adobe Reader doesn't read it properly, on Windows or on Linux. QuickOffice on my phone doesn't read it either. So I thought I'd trace the path of me creating the files - 1 - Original PDF of background PDF 1.2 made with Adobe Distiller with the LZW encoding. I didn't make this. 2 - PDF of background PDF 1.4

Pdf overlaying not working

こ雲淡風輕ζ submitted on 2019-12-07 22:29:10
Question: I have been looking for a solution to this problem: I have two landscape-oriented A3 PDFs with images, and I want to overlay them so that the resulting PDF contains both images merged into one, as if one of them were a watermark, but with the same density. Think of it as printing two different PDFs on one A3 sheet of paper; I want to get exactly that effect. In other words (I just came up with a way to express it), I would like to overlay two PDFs and for the upper layer, make

Porting to Python3: PyPDF2 mergePage() gives TypeError

↘锁芯ラ submitted on 2019-12-07 16:29:29
Question: I'm using Python 3.4.2 and PyPDF2 1.24 (also using reportlab 3.1.44 in case that helps) on Windows 7. I recently upgraded from Python 2.7 to 3.4, and am in the process of porting my code. This code is used to create a blank PDF page with links embedded in it (using reportlab) and merge it (using PyPDF2) with an existing PDF page. I had an issue with reportlab in that saving the canvas used StringIO which needed to be changed to BytesIO, but after doing that I ran into this error: Traceback
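The StringIO to BytesIO change mentioned above follows from how Python 3 separates text from binary data: PDF output is bytes, and `io.StringIO` accepts only `str`. The sketch below shows that step alone with the standard library; the sample bytes are a placeholder, and it does not attempt to reproduce the later mergePage() error, whose traceback is cut off in the excerpt.

```python
import io

pdf_bytes = b"%PDF-1.4\n% placeholder bytes standing in for generated PDF data\n"

# A text buffer rejects binary data in Python 3.
text_buffer = io.StringIO()
try:
    text_buffer.write(pdf_bytes)
    string_io_error = None
except TypeError as exc:
    string_io_error = str(exc)
print("StringIO error:", string_io_error)

# A binary buffer accepts it; rewind before handing it to a reader.
binary_buffer = io.BytesIO()
binary_buffer.write(pdf_bytes)
binary_buffer.seek(0)
round_trip = binary_buffer.read()
print(round_trip == pdf_bytes)
```

The `seek(0)` matters in practice: a buffer that has just been written is positioned at its end, so a reader given that buffer without rewinding would see no data.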