pypdf | 易学教程

PDF bleed detection

Question: I'm currently writing a little tool (Python + pyPdf) to test PDFs for printer conformity. Alas, I already get confused at the first task: detecting whether the PDF has at least 3mm of 'bleed' (border around the pages where nothing is printed). I already figured out that I can't detect the bleed for the complete document, since there doesn't seem to be a global one. On the pages, however, I can detect a total of five different boxes: mediaBox, bleedBox, trimBox, cropBox, artBox. I read the pyPdf documentation
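One possible way to approach the per-page check is sketched below. It assumes the bleed on each side is the distance between the page's trimBox and bleedBox, and that each box is available as a (left, bottom, right, top) tuple in PDF points. The function names, the tuple layout, and the tolerance are illustrative assumptions, not part of pyPdf.

```python
# Assumption: boxes are (left, bottom, right, top) tuples in PDF points.
MM_PER_POINT = 25.4 / 72  # 1 point = 1/72 inch, 1 inch = 25.4 mm


def bleed_margins_mm(trim_box, bleed_box):
    """Return the (left, bottom, right, top) gap between trim and bleed box in mm."""
    t_left, t_bottom, t_right, t_top = trim_box
    b_left, b_bottom, b_right, b_top = bleed_box
    return (
        (t_left - b_left) * MM_PER_POINT,
        (t_bottom - b_bottom) * MM_PER_POINT,
        (b_right - t_right) * MM_PER_POINT,
        (b_top - t_top) * MM_PER_POINT,
    )


def has_min_bleed(trim_box, bleed_box, min_mm=3.0, tolerance_mm=0.01):
    """True if every side has at least min_mm of bleed, within a small tolerance."""
    margins = bleed_margins_mm(trim_box, bleed_box)
    return all(margin >= min_mm - tolerance_mm for margin in margins)
```

The tolerance is there because box coordinates are usually rounded, so an intended 3mm bleed may come out marginally below 3.0 after conversion.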

PYPDF watermarking returns error

Question: Hi, I'm trying to watermark a PDF file using PyPDF2, but I get an error and can't figure out what goes wrong. I get the following error: Traceback (most recent call last): File "test.py", line 13, in <module> page.mergePage(watermark.getPage(0)) File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1594, in mergePage self._mergePage(page2) File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1651, in _mergePage page2Content, rename, self.pdf) File "C:Python27\site-packages\PyPDF2\pdf.py", line

How to install a module for python 2.6 on CentOS?

Question: I installed Python 2.6 on CentOS with: wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm sudo rpm -ivh epel-release-5-4.noarch.rpm yum install python26 Then I installed pyPdf with: yum install pyPdf However, pyPdf is only available to the old Python 2.4: # python Python 2.4.3 (#1, Jan 9 2013, 06:49:54) [GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import pyPdf >>> import sys >>>
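When a module seems to be installed for the wrong interpreter, a quick diagnostic is to ask the running interpreter which executable it is and whether it can locate the module. The sketch below assumes a current Python 3 (it uses `importlib.util.find_spec`, which does not exist in the Python 2.4 or 2.6 discussed above); `module_report` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_report(module_name):
    """Describe the running interpreter and whether module_name is importable from it."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "importable": spec is not None,
        # Path of the module file if found, otherwise None.
        "location": getattr(spec, "origin", None),
    }


print(module_report("pyPdf"))
```

Running the same script under each installed interpreter shows which one can actually see the module.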

PyPDF Merge and Write issue

Question: I am getting an unexpected error when using this. The first section is from a script that I found online, and I am trying to use it to pull a particular section identified in the PDF's outline. Everything works fine, except that right at output.write(outputfile1) it says: PdfReadError: multiple definitions in dictionary. Has anybody else run into this? Please forgive all the unnecessary prints at the end. :) import pyPdf import glob class Darrell(pyPdf.PdfFileReader): def getDestinationPageNumbers

Using PyPDF2 to merge files into multiple output files

Question: Here is the code block that is causing the issue. The loop appends the new file each time, which is not what I am trying to accomplish. For example, outputfile1 is input1.pdf, outputfile2 is input1.pdf + input2.pdf... I am trying to merge file 1x.pdf with files 1a.pdf + 1b.pdf + 1c.pdf into the output file1.pdf and then loop through and do the same thing for 2, 3, and 4. The end result should be 4 separate files. What am I missing? Clear as mud? Thanks in advance for any assistance. i = 1
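The symptom described (each output containing everything merged so far) typically appears when one merger object is reused across loop iterations. A minimal sketch of the alternative, creating a fresh merger per output file, follows. The file naming is taken from the question; `merge_plan`, `merge_groups`, and the `merger_factory` parameter are hypothetical names, and the merger is only assumed to offer `append(name)` and `write(name)`.

```python
def merge_plan(group_count=4, suffixes=("x", "a", "b", "c")):
    """Map each output name to its inputs, e.g. file1.pdf <- 1x, 1a, 1b, 1c."""
    return {
        "file%d.pdf" % i: ["%d%s.pdf" % (i, s) for s in suffixes]
        for i in range(1, group_count + 1)
    }


def merge_groups(plan, merger_factory):
    """Write one output per group, using a fresh merger for every output file."""
    for output_name, input_names in plan.items():
        merger = merger_factory()  # new, empty merger: nothing carries over
        for name in input_names:
            merger.append(name)
        merger.write(output_name)
```

With PyPDF2 this might be called as `merge_groups(merge_plan(), PdfFileMerger)`, assuming `PdfFileMerger` accepts file names for both `append` and `write`.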

Center the page number in a footer using PyPDF

Question: I'm using PyPDF to create a formatted report. I want the page number (e.g. Page 1 of 3) to be centered in the footer, pretty much exactly how the PyPDF tutorial shows. Here's the tutorial I'm referencing. Below is the code I put in the footer method: def footer(self): genDateTime = "Report generated on: " + datetime.datetime.now().strftime('%m/%d/%Y %I:%M:%S %p') page = 'Page ' + str(self.page_no()) + '/{nb}' self.set_y(-10) self.set_font('Arial', '', 9) self.cell(0, 5, "Clinical Report:

Python PyPDF2 join pages

Question: I have a PDF with a big table split across pages, so I need to join the per-page tables into one big table on a single large page. Is this possible with PyPDF2 or another library? Cheers Answer 1: I'm just working on something similar: it takes an input PDF, and via a config file you can set the final pattern of single pages. It is implemented with PyPDF2 but still has issues with some PDF files (I have to dig deeper). https://github.com/Lageos/pdf-stitcher In principle adding a page right to another one works
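The geometry behind placing one page to the right of another can be sketched independently of any PDF library. The function below only computes the size of the combined page and the horizontal offset for each source page; `side_by_side_layout` is a hypothetical helper, and actually drawing the pages at those offsets is left to whichever library is used.

```python
def side_by_side_layout(page_sizes):
    """Given (width, height) per page, return ((total_w, total_h), [x offsets])."""
    offsets = []
    x = 0.0
    for width, _height in page_sizes:
        offsets.append(x)  # this page starts where the previous one ended
        x += width
    # The combined page must be as tall as the tallest source page.
    total_height = max((h for _w, h in page_sizes), default=0.0)
    return (x, total_height), offsets
```

Stacking pages vertically instead would follow the same pattern with widths and heights swapped, keeping in mind that PDF coordinates start at the bottom-left corner.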

Cannot import 'PyPDF2' in Python 3.7

Question: I am wondering why, for the life of me, I cannot import and use PyPDF2 (PDF library) in Python 3.7. Firstly, my import fails at the top of main.py (i.e. below): from PyPDF2 import PdfFileReader Then I tried pip install PyPDF2 and variants, pip2 install PyPDF3, etc. All lead to the below output: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify

pypdf not extracting tables from pdf

Question: I am using pypdf to extract text from PDF files. The problem is that the tables in the PDF files are not extracted. I have also tried using pdfminer, but I am having the same issue. Answer 1: The problem is that tables in PDFs are generally made up of absolutely positioned lines and characters, and it is non-trivial to convert this into a sensible table representation. In Python, PDFMiner is probably your best bet. It gives you a tree structure of layout objects, but you will have to do the
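To illustrate why positioned text needs extra work before it looks like a table, here is a toy sketch that groups text fragments into rows by their vertical position. It assumes the fragments have already been extracted as (x, y, text) tuples; that tuple format, the `fragments_to_rows` name, and the tolerance are assumptions for illustration and not the API of PDFMiner or pypdf.

```python
def fragments_to_rows(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into rows, top to bottom, left to right."""
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    # PDF y coordinates grow upwards, so sort by descending y to read top-down.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))  # close enough: same row
        else:
            rows.append([y, [(x, text)]])  # start a new row
    # Within each row, order the cells by their x position.
    return [[text for _x, text in sorted(cells)] for _y, cells in rows]
```

Real documents add complications this sketch ignores, such as cells that wrap over several lines, merged cells, and columns that can only be inferred from ruling lines.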

pyPDF2 TypeError when trying to extract text

Question: I have successfully installed pyPDF, but the extractText method does not work well, so I decided to try pyPDF2. The problem is that when extracting text there is an exception: Traceback (most recent call last): File "C:\Users\Asus\Desktop\pfdtest.py", line 44, in <module> test2() File "C:\Users\Asus\Desktop\pfdtest.py", line 41, in test2 print(mypdf.getPage(0).extractText()) File "C:\Python32\lib\site-packages\PyPDF2\pdf.py", line 1701, in extractText content = ContentStream(content, self.pdf)