pypdf

PDF bleed detection

瘦欲@ submitted on 2019-12-17 06:55:09
Question: I'm currently writing a little tool (Python + pyPdf) to test PDFs for printer conformity. Alas, I already get confused at the first task: detecting whether the PDF has at least 3 mm of 'bleed' (border around the pages where nothing is printed). I have already worked out that I can't detect the bleed for the complete document, since there doesn't seem to be a global one. On the pages, however, I can detect a total of five different boxes: mediaBox, bleedBox, trimBox, cropBox, artBox. I read the pyPdf documentation
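
A minimal sketch of the 3 mm check, assuming the bleed is taken as the distance between a page's trimBox and its bleedBox (an assumption, since the question is cut off before it says which boxes are compared). The boxes are passed in as plain (llx, lly, urx, ury) tuples in PDF points; reading them from a page object is left out. The names bleed_margins_mm and has_min_bleed are hypothetical helpers, not part of pyPdf.

```python
# Sketch only: boxes are plain (llx, lly, urx, ury) tuples in PDF points.
PT_TO_MM = 25.4 / 72.0  # 1 point = 1/72 inch, 1 inch = 25.4 mm


def bleed_margins_mm(trim_box, bleed_box):
    """Return (left, bottom, right, top) distances from trim box to bleed box, in mm."""
    t_llx, t_lly, t_urx, t_ury = (float(v) for v in trim_box)
    b_llx, b_lly, b_urx, b_ury = (float(v) for v in bleed_box)
    return (
        (t_llx - b_llx) * PT_TO_MM,
        (t_lly - b_lly) * PT_TO_MM,
        (b_urx - t_urx) * PT_TO_MM,
        (b_ury - t_ury) * PT_TO_MM,
    )


def has_min_bleed(trim_box, bleed_box, minimum_mm=3.0):
    """True if every side has at least `minimum_mm` between trim and bleed box."""
    return min(bleed_margins_mm(trim_box, bleed_box)) >= minimum_mm
```

Because the check is per page, a document-level result would come from running it on every page and reporting the pages that fail.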

PYPDF watermarking returns error

拥有回忆 submitted on 2019-12-13 04:51:56
Question: Hi, I'm trying to watermark a PDF file using PyPDF2, but I get the following error and can't figure out what is going wrong:
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    page.mergePage(watermark.getPage(0))
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1594, in mergePage
    self._mergePage(page2)
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1651, in _mergePage
    page2Content, rename, self.pdf)
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line

How to install a module for python 2.6 on CentOS?

孤街浪徒 submitted on 2019-12-12 14:02:43
Question: I installed Python 2.6 on CentOS with:
  wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
  sudo rpm -ivh epel-release-5-4.noarch.rpm
  yum install python26
Then I installed pyPdf with:
  yum install pyPdf
However, pyPdf is only available to the old Python 2.4:
  # python
  Python 2.4.3 (#1, Jan 9 2013, 06:49:54)
  [GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import pyPdf
  >>> import sys
  >>>

PyPDF Merge and Write issue

这一生的挚爱 submitted on 2019-12-12 11:34:23
Question: I am getting an unexpected error when using this. The first section is from a script that I found online, and I am trying to use it to pull a particular section identified in the PDF's outline. Everything works fine, except that right at output.write(outputfile1) it says: PdfReadError: multiple definitions in dictionary. Has anybody else run into this? Please forgive all the unnecessary prints at the end. :)
import pyPdf
import glob
class Darrell(pyPdf.PdfFileReader):
    def getDestinationPageNumbers

Using PyPDF2 to merge files into multiple output files

岁酱吖の submitted on 2019-12-12 03:09:38
Question: Here is the code block that is causing the issue. The loop appends the new file to the same output on every iteration, which is not what I am trying to accomplish. For example, outputfile1 is input1.pdf, outputfile2 is input1.pdf + input2.pdf, and so on. I am trying to merge file 1x.pdf with files 1a.pdf + 1b.pdf + 1c.pdf into the output file1.pdf, and then loop through and do the same thing for 2, 3, and 4. The end result should be 4 separate files. What am I missing? Clear as mud? Thanks in advance for any assistance.
i = 1
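
The symptom described (each output also containing the earlier inputs) often appears when one merger object is reused across iterations; since the asker's code is cut off, that diagnosis is a guess. The sketch below builds a separate input list per output and creates a fresh merger for each one. It assumes the older PyPDF2 1.x API (PdfFileMerger with append, write, close); plan_outputs and merge_all are hypothetical helper names, and the file naming follows the pattern given in the question.

```python
def plan_outputs(groups=(1, 2, 3, 4), suffixes=("x", "a", "b", "c")):
    """Map each output name to its own list of inputs, e.g. file1.pdf <- 1x, 1a, 1b, 1c."""
    return {
        "file%d.pdf" % n: ["%d%s.pdf" % (n, s) for s in suffixes]
        for n in groups
    }


def merge_all(plan):
    """Write one merged PDF per plan entry (needs PyPDF2 1.x installed)."""
    from PyPDF2 import PdfFileMerger  # lazy import; assumed PyPDF2 1.x API

    for output_name, input_names in plan.items():
        merger = PdfFileMerger()  # fresh merger per output, so nothing carries over
        for name in input_names:
            merger.append(name)
        with open(output_name, "wb") as handle:
            merger.write(handle)
        merger.close()
```

Calling merge_all(plan_outputs()) would then produce file1.pdf through file4.pdf, each built only from its own four inputs.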

Center the page number in a footer using PyPDF

冷暖自知 submitted on 2019-12-11 16:39:34
Question: I'm using PyPDF to create a formatted report. I want the page number (e.g. Page 1 of 3) to be centered in the footer, pretty much exactly how the PyPDF tutorial shows. Here's the tutorial I'm referencing. Below is the code I put in the footer method:
def footer(self):
    genDateTime = "Report generated on: " + datetime.datetime.now().strftime('%m/%d/%Y %I:%M:%S %p')
    page = 'Page ' + str(self.page_no()) + '/{nb}'
    self.set_y(-10)
    self.set_font('Arial', '', 9)
    self.cell(0, 5, "Clinical Report:

Python PyPDF2 join pages

若如初见. submitted on 2019-12-11 16:38:22
Question: I have a PDF with a big table split across pages, so I need to join the per-page tables into one big table on a single large page. Is this possible with PyPDF2 or another library? Cheers
Answer 1: I am just working on something similar. It takes an input PDF, and via a config file you can set the final pattern of single pages. It is implemented with PyPDF2, but it still has issues with some PDF files (I have to dig deeper). https://github.com/Lageos/pdf-stitcher In principle adding a page right to another one works
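
Whatever library does the actual merging, the placement arithmetic for stitching pages into one tall page is the same. The sketch below only computes that layout: the combined page size and a translation for each source page, stacked top to bottom. It assumes every page box starts at (0, 0) and that sizes are in PDF points; stack_layout is a hypothetical helper and is not taken from the linked project.

```python
def stack_layout(page_sizes):
    """Given [(width, height), ...] in points, return the combined page size
    and the (tx, ty) offset for each page, stacked top to bottom."""
    total_width = max(w for w, _ in page_sizes)
    total_height = sum(h for _, h in page_sizes)
    offsets = []
    y_top = total_height
    for _, height in page_sizes:
        y_top -= height  # PDF origin is bottom-left, so work downward
        offsets.append((0, y_top))
    return (total_width, total_height), offsets
```

Each source page would then be drawn onto a blank page of the combined size, shifted by its offset; the exact call for that depends on the library and version in use.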

Cannot import 'PyPDF2' in Python 3.7

萝らか妹 submitted on 2019-12-11 14:36:54
Question: I am wondering why, for the life of me, I cannot import and use PyPDF2 (a PDF library) in Python 3.7. First, my import fails at the top of main.py:
from PyPDF2 import PdfFileReader
Then I tried pip install PyPDF2 and variants (pip2 install PyPDF3, etc.). All lead to the output below:
Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify

pypdf not extracting tables from pdf

别说谁变了你拦得住时间么 submitted on 2019-12-11 07:46:49
Question: I am using pypdf to extract text from PDF files. The problem is that the tables in the PDF files are not extracted. I have also tried using pdfminer, but I am having the same issue.
Answer 1: The problem is that tables in PDFs are generally made up of absolutely positioned lines and characters, and it is non-trivial to convert this into a sensible table representation. In Python, PDFMiner is probably your best bet. It gives you a tree structure of layout objects, but you will have to do the
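
To illustrate why positioned text needs extra work before it becomes a table, here is a rough sketch that groups text fragments into rows by their vertical position. It assumes the fragments have already been extracted as (x, y, text) tuples by some layout tool; that tuple format, the tolerance value, and the name group_into_rows are assumptions made for illustration, not the API of PDFMiner or pypdf.

```python
def group_into_rows(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into rows: fragments whose y values lie
    within `y_tolerance` of the row's first fragment share a row."""
    rows = []
    # Sort top-to-bottom (larger y first in PDF coordinates), then left-to-right.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if rows and abs(rows[-1]["y"] - y) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    # Within each row, order the cells by their x position.
    return [[text for _, text in sorted(row["cells"])] for row in rows]
```

Real tables usually need more than this, such as column detection from x positions and handling of cells that wrap over several lines.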

pyPDF2 TypeError when trying to extract text

心不动则不痛 submitted on 2019-12-11 03:05:15
Question: I have successfully installed pyPDF, but its extractText method does not work well, so I decided to try PyPDF2. The problem is that extracting text raises an exception:
Traceback (most recent call last):
  File "C:\Users\Asus\Desktop\pfdtest.py", line 44, in <module>
    test2()
  File "C:\Users\Asus\Desktop\pfdtest.py", line 41, in test2
    print(mypdf.getPage(0).extractText())
  File "C:\Python32\lib\site-packages\PyPDF2\pdf.py", line 1701, in extractText
    content = ContentStream(content, self.pdf)