pypdf

PyPdf: split each page in two, pad with blank space

青春壹個敷衍的年華 submitted on 2019-12-02 07:42:04
Question: I have a PDF file (A4, portrait layout), and I want to split each page in half by height. The output document should also be A4 portrait, but the lower half of each page needs to be blank. I saw https://stackoverflow.com/a/15743413/822789 but did not understand how to add blank space with mediaBox.
Answer 1: I don't really know PyPDF2 all that well, but I am the author of pdfrw, and if I understand your question, pdfrw can certainly do what you want quite easily. I need to document it
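The answer above is cut off, so the following is not the pdfrw author's method. It is only a sketch of the box arithmetic behind the question, assuming page boxes are given as (llx, lly, urx, ury) in PDF points with the origin at the bottom left. The function name `split_page_boxes` is made up for illustration. Each output page keeps the full A4 size. The first shows only the top half of the source page, and the second shows the bottom half moved up by `shift`, which leaves the lower half blank in both cases.

```python
def split_page_boxes(llx, lly, urx, ury):
    """Return (top_half, bottom_half, shift) for a page box in PDF points.

    Each half is an (llx, lly, urx, ury) tuple. `shift` is how far the bottom
    half has to move up so that it sits at the top of a full-size output page.
    """
    mid = (lly + ury) / 2.0
    top_half = (llx, mid, urx, ury)       # already at the top of the page
    bottom_half = (llx, lly, urx, mid)    # must be translated upward by `shift`
    shift = ury - mid
    return top_half, bottom_half, shift


# A4 portrait is roughly 595 x 842 points.
top, bottom, shift = split_page_boxes(0, 0, 595, 842)
print(top, bottom, shift)
```

How the clipping and translation are applied depends on the PDF library in use; only the numbers are worked out here.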

PyPDF2 won't import

狂风中的少年 submitted on 2019-12-01 19:23:06
Question: Hi, I'm just getting started with Python and trying to get some requisite libraries installed, using Python 3.4.1 on OS X. I have installed PyPDF2 (with supposed success), yet I cannot seem to use the tools:
    sh-3.2# port select --list python
    Available versions for python:
        none
        python25-apple
        python26
        python26-apple
        python27-apple
        python34 (active)
    sh-3.2# pip install PyPDF2
    Requirement already satisfied (use --upgrade to upgrade): PyPDF2 in /opt/local/Library/Frameworks/Python.framework

PyPDF2 won't import

只愿长相守 submitted on 2019-12-01 18:59:54
Hi, I'm just getting started with Python and trying to get some requisite libraries installed, using Python 3.4.1 on OS X. I have installed PyPDF2 (with supposed success), yet I cannot seem to use the tools:
    sh-3.2# port select --list python
    Available versions for python:
        none
        python25-apple
        python26
        python26-apple
        python27-apple
        python34 (active)
    sh-3.2# pip install PyPDF2
    Requirement already satisfied (use --upgrade to upgrade): PyPDF2 in /opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages
    Cleaning up...
    sh-3.2# ...
    import PyPDF2
    Traceback (most recent call
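The traceback is cut off, so the actual cause is not known from this excerpt. One thing worth ruling out, purely as an assumption, is that `pip` and the interpreter running the `import` belong to different Python installations. The sketch below uses only the standard library; the helper name `describe_import_environment` is invented for illustration.

```python
import importlib.util
import sys


def describe_import_environment(module_name):
    """Collect facts that help explain why `import module_name` fails."""
    spec = importlib.util.find_spec(module_name)  # None if the module is not found
    return {
        "interpreter": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "module_location": spec.origin if spec is not None else None,
        "site_packages_on_path": [p for p in sys.path if p.endswith("site-packages")],
    }


report = describe_import_environment("PyPDF2")
for key, value in report.items():
    print(key, "=", value)
```

If the directory pip reported is missing from `site_packages_on_path`, installing with `python -m pip install PyPDF2` from the same interpreter is one way to keep the two in step.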

How to close pyPDF “PdfFileReader” Class file handle

我怕爱的太早我们不能终老 submitted on 2019-12-01 04:05:34
This should be a very simple question, for which I couldn't find an answer by Google search: how do I close the file handle opened by the pyPDF "PdfFileReader" class? Here is a snippet:
    import os.path
    from pyPdf import PdfFileReader
    fname = 'my.pdf'
    input = PdfFileReader(file(fname, "rb"))
    os.rename(fname, 'my_renamed.pdf')
which raises error [32]. Thanks.
The operating system is preventing a file from being renamed while something else has it open. This is a Good Thing (tm). Python's with statement will automatically close the file after you're done reading/manipulating it.
    with open(fname, "rb") as f:
        input =
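The answer's snippet is truncated, so here is a minimal standard-library sketch of the same idea rather than the answerer's exact code. `read_then_rename` and its `read` argument are hypothetical: `read` stands in for whatever consumes the open file, for example a function that builds a PDF reader and pulls out the values it needs.

```python
import os


def read_then_rename(src, dst, read):
    """Apply `read` to the open file, close it, then rename src to dst."""
    with open(src, "rb") as f:
        # Everything needed from the file must be read inside this block.
        result = read(f)
    # The handle is closed here, so the rename is no longer blocked.
    os.rename(src, dst)
    return result


# Example call (hypothetical file names):
# size = read_then_rename("my.pdf", "my_renamed.pdf", lambda f: len(f.read()))
```

Any reader object that reads lazily from the stream should not be used after the `with` block ends, since the underlying file is closed by then.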

Why my code not correctly split every page in a scanned pdf?

风格不统一 submitted on 2019-11-30 15:10:18
Question: Update: Thanks to stardt, whose script works! The pdf is a page of another one. I tried the script on the other one, and it also correctly split each pdf page, but the order of the page numbers is sometimes right and sometimes wrong. For example, in pages 25-28 of the pdf file, the printed page numbers are 14, 15, 17, and 16. I was wondering why? The entire pdf can be downloaded from http://download304.mediafire.com/u6ewhjt77lzg/bgf8uzvxatckycn/3.pdf
Original: I have a scanned pdf, where two paper

finding on which page a search string is located in a pdf document using python

我的未来我决定 submitted on 2019-11-30 14:51:52
Which Python packages can I use to find out on which page a specific “search string” is located? I looked into several Python pdf packages but couldn't figure out which one I should use. PyPDF does not seem to have this functionality, and PDFMiner seems to be overkill for such a simple task. Any advice? More precisely: I have several PDF documents and I would like to extract the pages which are between a string “Begin” and a string “End”.
I finally figured out that pyPDF can help. I am posting it in case it can help somebody else.
(1) a function to locate the string
    def fnPDF_FindText(xFile,
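The poster's `fnPDF_FindText` is truncated, so the sketch below is not that function. It assumes the text of each page has already been extracted into a list of strings, one per page (with pyPdf that would typically come from each page's `extractText()`; newer pypdf releases name it `extract_text()`). The helper names `pages_containing` and `pages_between` are invented for illustration.

```python
def pages_containing(page_texts, needle):
    """Return 1-based page numbers whose text contains `needle`."""
    return [i for i, text in enumerate(page_texts, start=1) if needle in (text or "")]


def pages_between(page_texts, begin, end):
    """Return 1-based page numbers from the first page containing `begin`
    through the first page at or after it containing `end`, inclusive."""
    start = None
    for i, text in enumerate(page_texts, start=1):
        text = text or ""  # some pages yield no text at all
        if start is None and begin in text:
            start = i
        if start is not None and end in text:
            return list(range(start, i + 1))
    return []  # no complete Begin..End range found


texts = ["cover", "intro Begin here", "body", "the End", "appendix"]
print(pages_containing(texts, "Begin"))
print(pages_between(texts, "Begin", "End"))
```

Matching is plain case-sensitive substring search; scanned PDFs without a text layer would need OCR first, which is outside this sketch.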

Extract hyperlinks from PDF in Python

一笑奈何 submitted on 2019-11-30 14:23:34
Question: I have a PDF document with a few hyperlinks in it, and I need to extract all the text from the pdf. I have used the PDFMiner library and code from http://www.endlesslycurious.com/2012/06/13/scraping-pdf-with-python/ to extract text. However, it does not extract the hyperlinks. For example, I have text that says "Check this link out", with a link attached to it. I am able to extract the words "Check this link out", but what I really need is the hyperlink itself, not the words. How do I go about
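Some background that may explain the behaviour: in a PDF, a hyperlink is normally stored as a link annotation on the page (an entry in the page's /Annots array whose /A action carries a /URI), not in the text stream, so text extraction alone does not see it. The sketch below assumes the annotations have already been resolved into plain dicts; how to get them depends on the library and is not shown. `link_uris` and the sample data are made up for illustration.

```python
def link_uris(annotations):
    """Return the URI targets of link annotations.

    `annotations` is assumed to be a list of plain dicts shaped like PDF
    annotation dictionaries, already resolved from a page's /Annots array.
    """
    uris = []
    for annot in annotations:
        if annot.get("/Subtype") != "/Link":
            continue  # skip notes, highlights, and other annotation types
        action = annot.get("/A") or {}
        uri = action.get("/URI")
        if uri is not None:
            uris.append(uri)
    return uris


sample = [
    {"/Subtype": "/Link", "/A": {"/S": "/URI", "/URI": "http://example.com/"}},
    {"/Subtype": "/Link", "/A": {"/S": "/GoTo"}},         # internal jump, no URI
    {"/Subtype": "/Text", "/Contents": "a sticky note"},  # not a link
]
print(link_uris(sample))
```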

Why my code not correctly split every page in a scanned pdf?

不打扰是莪最后的温柔 submitted on 2019-11-30 13:18:43
Update: Thanks to stardt, whose script works! The pdf is a page of another one. I tried the script on the other one, and it also correctly split each pdf page, but the order of the page numbers is sometimes right and sometimes wrong. For example, in pages 25-28 of the pdf file, the printed page numbers are 14, 15, 17, and 16. I was wondering why? The entire pdf can be downloaded from http://download304.mediafire.com/u6ewhjt77lzg/bgf8uzvxatckycn/3.pdf
Original: I have a scanned pdf, where two paper pages sit side by side in a pdf page. I would like to split the pdf page into two, with the original

Cannot install PyPdf 2 module

无人久伴 submitted on 2019-11-30 11:41:41
Trying to install the PyPdf2 module, I downloaded the zip and unzipped it, then executed python setup.py build and python setup.py install, but it seems that it has not been installed: when I try to import it from a Python script, it returns an ImportError:
    import pyPdf
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named pyPdf
Any help please. I'm using Python 2.7 under Windows XP.
It appears the README file for PyPDF2 is incorrect. It suggests that import pyPdf should work, but it doesn't. This new module is imported as import PyPDF2 (as suggested
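For a script that has to run whether the older pyPdf or the newer PyPDF2 is installed, one option is to try the module names in order. This is only a sketch using the standard library; `import_first_available` is a made-up helper, and the two packages are not guaranteed to expose identical APIs.

```python
import importlib


def import_first_available(candidates):
    """Import and return (name, module) for the first importable candidate."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # not installed under this name, try the next one
    return None, None


name, module = import_first_available(["PyPDF2", "pyPdf"])
print("imported as:", name)
```

Module names are case-sensitive, so `pyPdf`, `PyPDF2`, and `pypdf` are three different names as far as `import` is concerned.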

Change metadata of pdf file with pypdf

蓝咒 submitted on 2019-11-30 07:10:53
Question: I'd like to create/modify the title of a pdf document using pypdf. It seems that the title is read-only. Is there a way to access this metadata read/write? If so, a piece of code would be appreciated. Thanks.
Answer 1: You can manipulate the title with pyPDF (sort of). I came across this post on the reportlab-users listing: http://two.pairlist.net/pipermail/reportlab-users/2009-November/009033.html You can also use pypdf. http://pybrary.net/pyPdf/ This won't let you edit the metadata per se,