pypdf

pyPDF merging and displaying as httpresponse through django

孤街醉人 submitted on 2019-12-07 14:54:31
Question: I'm having trouble incorporating pyPDF logic to merge two PDF files into my Django site. I have written code that merges the files when run as a Python file on the local server (but I need to explicitly identify which files to merge):

from pyPdf import PdfFileReader, PdfFileWriter

output = PdfFileWriter()
input1 = PdfFileReader(file("abc_form0.pdf", "rb"))
input2 = PdfFileReader(file("abc_form1.pdf", "rb"))
total_pages = input1.getNumPages()
total_pages1 = input2.getNumPages()
for page in

Python, pyPdf, Adobe PDF OCR error: unsupported filter /lzwdecode

有些话、适合烂在心里 submitted on 2019-12-07 10:03:57
Question: My setup: Python 2.6 64-bit (with pyPdf-1.13.win32.exe installed), Wing IDE, Windows 7 64-bit. I got the following error:

NotImplementedError: unsupported filter /LZWDecode

when I ran the following code:

from pyPdf import PdfFileWriter, PdfFileReader
import sys, os, pyPdf, re

path = 'C:\\Users\\Homer\\Documents\\'  # This is where I put my pdfs
filelist = os.listdir(path)
has_text_list = []
does_not_have_text_list = []
for pdf_name in filelist:
    pdf_file_with_directory = os.path.join(path, pdf
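For context on the error above: /LZWDecode is the PDF stream filter that the installed pyPdf reports as unsupported. The following is a rough, pure-Python sketch of what such a filter has to do. It is not pyPdf's code, the function name `lzw_decode` is made up for illustration, and it assumes the default EarlyChange behaviour and no predictor.

```python
def lzw_decode(data):
    """Decode PDF /LZWDecode data (assumes EarlyChange=1, no predictor)."""
    CLEAR, EOD = 256, 257

    def fresh_table():
        # Codes 0-255 stand for single bytes; 256 and 257 are reserved.
        return {i: bytes([i]) for i in range(256)}

    table = fresh_table()
    next_code = 258      # first free table slot
    width = 9            # current code width in bits (9 to 12)
    prev = b""           # sequence emitted for the previous code
    out = bytearray()
    bitbuf = 0
    bitcount = 0

    for byte in data:
        bitbuf = (bitbuf << 8) | byte
        bitcount += 8
        while bitcount >= width:
            # Codes are packed most significant bit first.
            bitcount -= width
            code = bitbuf >> bitcount
            bitbuf &= (1 << bitcount) - 1

            if code == CLEAR:
                table = fresh_table()
                next_code, width, prev = 258, 9, b""
                continue
            if code == EOD:
                return bytes(out)

            if code in table:
                entry = table[code]
            elif code == next_code and prev:
                # The code refers to the entry that is about to be created.
                entry = prev + prev[:1]
            else:
                raise ValueError("corrupt LZW data")

            if prev:
                table[next_code] = prev + entry[:1]
                next_code += 1
            out += entry
            prev = entry

            # Early change: widen one code before the table actually fills.
            if width < 12 and next_code + 1 >= (1 << width):
                width += 1
    return bytes(out)


# Sample stream: codes 256 45 258 258 65 259 66 257, packed as 9-bit values.
sample = bytes.fromhex("800B6050220C0C8501")
print(lzw_decode(sample))
```

Whether a newer PDF library handles this filter for a given file is something to check against that library's documentation rather than assume from this sketch.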

Merging two PDFs

给你一囗甜甜゛ submitted on 2019-12-07 05:35:32
Question:

import PyPDF2
import glob
import os
from fpdf import FPDF
import shutil

class MyPDF(FPDF):
    # adding a footer, containing the page number
    def footer(self):
        self.set_y(-15)
        self.set_font("Arial", style="I", size=8)
        pageNum = "page %s/{nb}" % self.page_no()
        self.cell(0, 10, pageNum, align="C")

if __name__ == "__main__":
    os.chdir("pathtolocation/docs/")  # docs location
    os.system("libreoffice --headless --invisible --convert-to pdf *")  # this converts everything to pdf
    for file in glob.glob("*"):

Dynamically generated PDF files working in most readers except Adobe Reader

China☆狼群 submitted on 2019-12-06 12:35:21
I'm trying to dynamically generate PDFs from user input, where I basically print the user input and overlay it on an existing PDF that I did not create. It works, with one major exception: Adobe Reader doesn't read it properly, on Windows or on Linux. QuickOffice on my phone doesn't read it either. So I thought I'd trace the path I take when creating the files:
1 - Original PDF of background: PDF 1.2 made with Adobe Distiller with the LZW encoding. I didn't make this.
2 - PDF of background: PDF 1.4 made with Ghostscript. I used pdf2ps then ps2pdf on the above to strip LZW so that the reportlab and

Pdf overlaying not working

走远了吗. submitted on 2019-12-06 04:49:20
I have been looking for a solution to this problem: I have two landscape-oriented A3 PDFs with images, and I want to overlay them so that the resulting PDF contains both images merged into one, as if one of them were a watermark but with the same density. Think of it as printing two different PDFs on one A3 sheet of paper; I want exactly that effect. In other words (I just came up with a way to express it), I would like to overlay two PDFs and, for the upper layer, make all the "white" area transparent. Basically, I just followed the steps in any solution from this question:

pypdf python tool

核能气质少年 submitted on 2019-12-06 04:35:10
How can I read the following PDF file using the pyPdf Python module? http://www.envis-icpe.com/pointcounterpointbook/Hindi_Book.pdf

# -*- coding: utf-8 -*-
from pyPdf import PdfFileWriter, PdfFileReader
import pyPdf

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content

print getPDFContent("/home/tom/Desktop

How to merge two landscape pdf pages using pyPdf

时光总嘲笑我的痴心妄想 submitted on 2019-12-06 03:55:42
Question: I'm having trouble merging two PDF files with pyPdf. When I run the following code, the watermark (page1) looks fine, but page2 has been rotated 90 degrees clockwise. Any ideas what's going on?

from pyPdf import PdfFileWriter, PdfFileReader

# PDF1: A4 Landscape page created in photoshop using PdfCreator,
input1 = PdfFileReader(file("base.pdf", "rb"))
page1 = input1.getPage(0)

# PDF2: A4 Landscape page, text only, created using Pisa (www.xhtml2pdf.com)
input2 = PdfFileReader(file("text

python and pyPdf - how to extract text from the pages so that there are spaces between lines

假如想象 submitted on 2019-12-06 03:20:57
Question: Currently, if I make a page object of a PDF page with pyPdf and call extractText(), the lines are concatenated together. For example, if line 1 of the page says "hello" and line 2 says "world", the text returned from extractText() is "helloworld" instead of "hello world". Does anyone know how to fix this, or have suggestions for a workaround? I really need the text to have spaces in between the lines because I'm doing text mining on this PDF text and not having spaces

Porting to Python3: PyPDF2 mergePage() gives TypeError

烂漫一生 submitted on 2019-12-05 22:00:35
I'm using Python 3.4.2 and PyPDF2 1.24 (also reportlab 3.1.44, in case that helps) on Windows 7. I recently upgraded from Python 2.7 to 3.4 and am in the process of porting my code. This code creates a blank PDF page with links embedded in it (using reportlab) and merges it (using PyPDF2) with an existing PDF page. I had an issue with reportlab in that saving the canvas used StringIO, which needed to be changed to BytesIO, but after doing that I ran into this error:

Traceback (most recent call last):
  File "C:\cms_software\pdf_replica\builder.py", line 401, in merge_pdf_files
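The StringIO to BytesIO change mentioned in the question can be illustrated with the standard library alone. This is only a sketch of that one porting step: the traceback is cut off here, so it says nothing about the TypeError itself. The `fake_pdf` bytes are a stand-in for real PDF output, and both function names are hypothetical.

```python
import io


def save_to_buffer(pdf_bytes):
    """Collect binary PDF output in memory the Python 3 way."""
    buffer = io.BytesIO()
    buffer.write(pdf_bytes)
    buffer.seek(0)  # rewind so that a reader starts at the beginning
    return buffer


def text_buffer_rejects(pdf_bytes):
    """Return True if a text buffer refuses the binary data."""
    try:
        io.StringIO().write(pdf_bytes)
    except TypeError:
        # On Python 3, StringIO only accepts str, not bytes.
        return True
    return False


# Stand-in bytes, not a valid PDF document.
fake_pdf = b"%PDF-1.4\n%%EOF\n"

buffer = save_to_buffer(fake_pdf)
print(buffer.read(8))
print(text_buffer_rejects(fake_pdf))
```

On Python 2, StringIO accepted byte strings, which is why code written for 2.7 tends to need this change when binary PDF data is involved.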

pyPDF merging and displaying as httpresponse through django

旧巷老猫 submitted on 2019-12-05 19:05:18
I'm having trouble incorporating pyPDF logic to merge two PDF files into my Django site. I have written code that merges the files when run as a Python file on the local server (but I need to explicitly identify which files to merge):

from pyPdf import PdfFileReader, PdfFileWriter

output = PdfFileWriter()
input1 = PdfFileReader(file("abc_form0.pdf", "rb"))
input2 = PdfFileReader(file("abc_form1.pdf", "rb"))
total_pages = input1.getNumPages()
total_pages1 = input2.getNumPages()
for page in xrange(total_pages):
    output.addPage(input1.getPage(page))
for page in xrange(total_pages1):
    output
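One way the merge could be served from a Django view is sketched below. It is an assumption-laden illustration, not the asker's code: it uses the current `pypdf` package on Python 3 (`PdfReader`, `PdfWriter`, `add_page`) rather than the old pyPdf, the function names `merge_pdfs` and `merged_pdf_view` are hypothetical, and the two file names are simply the ones from the question. The third-party imports sit inside the functions so the module still loads where a package is missing.

```python
import io


def merge_pdfs(paths):
    """Concatenate the PDFs at the given paths and return the result as bytes."""
    # Assumes the modern pypdf package, not the pyPdf used in the question.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in paths:
        reader = PdfReader(path)
        for page in reader.pages:
            writer.add_page(page)

    # Write into memory instead of to a file on disk.
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()


def merged_pdf_view(request):
    """Hypothetical Django view that returns the merged PDF to the browser."""
    from django.http import HttpResponse

    data = merge_pdfs(["abc_form0.pdf", "abc_form1.pdf"])
    response = HttpResponse(data, content_type="application/pdf")
    # "inline" asks the browser to display the file; "attachment" would download it.
    response["Content-Disposition"] = 'inline; filename="merged.pdf"'
    return response
```

The view still hard-codes the two files; choosing them per request (for example from URL parameters) is left open, since the question does not say how the files should be selected.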