pypdf2

How can I rotate a page in pyPDF2?

蓝咒 submitted on 2020-02-03 09:27:32
Question: I'm editing a PDF file with PyPDF2. I managed to generate the PDF I want, but I still need to rotate some pages. In the documentation I found two methods, rotateClockwise and rotateCounterClockwise, and although they say the parameter is an int, I can't make them work. Python says:
TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'
This code produces the error:
page = input1.getPage(i)
page.rotateCounterClockwise(90)
output.addPage(page)
I can't find someone explaining the
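
The traceback suggests the page's existing /Rotate entry is stored as an indirect object (a reference to another object in the file), and the rotate method adds the int to that reference without dereferencing it first. The stdlib-only sketch below shows the arithmetic a fix needs; `IndirectRef` and `resolve` are hypothetical stand-ins, not PyPDF2 classes. Later releases of the library, now published as pypdf, appear to dereference the value and expose `page.rotate(90)`, which is worth trying before patching anything.

```python
class IndirectRef:
    """Hypothetical stand-in for a PDF indirect reference such as `12 0 R`."""

    def __init__(self, obj_id: int) -> None:
        self.obj_id = obj_id


def resolve(value, objects: dict):
    """Follow (possibly chained) indirect references to the underlying value."""
    while isinstance(value, IndirectRef):
        value = objects[value.obj_id]
    return value


def new_rotation(current, angle: int, objects: dict) -> int:
    """Return the /Rotate value after turning a page by `angle` degrees.

    `current` is the page's existing /Rotate entry (None if absent). Writing
    `current + angle` directly is what fails when `current` is a reference.
    Positive angles are clockwise, matching the PDF /Rotate convention.
    """
    if angle % 90 != 0:
        raise ValueError("/Rotate must be a multiple of 90")
    base = 0 if current is None else resolve(current, objects)
    return (base + angle) % 360


objects = {12: 90}  # hypothetical object 12 holds the integer 90
print(new_rotation(IndirectRef(12), -90, objects))  # counter-clockwise 90
print(new_rotation(None, 90, objects))              # clockwise from unset
```

Normalising with `% 360` keeps the stored value in the 0 to 270 range even after repeated counter-clockwise turns.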

Watermark Removal on PDF with PyPDF2

佐手、 submitted on 2020-02-01 05:17:30
Question: This section imports the necessary classes from the PyPDF2 library:
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.pdf import ContentStream
from PyPDF2.generic import TextStringObject, NameObject
from PyPDF2.utils import b_
The watermark says SAMPLE on it, so I've tried different capitalization cases:
wm_text = 'Sample'
replace_with = ''
I'm hoping to just replace the SAMPLE watermark with nothing, so a space could suffice.
Load the PDF into PyPDF2:
source = PdfFileReader(open('input
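
The imports point at rewriting text operands inside a page's content stream. As a much-simplified, stdlib-only sketch of that idea, the function below blanks out the string operand of `Tj` text-showing operators whose text matches the watermark, ignoring case. It assumes the stream has already been decompressed, that the watermark is drawn as a literal string such as `(SAMPLE) Tj`, and that the font uses a plain single-byte encoding. Watermarks drawn as images, form XObjects, `TJ` arrays, or hex strings will not be caught, and the modified bytes would still have to be written back through the PDF library.

```python
import re


def blank_text(content: bytes, target: str) -> bytes:
    """Replace `(target) Tj` with `() Tj` in a decompressed content stream.

    Keeping an empty operand (instead of deleting the operator) leaves the
    stream syntactically valid. Matching ignores case, so 'Sample' also
    matches 'SAMPLE'.
    """
    pattern = re.compile(
        rb"\(" + re.escape(target.encode("latin-1")) + rb"\)(\s*)Tj",
        re.IGNORECASE,
    )
    # \1 re-inserts whatever whitespace separated the operand from Tj.
    return pattern.sub(rb"()\1Tj", content)


# Made-up content stream: a large watermark plus ordinary page text.
stream = b"BT /F1 48 Tf 100 400 Td (SAMPLE) Tj ET\nBT /F1 12 Tf 72 720 Td (Invoice) Tj ET"
print(blank_text(stream, "Sample"))
```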

How to get PyPDF2 to extract text from multiple sequential pages - in range?

放肆的年华 submitted on 2020-01-16 08:39:07
Question: I'm trying to get PyPDF2 to extract specific text throughout a document, per the code below. It pulls exactly what I need and eliminates the duplicates, but it isn't giving me a list from each page; it seems to show only the text from the last page. What am I doing wrong?
#import PyPDF2 and set extracted text as the page_content variable
import PyPDF2
pdf_file = open('enme2.pdf','rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)
number_of_pages = read_pdf.getNumPages()
#for loop
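
The code is cut off at the loop, but seeing only the last page is a common symptom of re-assigning the result variable on every pass instead of appending to a list created before the loop. The sketch below shows the accumulating pattern on already-extracted page strings (with PyPDF2 1.x these could come from `read_pdf.getPage(i).extractText()` for each `i` in `range(number_of_pages)`); the regular expression and sample pages are hypothetical placeholders for whatever text is being matched.

```python
import re


def unique_matches_per_page(page_texts, pattern):
    """Return one de-duplicated list of matches per page, in page order."""
    regex = re.compile(pattern)
    results = []  # created once, before the loop, so earlier pages survive
    for text in page_texts:
        page_matches = []
        for match in regex.findall(text):
            if match not in page_matches:  # drop duplicates within the page
                page_matches.append(match)
        results.append(page_matches)  # append; do not overwrite
    return results


pages = ["Part A-101, Part B-202, Part A-101", "Part C-303"]
print(unique_matches_per_page(pages, r"[A-Z]-\d{3}"))
# [['A-101', 'B-202'], ['C-303']]
```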

Convert PDF files to TXT files

好久不见. submitted on 2020-01-14 03:22:07
Question: I need a last touch from an expert! I want to convert all PDF files in a directory to TXT files. I wrote code that creates empty TXT files with the same names as the PDF files, and code that converts a single PDF to TXT, but I want to convert all the files in the directory. Please see the code below.
PS: I already tried PDFMiner and every other package, and it does not work.
import pandas as pd
import os
import PyPDF2
###Create empty txt files Named as pdf files ###########
path = '....\\PDF2Text
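
One way to structure the batch step is a single loop over the directory that writes each `.txt` with the same base name as its PDF. In this pathlib sketch, `extract` is a hypothetical parameter standing in for whichever single-file conversion already works (for example, joining `extractText()` over the pages of a `PyPDF2.PdfFileReader`); passing the converter in keeps the directory loop independent of any PDF library. On POSIX systems `glob("*.pdf")` is case-sensitive, so files named `*.PDF` would be skipped.

```python
from pathlib import Path
from typing import Callable, List


def convert_directory(src_dir, dst_dir, extract: Callable[[Path], str]) -> List[Path]:
    """Write one UTF-8 .txt file per .pdf in src_dir; return the paths written."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(src.glob("*.pdf")):
        txt_path = dst / (pdf_path.stem + ".txt")  # same base name, new extension
        txt_path.write_text(extract(pdf_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

Creating the text file at the moment its content is known also removes the separate "create empty files" step.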

Appending PDF files based on multiple values in a dictionary key (or CSV) results in too many pages

余生颓废 submitted on 2020-01-07 04:25:08
Question: I am trying to generate PDF files based on the county they fall in. If there is more than one PDF file per county, I need to append the files into a single file based on the county key. I can't seem to get the maps to append based on the key. The final maps generated seem random and often have far too many files appended. I am pretty sure I am not grouping them correctly. I have read that multiple values in a key can result in files showing up multiple times. Can someone please clue me in on how to
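
Since the question is cut off, this is only one plausible reading: build the county-to-files mapping first, de-duplicating as you go, and then merge each county's list exactly once into its own output file. Reusing a single merger across counties, or appending inside the loop that builds the mapping, could both produce the "too many pages" symptom. The sketch below covers the grouping step with the standard library; the county and file names are made up.

```python
from collections import defaultdict


def group_by_county(rows):
    """Map county -> list of unique PDF paths, preserving first-seen order.

    `rows` is any iterable of (county, pdf_path) pairs, e.g. from csv.reader.
    """
    groups = defaultdict(list)
    for county, pdf_path in rows:
        if pdf_path not in groups[county]:  # a path listed twice is merged once
            groups[county].append(pdf_path)
    return dict(groups)


rows = [("Kent", "k1.pdf"), ("Essex", "e1.pdf"), ("Kent", "k2.pdf"), ("Kent", "k1.pdf")]
for county, paths in group_by_county(rows).items():
    # With PyPDF2 1.x, create a fresh PdfFileMerger here for each county,
    # append every path in `paths`, then write it to its own output file.
    print(county, paths)
```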

PyPDF2 compression

邮差的信 submitted on 2020-01-04 15:32:01
Question: I am struggling to compress my merged PDFs using the PyPDF2 module. This is my attempt, based on http://www.blog.pythonlibrary.org/2012/07/11/pypdf2-the-new-fork-of-pypdf/
import PyPDF2
path = open('path/to/hello.pdf', 'rb')
path2 = open('path/to/another.pdf', 'rb')
merger = PyPDF2.PdfFileMerger()
merger.append(fileobj=path2)
merger.append(fileobj=path)
pdf.filters.compress(merger)
merger.write(open("test_out2.pdf", 'wb'))
The error I receive is TypeError: must be string or read-only buffer,
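
The error message matches what Python 2's `zlib.compress` raises when handed something other than a byte string, which suggests the filters module's `compress` helper is a thin wrapper around zlib (the FlateDecode filter) that expects raw stream bytes, not a merger object. The stdlib sketch below illustrates that contract with a made-up content stream. For whole documents, PyPDF2 1.x instead offers `compressContentStreams()` on individual page objects added to a `PdfFileWriter`; treat that as a pointer to check against the documentation for your version.

```python
import zlib


def flate_compress(data: bytes) -> bytes:
    """Deflate raw stream bytes into zlib format, as the FlateDecode filter expects."""
    if not isinstance(data, (bytes, bytearray)):
        # Passing a merger or reader object here is the mistake in the question.
        raise TypeError("expected stream bytes, got " + type(data).__name__)
    return zlib.compress(data)


raw = b"0 0 m 612 792 l S\n" * 500  # a repetitive, made-up content stream
packed = flate_compress(raw)
assert zlib.decompress(packed) == raw  # lossless round trip
print(len(raw), "->", len(packed), "bytes")
```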

Python PyPDF2 writer does not work with decryption

坚强是说给别人听的谎言 submitted on 2019-12-24 10:59:59
Question: I wanted to decrypt a PDF file and write the first page into another file. Currently the code looks like this:
reader = PdfFileReader(infile)
if reader.isEncrypted:
    reader.decrypt('')
writer = PdfFileWriter()
writer.addPage(reader.getPage(0))
pageObject = reader.getPage(0)
print 'First page of this file contains the following text:\n', pageObject.extractText()
with open('output.pdf', 'wb') as outfile:
    writer.write(outfile)
The print function did output the content of the first page, so I knew

Extracting text from PDF in Python

白昼怎懂夜的黑 submitted on 2019-12-24 10:57:56
Question: I have a PDF full of quotes: https://www.pdf-archive.com/2017/03/22/test/
I can extract the text in Python using the following code:
import PyPDF2
pdfFileObj = open('example.pdf','rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageObj = pdfReader.getPage(0)
print(pageObj.extractText())
This returns all the quotes as one paragraph. Is it possible to split the text at the horizontal separators, so that I get the individual quotes?
Answer 1: If you want to just extract the quotes from the pdf text
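
The horizontal separator is most likely drawn as a vector line rather than typed as a character, in which case it does not appear in the string returned by `extractText()` and cannot be split on. A text-only alternative, assuming every quote is wrapped in straight or curly double quotation marks, is a non-greedy regular expression over the extracted text; the sample string below is made up.

```python
import re

# Opening mark, shortest run of text, closing mark; DOTALL lets a quote span lines.
QUOTE_RE = re.compile(r'[“"](.+?)[”"]', re.DOTALL)


def split_quotes(text):
    """Return each double-quoted passage with its internal whitespace collapsed."""
    return [" ".join(m.split()) for m in QUOTE_RE.findall(text)]


sample = '“Simplicity is the\nultimate sophistication.” Anon. "Less is more." Anon.'
print(split_quotes(sample))
# ['Simplicity is the ultimate sophistication.', 'Less is more.']
```

If the extracted text has lost its quotation marks, splitting on attribution lines or on blank lines would be the next thing to try.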

PyPDF2 hangs on processing

情到浓时终转凉″ submitted on 2019-12-24 07:13:52
Question: I'm processing multiple PDF files using PyPDF2, but my script hangs somewhere. All I can see in my console are some "startxref on same line as offset" messages, which, if I'm correct, are warnings, so it should still go to the finally block and return an empty string. Am I doing something wrong?
import PyPDF2
import sys
import os
def decode_pdf(src_filename):
    out_str=""
    try:
        f = open(str(src_filename), "rb")
        read_pdf = PyPDF2.PdfFileReader(f)
        number_of_pages = read_pdf.getNumPages()
        for i in range(0
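
A warning emitted through Python's `warnings` module is not an exception: after it is printed, execution simply continues, so it never sends control to `except` or `finally`, and the hang is happening somewhere else in the loop. If the goal is to give up on a file as soon as the parser complains, one option is to escalate warnings to exceptions for the duration of the call. In the sketch below, `noisy_parse` is a hypothetical stand-in for the per-file work. This does not interrupt a genuine infinite loop; that would need something like running each file in a separate process with a timeout.

```python
import warnings


def call_strict(func, *args, default=""):
    """Call func(*args); if it emits any warning, abandon it and return default."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # every warning now raises instead of printing
        try:
            return func(*args)
        except Warning as exc:
            print("skipped:", exc)
            return default


def noisy_parse(name):
    """Hypothetical stand-in for per-file PDF processing."""
    if name == "broken.pdf":
        warnings.warn("startxref on same line as offset")
    return "text of " + name


print([call_strict(noisy_parse, n) for n in ("ok.pdf", "broken.pdf")])
```

Catching only `Warning` keeps genuine errors visible instead of silently turning them into empty strings.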