pypdf2

Watermark two pdfs - Each page of the first with each page of the second

风格不统一 submitted on 2020-08-25 04:10:24
Question: I have two PDF files of the same length, say pdf1.pdf and pdf2.pdf. I'm trying to watermark each page of pdf1.pdf with the matching page of pdf2.pdf (i.e., page 1 of pdf1.pdf with page 1 of pdf2.pdf, page 2 of pdf1.pdf with page 2 of pdf2.pdf, and so on). However, I'm struggling with how to loop over them (I'm new to programming). For example, I tried this:

    import PyPDF2
    from PyPDF2 import PdfFileMerger
    from PyPDF2 import PdfFileReader, PdfFileWriter
    output = PdfFileWriter()
    ipdf = PdfFileReader(open(
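The page-by-page loop the question asks about might look like the sketch below. It assumes the legacy PyPDF2 1.x names used in the question (`PdfFileReader`, `PdfFileWriter`, `getPage`, `mergePage`); later PyPDF2 releases renamed these (for example `PdfReader` and `merge_page`), so adjust to the installed version. The function names `merge_pagewise` and `watermark_files` and the output path are hypothetical, chosen only for illustration.

```python
def merge_pagewise(base, mark, writer):
    """Overlay page i of `mark` onto page i of `base` and add it to `writer`.

    `base` and `mark` are reader objects with getNumPages()/getPage(i);
    `writer` has addPage(page). Returns the number of pages written.
    """
    n = base.getNumPages()
    if n != mark.getNumPages():
        raise ValueError("both PDFs must have the same number of pages")
    for i in range(n):
        page = base.getPage(i)
        page.mergePage(mark.getPage(i))  # draw the watermark page on top
        writer.addPage(page)
    return n


def watermark_files(base_path, mark_path, out_path):
    """File-level wrapper (hypothetical helper, legacy PyPDF2 1.x API assumed)."""
    # Imported lazily so merge_pagewise can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    # Keep the input files open until the output has been written,
    # because PyPDF2 reads page content lazily.
    with open(base_path, "rb") as base_f, open(mark_path, "rb") as mark_f:
        writer = PdfFileWriter()
        merge_pagewise(PdfFileReader(base_f), PdfFileReader(mark_f), writer)
        with open(out_path, "wb") as out_f:
            writer.write(out_f)
```

A call such as `watermark_files("pdf1.pdf", "pdf2.pdf", "merged.pdf")` would then write one merged page per input page. Checking the page counts up front avoids silently dropping pages when the two files differ in length.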

How to check if PDF is scanned image or contains text

霸气de小男生 submitted on 2020-08-21 02:53:52
Question: I have a large number of files; some are scanned images saved as PDF and some are full or partial text PDFs. Is there a way to check these files so that we only process the scanned images and not the full or partial text PDFs? Environment: Python 3.6

Answer 1: The code below extracts text from both searchable and non-searchable PDFs.

    import fitz
    text = ""
    path = "Your_scanned_or_partial_scanned.pdf"
    doc = fitz.open(path)
    for page in
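One way to turn per-page text extraction into a scanned-versus-text check is sketched below. It is an illustration, not the truncated answer's actual code: the function names, the labels `"scanned"`, `"partial"`, and `"text"`, and the `min_chars` threshold are all assumptions, and `page.get_text()` assumes a reasonably recent PyMuPDF (older releases spelled it `getText()`).

```python
def classify_page_texts(page_texts, min_chars=1):
    """Label a document from the text extracted from each of its pages.

    Returns "scanned" if no page has extractable text, "text" if every
    page does, and "partial" otherwise. The labels are illustrative.
    """
    has_text = [len(t.strip()) >= min_chars for t in page_texts]
    if not any(has_text):
        return "scanned"
    return "text" if all(has_text) else "partial"


def classify_pdf(path):
    """File-level wrapper (hypothetical helper built on PyMuPDF)."""
    # Imported lazily so classify_page_texts works without PyMuPDF installed.
    import fitz

    doc = fitz.open(path)
    try:
        # Assumes a PyMuPDF version that provides Page.get_text().
        texts = [page.get_text() for page in doc]
    finally:
        doc.close()
    return classify_page_texts(texts)
```

Files for which `classify_pdf(path)` returns `"scanned"` would be the ones to process. A scanned PDF that has already been through OCR carries a hidden text layer and would be labeled `"text"` by this check, so raising `min_chars` or inspecting page images may be needed for such files.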