pypdf2

Can this fillable PDF be automated?

蹲街弑〆低调 submitted on 2020-07-10 10:25:23
Question: Please check out this PDF. It is a fillable PDF form, and I want to know whether it can be filled in automatically, given that I have the data for each box in Excel format. I know most PDFs are in binary format, but is there any way to find out what the ID of each
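
One way to approach the question above, as a minimal sketch rather than a confirmed answer: list the field names the form exposes, map spreadsheet columns to those names, and write the values page by page. The sketch assumes the PyPDF2 1.x API (`PdfFileReader.getFields`, `PdfFileWriter.updatePageFormFieldValues`), assumes the Excel sheet has been exported to CSV, and uses hypothetical helper names and a hypothetical column-to-field mapping.

```python
import csv
import io


def rows_to_field_values(csv_text, column_to_field):
    """Turn CSV rows into one {pdf_field_name: value} dict per row.

    column_to_field is a hypothetical mapping from spreadsheet column
    headers to the field names found in the PDF form.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {field: row[column] for column, field in column_to_field.items()}
        for row in reader
    ]


def list_field_names(template_path):
    """Return the form field names that PyPDF2 reports for the template."""
    from PyPDF2 import PdfFileReader  # lazy import; assumes PyPDF2 1.x

    with open(template_path, "rb") as handle:
        fields = PdfFileReader(handle).getFields() or {}
        return sorted(fields)


def fill_form(template_path, output_path, values):
    """Copy every page and set the given field values on pages with widgets."""
    from PyPDF2 import PdfFileReader, PdfFileWriter  # assumes PyPDF2 1.x

    with open(template_path, "rb") as handle:
        reader = PdfFileReader(handle)
        writer = PdfFileWriter()
        for index in range(reader.getNumPages()):
            page = reader.getPage(index)
            writer.addPage(page)
            # Pages without annotations have no form widgets to update.
            if "/Annots" in page:
                writer.updatePageFormFieldValues(page, values)
        with open(output_path, "wb") as out:
            writer.write(out)
```

Calling `list_field_names` on the template first shows which names the mapping has to target. Some viewers may not display the filled values unless the form's appearance settings are carried over to the output, which this sketch does not handle.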

Extract text from pdf converted from webpage using Pypdf2

狂风中的少年 submitted on 2020-06-29 04:34:38
Question: I used Chrome's "Save as PDF" option to convert a webpage into a PDF. The problem is that when I extract the data from it using PyPDF2, it shows Null, whereas extraction works fine on other PDF files. I know that I can extract the data directly from the website, but I want to understand why this is not working. It reports the correct number of pages, but when I call extractText(), it returns nothing. Does anyone know what the problem is? The link to the page is https://en.wikipedia.org/wiki/Rapping. I

Python/PyPDF4: How do I specify the /PageLabels in the created PDF?

∥☆過路亽.° submitted on 2020-06-16 04:55:43
Question: I am using PyPDF4 to create an offline-readable version of the journal "Nature". I use PyPDF4's PdfFileReader to read the individual article PDFs and PdfFileWriter to create a single, merged output. The problem I am trying to solve is that the page numbers of some issues do not start at 1; for example, issue 7805 starts with page 563. How do I specify the desired /PageLabels in the document catalog?

    for pdf_file in pdf_files:
        input_pdf = PdfFileReader(open(pdf_file, 'rb'))
        page_indices =
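
As background for the question above: in the PDF specification, /PageLabels is a number tree whose /Nums array alternates a starting page index with a label dictionary (/S for the numbering style, /St for the first number, /P for an optional prefix). The sketch below models that array with plain Python lists and dicts and computes the label a viewer would show for a page. It does not touch PyPDF4, the function names are hypothetical, and the letter styles /A and /a are left out.

```python
from bisect import bisect_right


def to_roman(number):
    """Lowercase roman numeral for a positive integer."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


def page_label(nums, page_index):
    """Label for a zero-based page index, given a flat /Nums-style list.

    nums looks like [start0, {...}, start1, {...}] with ascending starts.
    """
    starts = nums[0::2]
    ranges = nums[1::2]
    position = bisect_right(starts, page_index) - 1
    if position < 0:
        # No range covers this page; fall back to plain 1-based numbering.
        return str(page_index + 1)
    entry = ranges[position]
    number = entry.get("/St", 1) + page_index - starts[position]
    prefix = entry.get("/P", "")
    style = entry.get("/S")
    if style is None:
        return prefix  # without /S the label is the prefix alone
    if style == "/D":
        return prefix + str(number)
    if style == "/r":
        return prefix + to_roman(number)
    if style == "/R":
        return prefix + to_roman(number).upper()
    raise NotImplementedError("letter styles are not covered by this sketch")


# An issue whose first page should be labelled 563, as in the question.
issue_labels = [0, {"/S": "/D", "/St": 563}]
print(page_label(issue_labels, 0))
```

In PyPDF2 or PyPDF4 the same structure would be built from the classes in the `generic` module (`DictionaryObject`, `ArrayObject`, `NameObject`, `NumberObject`) and attached to the document catalog. How the writer exposes its catalog is not confirmed here.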

Merging PDFs while retaining custom page numbers (aka pagelabels) and bookmarks

偶尔善良 submitted on 2020-05-30 08:12:26
Question: I'm trying to automate merging several PDF files and have two requirements: a) existing bookmarks and b) page labels (custom page numbering) need to be retained. Bookmarks are retained by default when merging with PyPDF2 and pdftk, but not with pdfrw. Page labels are consistently not retained by PyPDF2, pdftk, or pdfrw. After a lot of searching, my guess is that there is no straightforward way to do what I want. If I'm wrong, I hope someone can point me to the easy solution.

How to extract text from pdf in python 3.7.3

痴心易碎 submitted on 2020-05-25 08:19:32
Question: I am trying to extract text from a PDF file using Python. My main goal is to create a program that reads a bank statement and extracts its text to update an Excel file, so that I can easily record monthly spending. Right now I am focusing just on extracting the text from the PDF file, but I don't know how to do so. What is currently the best and easiest way to extract text from a PDF file into a string? What library is best to use today, and how can I do it? I have tried using PyPDF2 but
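
For reference, page-by-page extraction into a single string can be sketched as follows. This assumes the PyPDF2 1.x API (`getNumPages`, `getPage`, `extractText`). The function names are hypothetical, and the sketch makes no claim about which library is best. As another question on this page reports, `extractText()` can return nothing for some PDFs, so an empty result is treated as an empty string.

```python
def extract_text(reader):
    """Join the text of every page of a PyPDF2-style reader into one string."""
    parts = []
    for index in range(reader.getNumPages()):
        # extractText() may return an empty result for some documents.
        parts.append(reader.getPage(index).extractText() or "")
    return "\n".join(parts)


def extract_text_from_file(path):
    """Open a PDF file and return its text (assumes PyPDF2 1.x is installed)."""
    from PyPDF2 import PdfFileReader  # lazy import so the helper above stays generic

    with open(path, "rb") as handle:
        return extract_text(PdfFileReader(handle))
```

Keeping `extract_text` independent of the file handling makes it easy to try the same loop with a different reader object if PyPDF2 returns empty text for a particular statement.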

How to erase text from PDF using Python

血红的双手。 submitted on 2020-03-22 04:29:52
Question: I'm creating a Python script to edit text in PDFs. I have this Python code, which allows me to add text at specific positions in a PDF file.

    import PyPDF2
    import io
    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import letter
    import sys

    packet = io.BytesIO()
    # create a new PDF with Reportlab
    can = canvas.Canvas(packet, pagesize=letter)
    # insert text at a specific position
    can.drawString(300, 115, "Hello world")
    can.save()
    # move to the beginning of the BytesIO buffer

How can I rotate a page in pyPDF2?

Deadly submitted on 2020-02-03 09:28:09
Question: I'm editing a PDF file with pyPDF2. I managed to generate the PDF I want, but I have yet to rotate some pages. I went to the documentation and found two methods, rotateClockwise and rotateCounterClockwise, and while the documentation says the parameter is an int, I can't make it work. Python says:

    TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'

The code that produces this error:

    page = input1.getPage(i)
    page.rotateCounterClockwise(90)
    output.addPage(page)

I can't find anyone explaining the
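
The error message suggests that the page's /Rotate entry is stored as an indirect reference, so adding an int to it fails. A possible workaround, sketched under that assumption and against the PyPDF2 1.x API, is to resolve the reference first and write back a plain number. The helper names are hypothetical, and a /Rotate value inherited from a parent page node is not handled.

```python
def resolve(value):
    """Return the referenced object if value is an indirect reference."""
    # PyPDF2 objects expose getObject(); plain ints do not.
    return value.getObject() if hasattr(value, "getObject") else value


def rotated_angle(current, delta):
    """New /Rotate value after turning by delta degrees (negative = counter-clockwise)."""
    if delta % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return (int(resolve(current)) + delta) % 360


def rotate_page(page, delta):
    """Set /Rotate on a PyPDF2 page object without adding to an IndirectObject."""
    from PyPDF2.generic import NameObject, NumberObject  # assumes PyPDF2 1.x

    current = page.get("/Rotate", 0)  # may be an unresolved indirect reference
    page[NameObject("/Rotate")] = NumberObject(rotated_angle(current, delta))
    return page
```

With these helpers, the loop from the question would call `rotate_page(page, -90)` in place of `page.rotateCounterClockwise(90)` before `output.addPage(page)`.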