Python: How to replace text in pdf

问题

I have a pdf file and i want to replace some text in pdf file and generate new pdf. How can i do that in python? I have tried reportlab , reportlab does not have any fucntion to search text and replace it. What other module can i use?

回答1:

You can try Aspose.PDF Cloud SDK for Python, Aspose.PDF Cloud is a REST API PDF Processing solution. It is paid API and its free package plan provides 50 credits per month.

I'm developer evangelist at Aspose.

import os
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi

# Get App key and App SID from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx')

pdf_api = PdfApi(pdf_api_client)
filename = '02_pages.pdf'
remote_name = '02_pages.pdf'
copied_file= '02_pages_new.pdf'
#upload PDF file to storage
pdf_api.upload_file(remote_name,filename)

#upload PDF file to storage
pdf_api.copy_file(remote_name,copied_file)

#Replace Text
text_replace = asposepdfcloud.models.TextReplace(old_value='origami',new_value='polygami',regex='true')
text_replace_list = asposepdfcloud.models.TextReplaceListRequest(text_replaces=[text_replace])

response = pdf_api.post_document_text_replace(copied_file, text_replace_list)
print(response)

回答2:

Have a look in THIS thread for one of the many ways to read text from a PDF. Then you'll need to create a new pdf, as they will, as far as I know, not retrieve any formatting for you.

回答3:

The CAM::PDF Perl Library can output text that's not too hard to parse (it seems to fairly randomly split lines of text). I couldn't be bothered to learn too much Perl, so I wrote these really basic Perl command line scripts, one that reads a single page pdf to a text file perl read.pl pdfIn.pdf textOut.txt and one that writes the text (that you can modify in the meantime) to a pdf perl write.pl pdfIn.pdf textIn.txt pdfOut.pdf.

#!/usr/bin/perl
use Module::Load;
load "CAM::PDF";

$pdfIn = $ARGV[0];
$textOut = $ARGV[1];

$pdf = CAM::PDF->new($pdfIn);
$page = $pdf->getPageContent(1);

open(my $fh, '>', $textOut);
print $fh $page;
close $fh;

exit;

and

#!/usr/bin/perl
use Module::Load;
load "CAM::PDF";

$pdfIn = $ARGV[0];
$textIn = $ARGV[1];
$pdfOut = $ARGV[2];

$pdf = CAM::PDF->new($pdfIn);

my $page;
   open(my $fh, '<', $textIn) or die "cannot open file $filename";
   {
       local $/;
       $page = <$fh>;
   }
close($fh);

$pdf->setPageContent(1, $page);

$pdf->cleanoutput($pdfOut);

exit;

You can call these with python either side of doing some regex etc stuff on the outputted text file.

If you're completely new to Perl (like I was), you need to make sure that Perl and CPAN are installed, then run sudo cpan, then in the prompt install "CAM::PDF";, this will install the required modules.

Also, I realise that I should probably be using stdout etc, but I was in a hurry :-)

Also also, any ideas what the format CAM-PDF outputs is? is there any doc for it?

来源：https://stackoverflow.com/questions/31703037/python-how-to-replace-text-in-pdf

标签

python

pdf

reportlab

pypdf