问题
I have a pdf file and i want to replace some text in pdf file and generate new pdf. How can i do that in python? I have tried reportlab , reportlab does not have any fucntion to search text and replace it. What other module can i use?
回答1:
You can try Aspose.PDF Cloud SDK for Python, Aspose.PDF Cloud is a REST API PDF Processing solution. It is paid API and its free package plan provides 50 credits per month.
I'm developer evangelist at Aspose.
import os
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
# Get App key and App SID from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
app_key='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
app_sid='xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx')
pdf_api = PdfApi(pdf_api_client)
filename = '02_pages.pdf'
remote_name = '02_pages.pdf'
copied_file= '02_pages_new.pdf'
#upload PDF file to storage
pdf_api.upload_file(remote_name,filename)
#upload PDF file to storage
pdf_api.copy_file(remote_name,copied_file)
#Replace Text
text_replace = asposepdfcloud.models.TextReplace(old_value='origami',new_value='polygami',regex='true')
text_replace_list = asposepdfcloud.models.TextReplaceListRequest(text_replaces=[text_replace])
response = pdf_api.post_document_text_replace(copied_file, text_replace_list)
print(response)
回答2:
Have a look in THIS thread for one of the many ways to read text from a PDF. Then you'll need to create a new pdf, as they will, as far as I know, not retrieve any formatting for you.
回答3:
The CAM::PDF Perl Library can output text that's not too hard to parse (it seems to fairly randomly split lines of text). I couldn't be bothered to learn too much Perl, so I wrote these really basic Perl command line scripts, one that reads a single page pdf to a text file perl read.pl pdfIn.pdf textOut.txt
and one that writes the text (that you can modify in the meantime) to a pdf perl write.pl pdfIn.pdf textIn.txt pdfOut.pdf
.
#!/usr/bin/perl
use Module::Load;
load "CAM::PDF";
$pdfIn = $ARGV[0];
$textOut = $ARGV[1];
$pdf = CAM::PDF->new($pdfIn);
$page = $pdf->getPageContent(1);
open(my $fh, '>', $textOut);
print $fh $page;
close $fh;
exit;
and
#!/usr/bin/perl
use Module::Load;
load "CAM::PDF";
$pdfIn = $ARGV[0];
$textIn = $ARGV[1];
$pdfOut = $ARGV[2];
$pdf = CAM::PDF->new($pdfIn);
my $page;
open(my $fh, '<', $textIn) or die "cannot open file $filename";
{
local $/;
$page = <$fh>;
}
close($fh);
$pdf->setPageContent(1, $page);
$pdf->cleanoutput($pdfOut);
exit;
You can call these with python either side of doing some regex etc stuff on the outputted text file.
If you're completely new to Perl (like I was), you need to make sure that Perl and CPAN are installed, then run sudo cpan
, then in the prompt install "CAM::PDF";
, this will install the required modules.
Also, I realise that I should probably be using stdout etc, but I was in a hurry :-)
Also also, any ideas what the format CAM-PDF outputs is? is there any doc for it?
来源:https://stackoverflow.com/questions/31703037/python-how-to-replace-text-in-pdf