I have been trying to find the efficient way to convert document e.g. doc, docx, ppt, pptx to pdf. So far i have tried docsplit and oowriter
, but both took > 10 seconds to complete the job on pptx file having size 1.7MB. Can any one suggest me a better way or suggestions to improve my approach?
What i have tried:
from subprocess import Popen, PIPE
import time
def convert(src, dst):
d = {'src': src, 'dst': dst}
commands = [
'/usr/bin/docsplit pdf --output %(dst)s %(src)s' % d,
'oowriter --headless -convert-to pdf:writer_pdf_Export %(dst)s %(src)s' % d,
]
for i in range(len(commands)):
command = commands[i]
st = time.time()
process = Popen(command, stdout=PIPE, stderr=PIPE, shell=True) # I am aware of consequences of using `shell=True`
out, err = process.communicate()
errcode = process.returncode
if errcode != 0:
raise Exception(err)
en = time.time() - st
print 'Command %s: Completed in %s seconds' % (str(i+1), str(round(en, 2)))
if __name__ == '__main__':
src = '/path/to/source/file/'
dst = '/path/to/destination/folder/'
convert(src, dst)
Output:
Command 1: Completed in 11.91 seconds
Command 2: Completed in 11.55 seconds
Environment:
- Linux - Ubuntu 12.04
- Python 2.7.3
More tools result:
- jodconverter took 11.32 seconds
Try calling unoconv from your Python code, it took 8 seconds on my local machine, I don't know if it's fast enough for you:
time unoconv 15.\ Text-Files.pptx
real 0m8.604s
Pandoc is a wonderful tool capable of doing what you'd like quickly. Since you're using Popen to effectively shell out the command for the tool, it doesn't matter what language the tool is written in (Pandoc is written in Haskell).
Unfortunately I don't have the time to do a full benchmark, but you may want to check out xtopdf, my Python toolkit for PDF creation. It doesn't do the full range of conversions you want, and some of the conversions have limitations, but it may be of use. xtopdf links:
Online presentation about xtopdf - a good summary of what it is, what it does, platforms, features, users, uses etc.: http://slid.es/vasudevram/xtopdf
xtopdf on Bitbucket: https://bitbucket.org/vasudevram/xtopdf
Many blog posts showing how to use xtopdf for various purpose, including many that show how to use it to convert different input formats to PDF: http://jugad2.blogspot.com/search/label/xtopdf
HTH, Vasudev Ram
For doc and docx (but not ppt/pptx), you could try our independent (but commercial) high fidelity rendering engine online at OnlineDemo/docx_to_pdf
By "high fidelity", I mean it is designed from the ground up to have the same line and paragraph breaks, tab stops etc etc as Microsoft Word.
来源:https://stackoverflow.com/questions/20891787/an-efficient-way-to-convert-document-to-pdf-format