Having trouble using Python and LibreOffice to convert pdf to docx and doc to docx

Submitted by 笑着哭i on 2019-12-23 03:19:09

Question


I have spent a good amount of time trying to determine what exactly is going wrong with the code I am using to convert PDF to DOCX (and DOC to DOCX) with LibreOffice.

I have tried the relevant commands I found both from the Windows Run dialog and from Python, and neither works.

I have LibreOffice 6.0.2 installed on Windows. I have been using variations of the following code to convert some PDFs to DOCX (the specific PDF file is not really relevant):

    import subprocess
    lowriter = 'C://Program Files/LibreOffice/program/swriter.exe'
    subprocess.run('{} --invisible --convert-to docx --outdir "{}" "{}"'
                   .format(lowriter, 'dir', 'filepath.pdf'), shell=True)

Again, I have tried the command both in the Windows Run dialog and through Python using the code above, with no luck. I have also tried it without --outdir, in case I was writing that part incorrectly, but I always get a return code of 1:

    CompletedProcess(args='C://Program Files/LibreOffice/program/swriter.exe 
    --invisible --convert-to docx --outdir "{dir}" 
    {filepath.pdf}"', returncode=1)

dir and filepath.pdf are placeholders I have substituted for the real paths.
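A return code of 1 on its own says very little. A minimal sketch, assuming Python 3.7+ (for capture_output and text), that captures stdout and stderr so the actual error message from cmd.exe or LibreOffice becomes visible. run_and_report is a hypothetical helper name, and the stand-in command only simulates a failing process; swap in the real conversion call.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (returncode, stdout, stderr).

    No shell is involved, so arguments containing spaces need no extra quotes.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Stand-in for a failing conversion: writes to stderr and exits with 1.
    code, out, err = run_and_report(
        [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
    )
    print("return code:", code)
    print("stderr:", err)
```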

I have a similar problem with the doc to docx conversion.


Answer 1:


There are a number of problems here. You should first get the --convert-to call to work from the command line, as @CristiFati commented, and then implement it in Python.

Here is code that works on my system. Don't use // in the path, and the executable path must be quoted because it contains spaces. Also, on my system the folder is LibreOffice 5.

    import subprocess
    # Single forward slashes only; the path contains spaces, so it is quoted below
    lowriter = 'C:/Program Files (x86)/LibreOffice 5/program/swriter.exe'
    subprocess.run(
        '"{}" --convert-to docx --outdir "{}" "{}"'
        .format(lowriter, 'dir', 'filepath.doc'), shell=True)
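An alternative sketch that sidesteps shell quoting altogether: pass the arguments as a list and drop shell=True, so subprocess quotes each argument (including a path with spaces) by itself. build_convert_cmd and convert are hypothetical helper names, and the paths are the same placeholders used above.

```python
import subprocess


def build_convert_cmd(exe, target_format, outdir, source):
    """Build the LibreOffice --convert-to call as an argument list.

    Each list element is passed as one argument, so no manual quoting
    is needed for paths containing spaces.
    """
    return [exe, "--convert-to", target_format, "--outdir", outdir, source]


def convert(exe, target_format, outdir, source):
    """Run the conversion; check=True raises on a non-zero exit code."""
    cmd = build_convert_cmd(exe, target_format, outdir, source)
    return subprocess.run(cmd, check=True)
```

For example, convert('C:/Program Files (x86)/LibreOffice 5/program/swriter.exe', 'docx', 'dir', 'filepath.doc'). With check=True, a non-zero exit code raises subprocess.CalledProcessError instead of quietly returning returncode=1.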

Finally, it looks like converting from PDF to DOCX is not supported. LibreOffice Draw can open a PDF file and save it in ODG format.
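To handle both cases from one script, here is a small sketch that picks the target format per input extension, following the findings above (DOC to DOCX, PDF to ODG), and computes where the converted file should appear. TARGET_FORMAT and expected_output are hypothetical names, and the output naming (<stem>.<format> inside --outdir) is an assumption based on LibreOffice's usual --convert-to behaviour.

```python
from pathlib import Path

# Target format per input extension, per the findings in this answer:
# .doc converts to .docx, while .pdf goes to .odg (Draw) instead of .docx.
TARGET_FORMAT = {".doc": "docx", ".pdf": "odg"}


def expected_output(source, outdir):
    """Return (target_format, expected_output_path) for a source file.

    Assumes the result is named <source stem>.<target format> in outdir.
    """
    src = Path(source)
    fmt = TARGET_FORMAT[src.suffix.lower()]
    return fmt, Path(outdir) / (src.stem + "." + fmt)
```

After running the conversion, checking that the returned path exists is a cheap way to confirm a file was actually produced, in addition to looking at the return code.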

EDIT:

Here is working code to convert from PDF. I upgraded to LO 6, so the version number ("LibreOffice 5") is no longer required in the path.

    import subprocess
    # PDF -> ODG (the Draw format), since PDF -> DOCX did not work
    loffice = 'C:/Program Files/LibreOffice/program/soffice.exe'
    subprocess.run(
        '"{}" --convert-to odg --outdir "{}" "{}"'
        .format(loffice, 'dir', 'filepath.pdf'), shell=True)
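Because the install folder changes between versions (LibreOffice 5 under Program Files (x86) versus LibreOffice 6 under Program Files in this thread), a sketch that tries a list of candidate paths and then falls back to soffice on PATH. find_libreoffice is a hypothetical helper, and the candidate list only contains the two paths mentioned here; it is not an exhaustive list of install locations.

```python
import shutil
from pathlib import Path

# Install paths mentioned in this thread; extend for your own setup.
CANDIDATES = [
    "C:/Program Files/LibreOffice/program/soffice.exe",
    "C:/Program Files (x86)/LibreOffice 5/program/swriter.exe",
]


def find_libreoffice(candidates=CANDIDATES):
    """Return the first existing candidate, else soffice from PATH, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    # shutil.which returns None when soffice is not on PATH
    return shutil.which("soffice")
```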



Source: https://stackoverflow.com/questions/49739245/having-trouble-using-python-and-libreoffice-to-convert-pdf-to-docx-and-doc-to-do
