Convert PDF to DOC (Python/Bash)

傲寒 2020-12-01 05:46

I've seen some pages that let users upload a PDF and get back a DOC file, like PdfToWord.

Is there any way to convert a PDF to a DOC file using Python or Bash?

4 answers
  • 2020-12-01 06:10

    You can use the GroupDocs.Conversion Cloud SDK for Python without installing any third-party tool or software.

    Sample Python code:

    # Import module
    import groupdocs_conversion_cloud
    
    # Get your app_sid and app_key at https://dashboard.groupdocs.cloud (free registration is required).
    app_sid = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
    app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    
    # Create instances of the conversion and file APIs
    convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(app_sid, app_key)
    file_api = groupdocs_conversion_cloud.FileApi.from_keys(app_sid, app_key)
    
    try:
        # Upload the source file to cloud storage
        filename = 'Sample.pdf'      # local file
        remote_name = 'Sample.pdf'   # path in cloud storage
        output_name = 'sample.docx'
        strformat = 'docx'
    
        request_upload = groupdocs_conversion_cloud.UploadFileRequest(remote_name, filename)
        response_upload = file_api.upload_file(request_upload)
    
        # Convert the PDF to a Word document
        settings = groupdocs_conversion_cloud.ConvertSettings()
        settings.file_path = remote_name
        settings.format = strformat
        settings.output_path = output_name
    
        # PDF-specific load options
        load_options = groupdocs_conversion_cloud.PdfLoadOptions()
        load_options.hide_pdf_annotations = True
        load_options.remove_embedded_files = False
        load_options.flatten_all_fields = True
        settings.load_options = load_options
    
        # from_page / pages_count limit the conversion to the first page only
        convert_options = groupdocs_conversion_cloud.DocxConvertOptions()
        convert_options.from_page = 1
        convert_options.pages_count = 1
        settings.convert_options = convert_options
    
        request = groupdocs_conversion_cloud.ConvertDocumentRequest(settings)
        response = convert_api.convert_document(request)
    
        print("Document converted successfully: " + str(response))
    except groupdocs_conversion_cloud.ApiException as e:
        print("Exception when calling convert_document: {0}".format(e.message))
    

    I'm a developer evangelist at Aspose.

  • If you want to convert a PDF to an MS Word format such as .docx, here is an approach I came across.

    Ahsin Shabbir wrote:

    import glob
    import os
    import win32com.client  # pywin32; needs Microsoft Word installed on Windows
    
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    
    pdfs_path = ""        # folder where the .pdf files are stored
    out_path = pdfs_path  # folder for the .docx output (here: the same folder)
    
    for pdf in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
        in_file = os.path.abspath(pdf)
        base = os.path.splitext(os.path.basename(pdf))[0]
        out_file = os.path.abspath(os.path.join(out_path, base + ".docx"))
        print(in_file, "->", out_file)
        doc = word.Documents.Open(in_file)    # Word converts the PDF when opening it
        doc.SaveAs2(out_file, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
        doc.Close()
        print("success...")
    
    word.Quit()
    

    This worked like a charm for me: it converted a 500-page PDF, keeping the formatting and images.
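    Most of the fiddly work in that loop is mapping each input PDF to an output path. A minimal, platform-independent sketch of just that step, using only the standard library, lets you check the mapping before launching Word. The function name `plan_conversions` and the skip-existing behaviour are illustrative additions, not part of the answer above.

```python
from pathlib import Path


def plan_conversions(pdf_dir, out_dir=None, skip_existing=True):
    """Return (pdf_path, docx_path) pairs for every .pdf in pdf_dir.

    Hypothetical helper: it only computes paths and never calls Word.
    """
    pdf_dir = Path(pdf_dir)
    out_dir = Path(out_dir) if out_dir is not None else pdf_dir
    pairs = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        # Same base name with a .docx extension, as an absolute path
        # (the Word loop above also passes absolute paths)
        docx = (out_dir / (pdf.stem + ".docx")).resolve()
        if skip_existing and docx.exists():
            continue  # already converted on an earlier run
        pairs.append((str(pdf.resolve()), str(docx)))
    return pairs
```

    Each pair can then be passed to `word.Documents.Open(in_file)` and `SaveAs2(out_file, FileFormat=16)` exactly as in the loop above, so a rerun after a crash only converts the PDFs that are still missing a .docx.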

  • 2020-12-01 06:18

    This is difficult because PDFs are presentation-oriented and Word documents are content-oriented. I have tested both of the following projects and can recommend them:

    1. PyPDF2
    2. PDFMiner

    However, you are most definitely going to lose presentational aspects in the conversion.
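    To make the presentation-vs-content point concrete: a PDF page essentially stores positioned runs of text, and a converter has to guess where lines and paragraphs begin and end. The sketch below is a deliberately simplified illustration of that guessing step. The `(x, y, text)` fragment tuples and the gap thresholds are assumptions for illustration, not what PyPDF2 or PDFMiner actually return (PDFMiner's layout analysis is considerably more elaborate).

```python
from typing import List, Tuple

Fragment = Tuple[float, float, str]  # (x, y, text); y grows downward in this sketch


def fragments_to_paragraphs(frags: List[Fragment],
                            line_tol: float = 2.0,
                            para_gap: float = 18.0) -> List[str]:
    """Group positioned text fragments into lines, then lines into paragraphs.

    Hypothetical heuristic: fragments whose y differs by <= line_tol share a
    line; a vertical gap larger than para_gap starts a new paragraph.
    """
    # Sort top-to-bottom, then left-to-right
    frags = sorted(frags, key=lambda f: (f[1], f[0]))
    lines: List[Tuple[float, List[Fragment]]] = []
    for frag in frags:
        if lines and abs(frag[1] - lines[-1][0]) <= line_tol:
            lines[-1][1].append(frag)
        else:
            lines.append((frag[1], [frag]))

    paragraphs: List[str] = []
    current: List[str] = []
    prev_y = None
    for y, parts in lines:
        # Within a line, join fragments in x order
        text = " ".join(t for _, _, t in sorted(parts))
        if prev_y is not None and y - prev_y > para_gap:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        prev_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

    With the fragments `[(72, 100, "Hello"), (120, 100.5, "world,"), (72, 114, "second line."), (72, 150, "New paragraph.")]` this returns `["Hello world, second line.", "New paragraph."]`. A two-column layout, a footnote or a table would already defeat such a heuristic, which is why formatting tends to get lost.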

  • 2020-12-01 06:35

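    If you have LibreOffice installed (the `--infilter` option makes it open the PDF in Writer instead of Draw, which it uses for PDFs by default):

    lowriter --invisible --infilter="writer_pdf_import" --convert-to doc '/your/file.pdf'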
    

    If you want to use Python for this:

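    import os
    import subprocess
    
    for top, dirs, files in os.walk('/my/pdf/folder'):
        for filename in files:
            if filename.lower().endswith('.pdf'):
                abspath = os.path.join(top, filename)
                # Argument list instead of shell=True, so file names with spaces
                # or quotes are passed intact; --outdir puts the .doc next to the PDF
                subprocess.run(['lowriter', '--invisible',
                                '--infilter=writer_pdf_import',
                                '--convert-to', 'doc', '--outdir', top, abspath])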
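    LibreOffice names the output after the input file (`file.pdf` becomes `file.doc`) and writes it to the current directory unless `--outdir` is given, so a batch script can confirm each conversion by checking for that file. A minimal sketch, assuming `lowriter` is on the `PATH`; the function name `convert_pdf` and the injectable `run` parameter (there so the logic can be exercised without LibreOffice) are illustrative choices, not part of the answer above.

```python
import subprocess
from pathlib import Path


def convert_pdf(pdf_path, out_dir=None, fmt="doc", run=subprocess.run):
    """Convert one PDF with LibreOffice; return the output path, or None on failure.

    Hypothetical helper: `run` defaults to subprocess.run and can be swapped for a stub.
    """
    pdf = Path(pdf_path)
    out = Path(out_dir) if out_dir is not None else pdf.parent
    cmd = [
        "lowriter", "--invisible",
        "--infilter=writer_pdf_import",  # open the PDF in Writer rather than Draw
        "--convert-to", fmt,
        "--outdir", str(out),
        str(pdf),
    ]
    try:
        # No shell: the argument list reaches LibreOffice unchanged
        run(cmd, check=False)
    except FileNotFoundError:
        return None  # lowriter is not installed or not on the PATH
    expected = out / (pdf.stem + "." + fmt)
    return expected if expected.exists() else None
```

    In the folder walk above, calling `convert_pdf(abspath, top)` for each PDF and collecting the ones that return `None` gives a list of failed conversions. Note that a stale output file from an earlier run would still count as success.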
    