Convert PDF to DOC (Python/Bash)

I've seen some pages that let a user upload a PDF and return a DOC file, like PdfToWord.

Is there any way to convert a P

4 Answers

一整个雨季 2020-12-01 06:10

You can use the GroupDocs.Conversion Cloud SDK for Python without installing any third-party tool or software.

Sample Python code:

    # Import module
    import groupdocs_conversion_cloud

    # Get your app_sid and app_key at https://dashboard.groupdocs.cloud (free registration is required).
    app_sid = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
    app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

    # Create instances of the API
    convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(app_sid, app_key)
    file_api = groupdocs_conversion_cloud.FileApi.from_keys(app_sid, app_key)

    try:
        # Upload source file to storage
        filename = 'Sample.pdf'
        remote_name = 'Sample.pdf'
        output_name = 'sample.docx'
        strformat = 'docx'

        request_upload = groupdocs_conversion_cloud.UploadFileRequest(remote_name, filename)
        response_upload = file_api.upload_file(request_upload)

        # Convert PDF to Word document
        settings = groupdocs_conversion_cloud.ConvertSettings()
        settings.file_path = remote_name
        settings.format = strformat
        settings.output_path = output_name

        # Options applied while loading the PDF
        loadOptions = groupdocs_conversion_cloud.PdfLoadOptions()
        loadOptions.hide_pdf_annotations = True
        loadOptions.remove_embedded_files = False
        loadOptions.flatten_all_fields = True
        settings.load_options = loadOptions

        # Convert only the first page
        convertOptions = groupdocs_conversion_cloud.DocxConvertOptions()
        convertOptions.from_page = 1
        convertOptions.pages_count = 1
        settings.convert_options = convertOptions

        request = groupdocs_conversion_cloud.ConvertDocumentRequest(settings)
        response = convert_api.convert_document(request)

        print("Document converted successfully: " + str(response))
    except groupdocs_conversion_cloud.ApiException as e:
        print("Exception when calling convert_document: {0}".format(e.message))

I'm a developer evangelist at Aspose.

不要未来只要你来 2020-12-01 06:11

If you want to convert a PDF to an MS Word file type such as docx, I came across this.
Ahsin Shabbir wrote:

    import glob
    import os
    import win32com.client

    # Start Word in the background (requires Windows with MS Word installed)
    word = win32com.client.Dispatch("Word.Application")
    word.visible = 0

    pdfs_path = ""  # folder where the .pdf files are stored
    reqs_path = ""  # output folder; not defined in the original snippet

    for doc in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
        print(doc)
        filename = os.path.basename(doc)
        in_file = os.path.abspath(doc)
        print(in_file)
        wb = word.Documents.Open(in_file)
        out_file = os.path.abspath(os.path.join(reqs_path, filename[0:-4] + ".docx"))
        print("outfile\n", out_file)
        wb.SaveAs2(out_file, FileFormat=16)  # file format for docx
        print("success...")
        wb.Close()

    # Quit Word once, after all files are converted
    word.Quit()

This worked like a charm for me; it converted a 500-page PDF with formatting and images.

你的背包 2020-12-01 06:18

This is difficult because PDFs are presentation oriented and Word documents are content oriented. I have tested both and can recommend the following projects.

- PyPDF2
- PDFMiner

However, you are most definitely going to lose presentational aspects in the conversion.

囚心锁ツ 2020-12-01 06:35

If you have LibreOffice installed:

    lowriter --invisible --convert-to doc '/your/file.pdf'

If you want to use Python for this:

    import os
    import subprocess

    # Walk the folder tree and convert every .pdf file found
    for top, dirs, files in os.walk('/my/pdf/folder'):
        for filename in files:
            if filename.endswith('.pdf'):
                abspath = os.path.join(top, filename)
                subprocess.call('lowriter --invisible --convert-to doc "{}"'
                                .format(abspath), shell=True)
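
A possible variant of the LibreOffice loop above, offered only as a minimal sketch: it passes the arguments to subprocess as a list rather than a shell string, so paths containing spaces or quotes need no escaping, and it collects the files for which the command exited with an error. The helper names find_pdfs, build_command and convert_all are made up for illustration, the --outdir flag is an addition that is not in the original answer, and the sketch assumes lowriter is on the PATH.

```python
import subprocess
from pathlib import Path


def find_pdfs(root):
    """Yield every .pdf file under root, recursively (case-insensitive)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path


def build_command(pdf_path, out_dir, fmt="doc"):
    """Build the lowriter argument list for one file (hypothetical helper)."""
    return [
        "lowriter", "--invisible",
        "--convert-to", fmt,
        "--outdir", str(out_dir),  # assumption: not part of the original answer
        str(pdf_path),
    ]


def convert_all(root, out_dir):
    """Convert every PDF under root; return the paths whose command failed."""
    failed = []
    for pdf in find_pdfs(root):
        # No shell=True: each list item reaches lowriter as a single argument.
        result = subprocess.run(build_command(pdf, out_dir))
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

Usage would be something like convert_all('/my/pdf/folder', '/my/doc/folder'), which returns the paths for which lowriter exited with a non-zero status. Keeping the command construction in its own function makes it easy to inspect or adjust the flags without running LibreOffice.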