Is there any way to convert a PDF file to DOCX using Python?

忘了有多久 2021-01-17 05:28

I am wondering if there is a way in Python (a tool, a function, etc.) to convert my PDF file to .doc or .docx.

I am aware of online converters, but I need this in Python code.

2 Answers
  • 2021-01-17 06:14

    If you happen to have MS Word installed, there is a really simple way to do this using COM. Here is a script I wrote that converts PDF files to DOCX by calling the Word application.

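    import glob
    import os
    
    import win32com.client  # from the pywin32 package (Windows only)
    
    pdfs_path = ""  # folder where the .pdf files are stored
    reqs_path = ""  # folder where the .docx files are written
    
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False  # keep the Word window hidden
    
    try:
        for pdf in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
            in_file = os.path.abspath(pdf)
            print(in_file)
            # Word converts the PDF into an editable document when it opens it
            doc = word.Documents.Open(in_file)
            base = os.path.splitext(os.path.basename(in_file))[0]
            out_file = os.path.abspath(os.path.join(reqs_path, base + ".docx"))
            print("outfile\n", out_file)
            doc.SaveAs2(out_file, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
            print("success...")
            doc.Close()
    finally:
        word.Quit()  # always shut down the background Word process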
    
  • 2021-01-17 06:18

    If your PDF has a lot of pages, the code below will work. It reads the text of every page into a single string:

    import PyPDF2
    
    path = "C:\\ .... "
    text = ""
    with open(path, "rb") as pdf_file:
        # PyPDF2 1.x/2.x names; PyPDF2 3.x renamed them to PdfReader, reader.pages and page.extract_text()
        read_pdf = PyPDF2.PdfFileReader(pdf_file)
        c = read_pdf.numPages
        for i in range(c):
            page = read_pdf.getPage(i)
            text += page.extractText()
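
The snippet above stops once the text is in memory, so a .docx file still has to be written. A minimal sketch of that last step follows, assuming the third-party python-docx package is installed. The helper names `clean_text` and `pages_to_docx` are hypothetical and not part of the answer. Only plain text is carried over; layout, images and tables from the PDF are lost.

```python
import re

# Control characters that XML 1.0 does not allow. Extracted PDF text often
# contains them, and python-docx typically refuses to write such strings.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_text(text):
    """Drop control characters that cannot be stored in a .docx file."""
    return _XML_ILLEGAL.sub("", text)


def pages_to_docx(pages, out_path):
    """Write one paragraph per non-empty line, with a page break between pages."""
    # Imported here so clean_text() stays usable without python-docx installed.
    from docx import Document

    document = Document()
    for index, page_text in enumerate(pages):
        if index > 0:
            document.add_page_break()
        for line in clean_text(page_text).splitlines():
            if line.strip():
                document.add_paragraph(line)
    document.save(out_path)
```

With the `text` variable built above, `pages_to_docx([text], "output.docx")` would write everything into one document; the output file name here is only a placeholder. Passing a list with one string per page instead keeps the original page boundaries.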
    