Loop over PDF files and convert them to .doc with Word

生来不讨喜 2021-01-17 17:02

I am trying to use VBA coding - which I am pretty new to - to obtain a series of .doc documents from PDFs (which are not images), that is, I am trying to loop over various P

2 answers
  •  小鲜肉 (original poster)
    2021-01-17 17:23

    Any language that can read PDF files and write Word docs (which are XML) can do this, but the conversion you want (the one Word performs when it opens a PDF) requires automating Word itself through its API. VBA is your easy option.

    The snippets you've posted (and my samples below) use early binding and enumerated constants, which means we need a reference to the Word object library. That reference is already set up for any code you write in a Word document, so create a new Word document and add the code in a standard module. (See this Excel tutorial if you need more details; the steps for our process are the same.)
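
    For comparison, here is a minimal sketch of the same kind of call written with late binding, as you might need if the code lived in another host (Excel, say) with no Word reference set. The procedure name and the file names are placeholders of mine, not something from your question. The numeric constants are the values behind wdAlertsNone and wdFormatDocumentDefault, which are not available by name without the reference. The PDF conversion prompt may still appear unless it is suppressed through the registry as described further down.

    ```vba
    Sub LateBoundConvertSketch()
        'Hypothetical example: procedure name and file names are placeholders
        Const WD_ALERTS_NONE As Long = 0               'wdAlertsNone
        Const WD_FORMAT_DOCUMENT_DEFAULT As Long = 16  'wdFormatDocumentDefault

        Dim wordApp As Object   'Would be Word.Application with early binding
        Dim doc As Object       'Would be Word.Document with early binding

        Set wordApp = CreateObject("Word.Application")
        wordApp.DisplayAlerts = WD_ALERTS_NONE

        'Second argument is ConfirmConversions
        Set doc = wordApp.Documents.Open( _
            "C:\users\username\work_dir_example\sample.pdf", False)

        'Second argument is FileFormat
        doc.SaveAs2 "C:\users\username\work_dir_example\sample.docx", _
                    WD_FORMAT_DOCUMENT_DEFAULT
        doc.Close False

        wordApp.Quit
    End Sub
    ```

    With early binding inside Word you get IntelliSense and the named constants for free, which is why the rest of this answer sticks with it.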

    You can run your macro from the VB Editor (using the Run button) or from the normal document window (click the Macros button on the View tab in Word 2010-2016). Save your document as a DOCM file if you want to reuse the macro without setting up the code again.

    Now for the code!

    As stated in comments, your second snippet is valid if you just ensure that your folder paths end with a backslash "\" character. It's still not great code after you fix that, but that'll get you up and running.

    I'll assume you want to go the extra mile and have a well-written version of this you could repurpose or expand upon later. For simplicity, we'll use two procedures: the main conversion and a procedure to suppress the PDF conversion warning dialog (controlled by the registry).

    Main procedure:

    Sub ConvertPDFsToWord2()
        Dim path As String
        'Manually edit path in the next line before running
        path = "C:\users\username\work_dir_example\"
    
        Dim file As String
        Dim doc As Word.Document
        Dim regValPDF As Integer
        Dim originalAlertLevel As WdAlertLevel
    
        'Generate string for getting all PDFs with Dir command
        'Check for terminal \
        If Right(path, 1) <> "\" Then path = path & "\"
        'Append file type with wildcard
        file = path & "*.pdf"
    
        'Get path for first PDF (blank string if no PDFs exist)
        file = Dir(file)
    
        originalAlertLevel = Application.DisplayAlerts
        Application.DisplayAlerts = wdAlertsNone
    
        'Default to 1 so CleanUp leaves the registry alone when no PDFs were found
        regValPDF = 1
        If file <> "" Then regValPDF = TogglePDFWarning(1)
    
        Do While file <> ""
            'Open method will automatically convert PDF for editing
            Set doc = Documents.Open(path & file, False)
    
            'Save and close document
            doc.SaveAs2 path & Replace(file, ".pdf", ".docx", Compare:=vbTextCompare), _
                        fileformat:=wdFormatDocumentDefault
            doc.Close False
    
            'Get path for next PDF (blank string if no PDFs remain)
            file = Dir
        Loop
    
    CleanUp:
        On Error Resume Next 'Ignore errors during cleanup
        doc.Close False
        'Restore registry value, if necessary
        If regValPDF <> 1 Then TogglePDFWarning regValPDF
        Application.DisplayAlerts = originalAlertLevel
    
    End Sub
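
    If you would rather not edit the path by hand each time, one option is to ask for the folder with Word's built-in folder picker. This is only a sketch: PickPdfFolder is a name I made up, and it assumes the Microsoft Office object library reference that Word projects normally have by default (needed for msoFileDialogFolderPicker).

    ```vba
    Function PickPdfFolder() As String
        'Hypothetical helper: returns the chosen folder with a trailing "\",
        'or an empty string if the dialog is cancelled
        With Application.FileDialog(msoFileDialogFolderPicker)
            .Title = "Select the folder containing the PDFs"
            If .Show = -1 Then 'Show returns -1 when the user confirms a selection
                PickPdfFolder = .SelectedItems(1)
                If Right(PickPdfFolder, 1) <> "\" Then
                    PickPdfFolder = PickPdfFolder & "\"
                End If
            End If
        End With
    End Function
    ```

    To use it, replace the hard-coded assignment in the main procedure with path = PickPdfFolder() and exit the Sub if the result is an empty string.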
    

    Registry setting function:

    Private Function TogglePDFWarning(newVal As Integer) As Integer
    'This function reads and writes the registry value that controls
    'the dialog displayed when Word opens (and converts) a PDF file
        Dim wShell As Object
        Dim regKey As String
        Dim regVal As Variant
    
        'setup shell object and string for key
        Set wShell = CreateObject("WScript.Shell")
        regKey = "HKCU\SOFTWARE\Microsoft\Office\" & _
                 Application.Version & "\Word\Options\"
    
        'Get existing registry value, if any
        On Error Resume Next 'Ignore error if reg value does not exist
        regVal = wShell.RegRead(regKey & "DisableConvertPdfWarning")
        On Error GoTo 0      'Break on errors after this point
    
        wShell.RegWrite regKey & "DisableConvertPdfWarning", newVal, "REG_DWORD"
    
        'Return original setting / registry value (0 if omitted)
        If Err.Number <> 0 Or regVal = 0 Then
            TogglePDFWarning = 0
        Else
            TogglePDFWarning = 1
        End If
    
    End Function
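
    To confirm the setting was restored after a run, you can read the value back without writing anything. A minimal sketch, using the same key as the function above; ShowPDFWarningSetting is a hypothetical name, and the output goes to the Immediate window (Ctrl+G in the VB Editor).

    ```vba
    Sub ShowPDFWarningSetting()
        'Hypothetical helper: reports the registry value without changing it
        Dim wShell As Object
        Dim regKey As String
        Dim regVal As Variant
        Dim readFailed As Boolean

        Set wShell = CreateObject("WScript.Shell")
        regKey = "HKCU\SOFTWARE\Microsoft\Office\" & _
                 Application.Version & "\Word\Options\"

        On Error Resume Next 'RegRead raises an error if the value is absent
        regVal = wShell.RegRead(regKey & "DisableConvertPdfWarning")
        readFailed = (Err.Number <> 0) 'Capture before On Error resets Err
        On Error GoTo 0

        If readFailed Then
            Debug.Print "DisableConvertPdfWarning is not set"
        Else
            Debug.Print "DisableConvertPdfWarning = " & regVal
        End If
    End Sub
    ```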
    
