Copying data from multiple pdf files

后端 未结 6 1116
太阳男子
太阳男子 2021-01-14 19:51

I have pdf files from which I would like to copy all the data to a column in a spreadsheet.

Here is the code I have. All it does is open the pdf, use control-a, then

相关标签:
6条回答
  • 2021-01-14 20:15

    I had similar problem. The best solution is, as it was mentioned before, to use Adobe API. In my case it was impossible because macro was intended for 100+ users without Adobe Pro on their PC.

    Ultimate solution that I have developed recently was to build converted in C# (for free using Visual Studio and iText library), install it on end users computers and run whenever I need via VBA. Here are some links for more guidance:

    1. How to develop pdf converter in C#: link
    2. How to create Excel Addin in C#: link
    3. How to run C# addin from VBA: link

    Overall it's fairly complicated but once done works like a dream.

    Another solution as mentioned before is to use sendkeys in VBA. My experience is that it requires some optimization to handle various opening and copying times (depending on file size). Below is code that worked for me, however it's not even near that fast and stable as C# converter.

    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr) 'Initialize sleep function via Windows API
    Public Sub CopyToSheet(destinationSheet As Worksheet, pathToPdf as String)
    'Copy data from PDF to worksheet
    
        'Initialize timer
        Dim StartTime As Double
        StartTime = Timer
    
        'Clear clipboard
        Dim myData As DataObject
        Set myData = New DataObject
        myData.SetText text:=Empty
        myData.PutInClipboard
        Set myData = Nothing
    
        'Build file paths
        Dim pathToAdobe As String
        pathToAdobe = """C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe"""
        pathToPdf = """" & pathToPdf & """"
    
        'Open PDF and wait untill it is open. If file is already opened it will be just activated
        Dim pdfId As Long
        pdfId = Shell(pathToAdobe & " " & pathToPdf, vbMaximizedFocus)
        Do
            Sleep (500)
            If Round(Timer - StartTime, 2) > 9 Then MsgBox "Failed to open PDF": Exit Sub  'Safety check
        Loop Until Me.IsPdfOpen(pathToPdf)
    
        'Copy and wait until copying is completed
        SendKeys "^a"
        SendKeys "^c"
        Do
            Sleep (500)
            If Round(Timer - StartTime, 2) > 18 Then MsgBox "Failed to copy data to clipboard": Exit Sub  'Safety check
        Loop Until Me.GetClipboardStatus = "ClipboardHasData"
    
        'Paste data into worksheet
        destinationSheet.Activate
        destinationSheet.Range("A1").Select
        destinationSheet.Paste
    
        'Close pdf
        Call Shell("TaskKill /F /PID " & CStr(pdfId), vbHide)
    
        'Clear clipboard
        Set myData = New DataObject
        myData.SetText text:=Empty
        myData.PutInClipboard
        Set myData = Nothing
    
    End Sub
    
    Function IsPdfOpen(pathToPdf) As Boolean
    'Check if PDF is already opened
    
        'Build window name (window name is name of the application on Windows task bar)
        Dim windowName As String
        windowName = pathToPdf
        windowName = Mid(windowName, InStrRev(windowName, "\") + 1, Len(windowName) - InStrRev(windowName, "\") + 1)
        windowName = windowName + " - Adobe Acrobat Reader DC"
    
        'Try to activate application to check if is opened
        On Error Resume Next
        AppActivate windowName, True
        Select Case Err.Number
            Case 5: IsPdfOpen = False
            Case 0: IsPdfOpen = True
            Case Else: Debug.Assert False
        End Select
        On Error GoTo 0
    
    End Function
    
    Function GetClipboardStatus() As String
    'Check if copying data to clipboard is completed
    
        Dim tempString As String
        Dim myData As DataObject
    
        'Try to put data from clipboard to string to check if operations on clipboard are completed
        On Error Resume Next
        Set myData = New DataObject
        myData.GetFromClipboard
        tempString = myData.GetText(1)
        If Err.Number = 0 Then
            If tempString = "" Then
                GetClipboardStatus = "ClipboardEmpty"
            Else
                GetClipboardStatus = "ClipboardHasData"
            End If
        Else
            GetClipboardStatus = "ClipboardBusy"
        End If
        On Error GoTo 0
    
        Set myData = Nothing
    
    End Function
    
    0 讨论(0)
  • 2021-01-14 20:19

    Jeanno's right, if you have Acrobat then using its API library to work with the file directly is much better than the workarounds. I use this every day to convert pdf files into database entries.

    Your code has a few problems, but I suspect the biggest issue is the use of SendKeys "^v" to paste into Excel. You're better off selecting the cell you want then using Selection.Paste. Or even better, transfer the contents of the clipboard to a variable, then parse it out as needed on the backend before writing to your spreadsheet--but that adds a bunch of complexity and doesn't help you a lot in this case.

    To use the code below, be sure to select your 'Acrobat x.x Type Library' under Tools>References.

    Sub StartAdobe1()
        Dim fName       As Variant
        Dim wbTransfer  As Excel.Workbook
        Dim wsNew       As Excel.Worksheet
        Dim dOpenCol    As Double
        Dim oPDFApp     As AcroApp
        Dim oAVDoc      As AcroAVDoc
        Dim oPDDoc      As AcroPDDoc
    
        'Define your spreadsheet
        Set wbTransfer = Workbooks("transfer (Autosaved).xlsm")
        Set wsNew = wbTransfer.Sheets("new")
        'Find first open column
        dOpenCol = ws.Cells(1, columns.count).End(xlToleft).Column + 1
    
        'Instantiate Acrobat Objects
        Set oPDFApp = CreateObject("AcroExch.App")
        Set oAVDoc = CreateObject("AcroExch.AVDoc")
        Set oPDDoc = CreateObject("AcroExch.PDDoc")
    
    For Each fName In Range("path")
    
        'Open the PDF file. The AcroAVDoc.Open function returns a true/false 
        'to tell you if it worked
        If oAVDoc.Open(fName.Text, "") = True Then
            Set oPDDoc = oAVDoc.GetPDDoc
        Else
            Debug.Assert False
        End If
    
        'Copy all using Acrobat menu
        oPDFApp.MenuItemExecute ("SelectAll")
        oPDFApp.MenuItemExecute ("Copy")
    
        'Paste into open column
        wbTransfer.Activate
        wsNew.Cells(1, dOpenCol).Select
        ActiveSheet.Paste
    
        'Select next open column
        dOpenCol = dOpenCol + 1
    
        oAVDoc.Close (1)    '(1)=Do not save changes
        oPDDoc.Close
    
    Next
    
        'Clean up
        Set wbTransfer = Nothing
        Set wsNew = Nothing
        Set oPDFApp = Nothing
        Set oAVDoc = Nothing
        Set oPDDoc = Nothing
    
    
    End Sub
    

    Note: 1-There is also a menu item oPDFApp.MenuItemExecute ("CopyFileToClipboard") that should do the select all and copy in one step, but I have had problems with it so I stick to the two-step method above.

    2-A pdf file consists of two objects, the oAVDoc and the oPDDoc. Different aspects of the file are controlled by each. In this case you might only need the oAVDoc. Try commenting out the lines dealing with oPDDoc and see if it works without them.

    0 讨论(0)
  • 2021-01-14 20:20

    try this code this might work:

     Sub Shell_Copy_Paste()
    
       Dim o As Variant
       Dim wkSheet As Worksheet
    
       Set wkSheet = ActiveSheet
    
       o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\red.pdf", vbNormalFocus)
    
       Application.Wait (Now + TimeSerial(0, 0, 2)) 'Wait for Acrobat to load
    
       SendKeys "^a"   'Select All
       SendKeys "^c"   'Copy
       SendKeys "%{F4}"    'Close shell application
    
       wkSheet.Range("B5").Select
       SendKeys "^v"   'Paste
    
    End Sub
    
    0 讨论(0)
  • 2021-01-14 20:22

    I can't quite get your code to work, but my guess is that it's copying all of the data, but overwriting it each time through the loop. To fix this try:

    ActiveSheet.Cells(1, ActiveSheet.Cells(1, Columns.Count).End(xlToLeft).Column + 1).Select
    

    instead of the two lines that begin activesheet.range("A1").Select and Selection.End....

    0 讨论(0)
  • 2021-01-14 20:27

    This is the more modified version of my above code it will not save any document it will save data in clipboard and will do the execution fast..

    Private Sub CommandButton3_Click()  '(load pdf)
    
    
       Dim o As Variant
    Set appWord = CreateObject("Word.Application")
     o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\Book1.pdf2", vbNormalFocus)
     Application.Wait (Now + TimeSerial(0, 0, 2))
       SendKeys ("^a")
    SendKeys ("^c")
     SendKeys "%{F4}"
    Application.Wait Now + TimeValue("00:00:01")
     Set appWord = CreateObject("Word.Application")
     appWord.Visible = False
     appWord.Documents.Add.Content.Paste
    With appWord
    
           .Selection.WholeStory
           .Selection.Copy
           .ActiveDocument.Close SaveChanges:=wdDoNotSaveChanges
           .Quit
    End With
    
    MsgBox " pdf is loaded "
    MsgBox " Paste to EXCEL "
    
    
       Set wkSheet = ActiveSheet
        wkSheet.Range("A1").Select
        wkSheet.Paste
    
    End Sub
    
    0 讨论(0)
  • 2021-01-14 20:34

    BELOW CODE WILL COPY DATA FROM PDF & will PASTE IT IN WORD THEN COPY DATA FROM WORD AND THEN PASTE IT TO THE EXCEL .

    NOW Why I am copying data from pdf to word & then copying from word and pasting it to the excel because i want the data from the pdf in exact format to my excel sheet if i copy directly from pdf to excel it will paste the whole data from pdf into a single cell means even if i am having two columns or multiple rows it will paste all of my data into one column and that too in single cell but if i copy from word to excel it will retain its original format and two columns will get pasted as two columns only in excel.

    Private Sub CommandButton3_Click()  '(load pdf)
    
    
       Dim o As Variant
    Set appWord = CreateObject("Word.Application")
     o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\Book1.pdf", vbNormalFocus)   'loading adobe reader & pdf file from their location
     Application.Wait (Now + TimeSerial(0, 0, 2))
       SendKeys ("^a")
    SendKeys ("^c")
     SendKeys "%{F4}"
    Application.Wait Now + TimeValue("00:00:01")
     Set appWord = CreateObject("Word.Application")
     appWord.Visible = True
     appWord.Documents.Add.Content.Paste
    With appWord
    
           .ActiveDocument.SaveAs Filename:=ThisWorkbook.Path & "\pdf" & ".docx", FileFormat:=wdocument   'saving word file in docx format
            .ActiveWindow.Close
            .Quit
        End With
    
    MsgBox " pdf is loaded "
    MsgBox " Paste to EXCEL "
    
        Set appWord = CreateObject("Word.Application")
         appWord.Visible = True
    
      appWord.Documents.Open "C:\Users\saurabh.ad.sharma\Desktop\pdf.docx" 'opening word document
          appWord.Selection.WholeStory
          appWord.Selection.Copy
       Set wkSheet = ActiveSheet
        wkSheet.Range("A1").Select
        wkSheet.Paste 'pasting to the excel file
    
    End Sub
    
    0 讨论(0)
提交回复
热议问题