Question
I need to retrieve the number of pages in PDF files (with security), using Excel VBA.
The following code works when there is no security enabled in the PDF file:
Sub PDFandNumPages()
    Dim Folder As Object
    Dim file As Object
    Dim fso As Object
    Dim iExtLen As Integer, iRow As Integer
    Dim sFolder As String, sExt As String
    Dim sPDFName As String

    sExt = "pdf"
    iExtLen = Len(sExt)
    iRow = 1
    ' Must have a '\' at the end of path
    sFolder = "C:\test\"

    Set fso = CreateObject("Scripting.FileSystemObject")
    If sFolder <> "" Then
        Set Folder = fso.GetFolder(sFolder)
        For Each file In Folder.Files
            ' Compare the extension case-insensitively so ".PDF" also matches
            If LCase(Right(file.Name, iExtLen)) = sExt Then
                Cells(iRow, 1).Value = file.Name
                ' pageCount is a separate helper function whose code is not shown in this post
                Cells(iRow, 2).Value = pageCount(sFolder & file.Name)
                iRow = iRow + 1
            End If
        Next file
    End If
End Sub
However, if any kind of security is enabled, the code cannot extract the page count and returns zero pages.
Note: No password is required to open these PDF files; they only have some security features enabled to prevent modification.
Sample PDFs with security enabled are available at the following Google Drive link: Google Drive PDF with security
My requirement is to tweak the code so that the page count is returned whether or not the PDF has security enabled.
For Python, I found a similar question and solution at this page, but it relies on Python libraries. If possible, I'd like a VBA expert to suggest how I can replicate this in VBA.
Answer 1:
If the PDF document doesn't have a Permissions Password set up (or if you know the password), you can modify the document restrictions, such as page extraction.
- Open the document manually with a proprietary or 3rd-party editor
- Go File → Properties
- In the Security tab, choose Show Details…
- To make changes to the PDF’s restrictions, go View → Tools → Protection
- In the Tools Pane, click Encrypt and in the Protection section, choose Remove Security.
- If there is a Permissions Password, you will need to enter it here.
The permissions will now change to "Allowed".
(Source)
The "Hacky" Method:
If the above method doesn't work for you, there's a workaround that may do the trick. You won't be unlocking the file itself per se, but you can generate an unlocked equivalent that can be edited and manipulated to your heart's content.
- Open the document that you wish to unlock in Adobe Acrobat Reader
- Click File and then Print.
- In the Printers list, select "Microsoft XPS Document Writer" and then click Print.
If you try to use Adobe's PDF printer driver, it will detect that you are attempting to export a secured PDF to a fresh file and it will refuse to continue. Even third-party PDF print drivers tend to choke on such files.
However, by using the XPS Document Writer, you effectively circumvent that check entirely, leaving yourself with an XPS output.
- Open the new XPS file you have just created and simply repeat the printing process, only this time printing to PDF format.
If you do not have a PDF printer to select in your list of printers, there are various freeware options available online (such as CutePDF Writer) which will allow you to set up a virtual printer that generates PDFs. (Source)
Edit: (Alternate Answer)
Returning the Page Count of a PDF File
To find the total number of pages in a PDF file in VBA, you could open it as a binary file, search it for the "/Count" tag, and then read the number that follows.
Below is an example that works on your sample files (6 & 8 pages), but may need "tweaking" depending on the structure of the individual PDF files on hand.
(In some cases, you may be better off counting the individual occurrences of the "/Page" or "/Pages" tags, although that number may need to be reduced by 1 or 2.)
Note that this is not a very efficient way of parsing binaries, so large files could take a while to parse.
Sub Get_PDF_Page_Count()
    'scrape PDF file as binary, looking for "/Count" tag, then return the number following it
    Const fName = "C:\your_path_here\1121-151134311859-64.pdf"
    Dim bytTemp As Byte, fileStr As String, c As Long, p1 As Long, p2 As Long
    Dim fNum As Integer

    'open PDF as binary file, using the next free file number
    Debug.Print "Reading File '" & fName & "'";
    fNum = FreeFile
    Open fName For Binary Access Read As #fNum

    'read file into string
    Do While Not EOF(fNum)
        'parse PDF file, one byte at a time
        Get #fNum, , bytTemp
        c = c + 1
        fileStr = fileStr & Chr(bytTemp)
        'check every 20000 characters, if the tag was found yet
        If c Mod 20000 = 0 Then
            If InStr(fileStr, "/Count") > 0 Then Exit Do
            ' not found yet, keep going
            Debug.Print ".";
            DoEvents
        End If
    Loop

    'close file
    Close #fNum
    Debug.Print

    'check if tag was found
    p1 = InStr(fileStr, "/Count")
    If p1 = 0 Then
        Debug.Print "'/Count' tag not found in file: " & fName
        Exit Sub
    End If

    'return page count: the number between the next space and the next line feed
    p1 = InStr(p1, fileStr, " ") + 1
    p2 = InStr(p1, fileStr, vbLf)
    If p2 = 0 Then p2 = Len(fileStr) + 1 'no line feed after the number
    Beep
    Debug.Print Val(Mid(fileStr, p1, p2 - p1)) & " pages in file: " & fName
End Sub
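The tag-counting alternative mentioned above can be sketched as follows. This is only a rough sketch, not a tested solution for every PDF: the function names ReadFileAsText and CountPageObjects are made up for illustration, and the pattern assumes each page object contains a "/Type /Page" entry stored as plain text. Files that keep their objects in compressed object streams (possible from PDF 1.5 onward) will likely return 0. It also reads the whole file with a single Get call instead of one byte at a time, which should be much faster on large files.

```vba
Option Explicit

' Hypothetical helper: read a whole file into a String with one Get call.
' Bytes are converted to characters using the system ANSI code page;
' plain ASCII tags such as "/Count" or "/Type /Page" come through unchanged.
Function ReadFileAsText(ByVal sPath As String) As String
    Dim fNum As Integer
    Dim sBuffer As String

    fNum = FreeFile
    Open sPath For Binary Access Read As #fNum
    If LOF(fNum) > 0 Then
        ' Pre-size the buffer; Get fills a string up to its current length
        sBuffer = Space$(LOF(fNum))
        Get #fNum, , sBuffer
    End If
    Close #fNum

    ReadFileAsText = sBuffer
End Function

' Hypothetical helper: count "/Type /Page" entries, skipping "/Type /Pages".
' Assumption: one such entry per page, stored uncompressed in the file.
Function CountPageObjects(ByVal sPath As String) As Long
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                     ' find every match, not just the first
    re.Pattern = "/Type\s*/Page[^s]"     ' [^s] excludes the "/Pages" tree nodes
    CountPageObjects = re.Execute(ReadFileAsText(sPath)).Count
End Function
```

With both functions in a standard module, a call such as Debug.Print CountPageObjects("C:\test\sample.pdf") (the file name is a placeholder) prints the count to the Immediate window. If the result is 0 or looks wrong for a given file, the "/Count" approach above may be the better fit for that file.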
Source: https://stackoverflow.com/questions/48484855/excel-vba-to-return-page-count-from-protected-pdf-file