I've written some VBA code using file objects to go into a folder, search for particular files (CSV) that meet certain criteria (contain "HR" in filename and created with
This ought to show some improvement, considering the ratio of HR files to total files (250 / 30,000).
Dir function, minimize reliance on FileSystemObject
The idea here is to use the Dir function first to get a list of all file names that contain the "HR" substring, and only use the FileSystemObject against those files to get the timestamp information. There's no use incurring the overhead of FSO on every file in that directory.
Then, we process only those files which match the "HR" criteria:
```vba
Sub usingDir()
    Dim folderPath As String
    Dim fileName As String
    Dim filesToProcess As New Collection
    Dim item As Variant
    Dim fileDate As Date
    Dim firstDate As Date
    Dim secondDate As Date

    'Defining the user-input variables
    firstDate = Cells(2, "E").Value
    secondDate = Cells(3, "E").Value
    folderPath = "\\SRV-1\process\DUMP\"   ' must end with a backslash

    ' Gets a collection of files matching the "HR" criteria
    fileName = Dir(folderPath)
    Do While fileName <> ""
        If InStr(fileName, "HR") > 0 Then
            'Only processing files with "HR"
            filesToProcess.Add folderPath & fileName
        End If
        fileName = Dir    ' next file in the same folder
    Loop

    'Now we deal only with the "HR" files:
    With CreateObject("Scripting.FileSystemObject")
        For Each item In filesToProcess
            ' Check the date last modified
            fileDate = .GetFile(item).DateLastModified ' modify as needed
            If firstDate < fileDate And secondDate > fileDate Then
                Debug.Print item
                'your code to Do Stuff goes here
            End If
        Next
    End With
End Sub
```
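If the name test is the only filter, Dir can do the matching itself: it accepts wildcards, so a pattern such as "*HR*.csv" replaces the InStr check and also restricts the loop to CSV files. A minimal sketch, assuming folderPath ends with a backslash; GetMatchingFiles is a hypothetical helper name. One behavioral difference to be aware of: Dir's wildcard matching on Windows is not case-sensitive, whereas InStr with its default binary comparison is.

```vba
' Sketch (hypothetical helper): let Dir filter by name using a wildcard pattern.
' Assumes folderPath ends with "\"; "*HR*.csv" is just an example pattern.
Function GetMatchingFiles(ByVal folderPath As String, ByVal pattern As String) As Collection
    Dim result As New Collection
    Dim fileName As String

    fileName = Dir(folderPath & pattern)   ' first match, or "" if none
    Do While fileName <> ""
        result.Add folderPath & fileName   ' store the full path
        fileName = Dir                     ' next match for the same pattern
    Loop

    Set GetMatchingFiles = result
End Function
```

Because Dir keeps a single internal search state, calling Dir with a different argument inside this loop would restart the search; collecting the names first, as here, and processing them afterwards avoids that.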
Avoiding FileSystemObject entirely
This was nagging at me, and I figured there must be a way to get the timestamp information without relying on FileSystemObject. There is. We'll still use Dir to traverse the files, but now we'll eliminate any reference to FileSystemObject and replace it with some fancy WinAPI function calls. Check out Chip Pearson's article on getting and setting file times and download the .bas modules. You'll need the following two files imported to your VBProject:
And then you can do something like this:
```vba
Option Explicit

Sub withoutFSO()
    Dim folderPath As String
    Dim FileName As String
    Dim filesToProcess As New Collection
    Dim item As Variant
    Dim fileDate As Date
    Dim firstDate As Date
    Dim secondDate As Date

    'Defining the user-input variables
    firstDate = Cells(2, "E").Value
    secondDate = Cells(3, "E").Value
    folderPath = "\\Your\Path\"   ' must end with a backslash

    ' Gets a collection of files matching the "HR" criteria and our date range
    FileName = Dir(folderPath)
    Do While FileName <> ""
        'Only processing files with "HR"
        If InStr(FileName, "HR") > 0 Then
            ' Only process files that meet our date criteria.
            ' Pass the full path of the current file (item is not assigned yet here).
            fileDate = CDate(modGetSetFileTimes.GetFileDateTime(folderPath & FileName, FileDateLastModified))
            If firstDate < fileDate And secondDate > fileDate Then
                filesToProcess.Add folderPath & FileName
                Debug.Print folderPath & FileName, fileDate   ' each file's own date
            End If
        End If
        FileName = Dir
    Loop

    'Now we deal only with the matching files:
    For Each item In filesToProcess
        Debug.Print item
        'your code to Do Stuff goes here
    Next
End Sub
```
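A lighter-weight alternative (swapped in here, not Chip Pearson's approach): VBA's built-in FileDateTime function, which Microsoft documents as returning the date and time a file was created or last modified, may be enough for a simple date-range filter without FileSystemObject or the WinAPI modules. A minimal sketch, assuming folderPath ends with a backslash; FilterByDate is a hypothetical name, and it keeps the same exclusive bounds as the code above.

```vba
' Sketch (hypothetical helper): filter by name pattern and date range
' using only VBA built-ins (Dir and FileDateTime).
Function FilterByDate(ByVal folderPath As String, ByVal pattern As String, _
                      ByVal firstDate As Date, ByVal secondDate As Date) As Collection
    Dim result As New Collection
    Dim fileName As String
    Dim fileDate As Date

    fileName = Dir(folderPath & pattern)    ' folderPath must end with "\"
    Do While fileName <> ""
        fileDate = FileDateTime(folderPath & fileName)
        ' Exclusive bounds, as in the answer: firstDate < fileDate < secondDate
        If firstDate < fileDate And fileDate < secondDate Then
            result.Add folderPath & fileName
        End If
        fileName = Dir
    Loop

    Set FilterByDate = result
End Function
```

If you specifically need the creation time rather than the last-modified time, FileDateTime is not the right tool and the WinAPI module above (or FSO's DateCreated) is still needed.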
This should be an improvement even over my original answer, and if it is combined with a more efficient way of retrieving the data (e.g., using ADO instead of Workbooks.Open, if possible), it should be well optimized.
Take a look at Power Query: it's a Microsoft add-in for Excel 2010 and 2013, and is built into Excel 2016 (as Get & Transform). Setting up Power Query to do this will be amazingly fast, and the 'script' is reusable! No VBA needed.
You can search for and combine the multiple files on the specified criteria, and then merge or append the result to the new/master file, too. For efficiency, rather than processing each file individually, might I suggest gathering up all the data files (by your criteria), combining them into one table, and then using the new table to merge/append to the new/master file.
Hope this helps...
It took a long time because on each iteration you write the information to the main worksheet.
In this case it is better to use a two-dimensional array to hold the information and, at the end of the process, write the whole array to the main worksheet at once.
I don't know what information you get from each worksheet, so I can't create an illustrative example for you.
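Since the answer above has no code, here is a generic sketch of the idea under stated assumptions: fill a 2-D Variant array in memory, then assign it to a range of the same size in a single statement. WriteBufferedRows and the r * c placeholder values are hypothetical; in practice the values would come from each file you process.

```vba
' Sketch (hypothetical): buffer values in a 2-D array, then write once.
' The r * c values are placeholders for data read from each source file.
Sub WriteBufferedRows(ByVal target As Range, ByVal rowCount As Long, ByVal colCount As Long)
    Dim buffer() As Variant
    Dim r As Long, c As Long

    ReDim buffer(1 To rowCount, 1 To colCount)
    For r = 1 To rowCount
        For c = 1 To colCount
            buffer(r, c) = r * c   ' placeholder: replace with real data
        Next c
    Next r

    ' A single assignment instead of rowCount * colCount separate cell writes
    target.Resize(rowCount, colCount).Value = buffer
End Sub
```

Each individual cell write crosses the boundary between VBA and Excel, so one bulk assignment is usually much faster than writing cell by cell inside the file loop.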
In addition to using the Dir function instead of FileSystemObject, if you cannot automate Power Query and all you need is the data (not the formatting), consider making a direct data connection to the source workbooks using ADODB.
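Add a reference to Microsoft ActiveX Data Objects 6.1 Library (via Tools -> References...). There may be versions other than 6.1; choose the highest. The code below also declares a Scripting.FileSystemObject, so add a reference to Microsoft Scripting Runtime as well.
Then you can use something like the following code: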
```vba
' Assumes filesToProcess, fileDate, firstDate and secondDate are declared and
' filled as in the Dir loop above, and destinationWorksheet is the target Worksheet.
Dim fso As New Scripting.FileSystemObject
Dim filepath As Variant
Dim connectionString As String
Dim worksheetName As String
Dim sql As String
Dim rs As ADODB.Recordset
Dim nextRow As Long

nextRow = 1
For Each filepath In filesToProcess
    ' Check the date last modified
    fileDate = fso.GetFile(filepath).DateLastModified ' modify as needed
    If firstDate < fileDate And secondDate > fileDate Then
        connectionString = _
            "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=""" & filepath & """;" & _
            "Extended Properties=""Excel 12.0;HDR=No"""

        ' There can be multiple worksheets per workbook.
        ' If you are only interested in one worksheet per workbook, then fill in worksheetName somehow
        ' Otherwise, you will probably need an inner loop to iterate over all the worksheets
        worksheetName = "Sheet1"

        sql = _
            "SELECT * " & _
            "FROM [" & worksheetName & "$]"

        Set rs = New ADODB.Recordset
        rs.Open sql, connectionString
        ' CopyFromRecordset returns the number of records copied, so each file's
        ' data is placed below the previous one instead of overwriting A1
        nextRow = nextRow + destinationWorksheet.Cells(nextRow, 1).CopyFromRecordset(rs)
        rs.Close
        Set rs = Nothing
    End If
Next
```
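The comments in the code above mention an inner loop over all the worksheets. One way to discover them without opening the workbook is ADO's schema rowset: Connection.OpenSchema with adSchemaTables returns one row per worksheet or named range, and worksheet names end in "$". A sketch that reuses the same ACE connection string; GetWorksheetNames is a hypothetical helper, and the quote handling is a simplification.

```vba
' Sketch (hypothetical helper): list worksheet names in a closed workbook via ADO.
' Needs the same ADO reference and ACE provider as the code above.
Function GetWorksheetNames(ByVal filePath As String) As Collection
    Dim result As New Collection
    Dim cn As ADODB.Connection
    Dim rs As ADODB.Recordset
    Dim tableName As String

    Set cn = New ADODB.Connection
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=""" & filePath & """;" & _
            "Extended Properties=""Excel 12.0;HDR=No"""

    ' One row per worksheet or named range in the workbook
    Set rs = cn.OpenSchema(adSchemaTables)
    Do Until rs.EOF
        tableName = rs.Fields("TABLE_NAME").Value
        ' Names with spaces come back quoted, e.g. 'My Sheet$': strip the quotes
        ' and un-double embedded apostrophes (a simplification)
        If Left$(tableName, 1) = "'" And Right$(tableName, 1) = "'" Then
            tableName = Replace(Mid$(tableName, 2, Len(tableName) - 2), "''", "'")
        End If
        ' Worksheets end in "$"; named ranges do not
        If Right$(tableName, 1) = "$" Then
            result.Add Left$(tableName, Len(tableName) - 1)
        End If
        rs.MoveNext
    Loop

    rs.Close
    cn.Close
    Set GetWorksheetNames = result
End Function
```

The returned names omit the trailing "$", so each one can be dropped into the existing "FROM [" & worksheetName & "$]" query inside an inner For Each loop.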