I have about 50 or so Excel workbooks that I need to pull data from. I need to take data from specific cells, specific worksheets and compile into one dataset (preferably in
I agree with using that accessing the Excel object is not the quickest and if the workbooks and sheets that you're trying to retrieve data from are all consistent (i.e have the same column names, etc... or at least the column names you're looking for) it would be better to use ODBC. This does have some issues and if you can't get around them or need to actually do something more complex based on the contents then there may be no way around it. If that's the case then I would suggest creating one Excel object and then opening and closing the files as needed to try to increase the efficiency.
First start with this google query and click the first link that comes up, which takes you to an article showing how to iterate through a group of Excel files in a folder.
Sub RunCodeOnAllXLSFiles()
Dim lCount As Long
Dim wbResults As Workbook
Dim wbCodeBook As Workbook
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False
On Error Resume Next
Set wbCodeBook = ThisWorkbook
With Application.FileSearch
.NewSearch
'Change path to suit
.LookIn = "C:\MyDocuments\TestResults"
.FileType = msoFileTypeExcelWorkbooks
'Optional filter with wildcard
'.Filename = "Book*.xls"
If .Execute > 0 Then 'Workbooks in folder
For lCount = 1 To .FoundFiles.Count 'Loop through all
'Open Workbook x and Set a Workbook variable to it
Set wbResults = Workbooks.Open(Filename:=.FoundFiles(lCount), UpdateLinks:=0)
'DO YOUR CODE HERE
wbResults.Close SaveChanges:=False
Next lCount
End If
End With
On Error GoTo 0
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
End Sub
To get the name of the workbook, you'll want to adapt the code at "DO YOUR CODE HERE" to include wbResults.Name
. If it's the filename you want, use wbResults.FullName
, which returns the name of the workbook including its path on disk as a string.
A search for a VBScript variation on the same thing yields a number of results that are useful, including this script:
strPath = "C:\PATH_TO_YOUR_FOLDER"
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = True
objExcel.DisplayAlerts = False
Set objFso = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFso.GetFolder (strPath)
For Each objFile In objFolder.Files
If objFso.GetExtensionName (objFile.Path) = "xls" Then
Set objWorkbook = objExcel.Workbooks.Open(objFile.Path)
' Include your code to work with the Excel object here
objWorkbook.Close True 'Save changes
End If
Next
objExcel.Quit
I would do it in VBScript or even, VB.NET or Powershell if you feel so inclined.
Using VB.NET, you can access Excel spreadsheets as if they were databases, via the OLEDB provider. The code to select a range of values might look like this :
Try
Dim MyConnection As System.Data.OleDb.OleDbConnection
Dim DtSet As System.Data.DataSet
Dim MyCommand As System.Data.OleDb.OleDbDataAdapter
MyConnection = New System.Data.OleDb.OleDbConnection _
("provider=Microsoft.Jet.OLEDB.4.0;" _
" Data Source='testfile.xls'; " _
"Extended Properties=Excel 8.0;")
MyCommand = New System.Data.OleDb.OleDbDataAdapter _
("select * from [Sheet1$]", MyConnection)
MyCommand.TableMappings.Add("Table", "TestTable")
DtSet = New System.Data.DataSet
MyCommand.Fill(DtSet)
MyConnection.Close()
Catch ex As Exception
MsgBox(ex.ToString)
End Try
Once you get the data you can elaborate on it, then insert the result into another Excel spreadsheet, using the same API.
Getting the list of files is easy in .NET with a call to System.IO.Directory.GetFiles(); just specify the "*.xls" wildcard. Once you have the list, just use a for loop to iterate through it, opening each file in turn, then doing the query on that file, and so on.
If you use VBScript, then the preferred way to get the list of Excel files is to use the Scripting.FileSystemObject, specifically the GetFolder method. It works basically the same way but the syntax is slightly different.
If it's VBScript or VB.NET it will probably run outside of Excel itself. You'd run it by double-clicking or from a batch file or something like that. The advantage to using VB.NET is you could put up a graphical form for interaction - it could show a progress bar, tracking how many files you've gone through, status updates, that kind of thing.
Whenever you are accessing that many Excel files in succession, you can generally get better performance using ADODB rather than Excel's automation object.
It could be done with the following code
Sub LoopThroughFiles()
Dim StrFile As String
StrFile = Dir("V:\XX\XXX\*.xlsx")
Do While Len(StrFile) > 0
Debug.Print StrFile
Set wbResults = Workbooks.Open("V:\XX\XXX\" & StrFile)
'DO YOUR CODE HERE
wbResults.Close SaveChanges:=True
StrFile = Dir
Loop
End Sub