Is there a faster way to parse an Excel document with PowerShell?

一个人的身影 2021-02-05 18:13

I'm interfacing with an MS Excel document via PowerShell. Each Excel document may have around 1,000 rows of data.

2 answers
  • 2021-02-05 18:22

    If the data is static (no formulas involved, just data in cells), you can access the spreadsheet as an ODBC data source and execute SQL (or at least SQL-like) queries against it. Have a look at this reference for setting up your connection string (each worksheet in a workbook is a "table" for this exercise), and use System.Data to query it the same way you would a regular database (Don Jones wrote a wrapper function for this which may help).

    This should be faster than launching Excel & picking through cell by cell.
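
    A minimal sketch of that approach, assuming Windows, an installed Microsoft Excel ODBC driver (it ships with the Access Database Engine) whose bitness matches the PowerShell process, and a worksheet named Sheet1 with headers in its first row. The function name Get-ExcelSheetData and its parameters are illustrative only; this is not the wrapper function mentioned above.

    ```powershell
    # Sketch: read one worksheet through the Excel ODBC driver.
    # Get-ExcelSheetData is a hypothetical helper name, not an existing cmdlet.
    function Get-ExcelSheetData {
        param(
            [Parameter(Mandatory)][string]$FilePath,
            [string]$SheetName = 'Sheet1'   # assumed worksheet name
        )

        # The driver needs a full path, not one relative to the PowerShell location.
        $fullPath = (Resolve-Path -Path $FilePath).ProviderPath
        $connectionString = "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=$fullPath;"

        $connection = New-Object System.Data.Odbc.OdbcConnection($connectionString)
        try {
            $connection.Open()

            # Each worksheet is addressed as a table named [SheetName$].
            $command = $connection.CreateCommand()
            $command.CommandText = "SELECT * FROM [$SheetName`$]"

            $adapter = New-Object System.Data.Odbc.OdbcDataAdapter($command)
            $table = New-Object System.Data.DataTable
            [void]$adapter.Fill($table)

            # Emit the rows; columns are available as properties on each row.
            $table.Rows
        }
        finally {
            $connection.Dispose()
        }
    }
    ```

    Calling it would look like `Get-ExcelSheetData -FilePath .\report.xlsx -SheetName 'Sheet1'`, where the file and sheet names are placeholders. Excel itself is never started, which is where the time saving comes from.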

  • 2021-02-05 18:32

    In his blog entry Speed Up Reading Excel Files in PowerShell, Robert M. Toups, Jr. explains that while loading to PowerShell is fast, actually reading the Excel cells is very slow. On the other hand, PowerShell can read a text file very quickly, so his solution is to load the spreadsheet in PowerShell, use Excel’s native CSV export process to save it as a CSV file, then use PowerShell’s standard Import-Csv cmdlet to process the data blazingly fast. He reports that this has given him up to a 20 times faster import process!

    Leveraging Toups’ code, I created an Import-Excel function that lets you import spreadsheet data very easily. My code adds the capability to select a specific worksheet within an Excel workbook, rather than just using the default worksheet (i.e. the active sheet at the time you saved the file). If you omit the -SheetName parameter, it uses the default worksheet.

    function Import-Excel([string]$FilePath, [string]$SheetName = "")
    {
        $csvFile = Join-Path $env:temp ("{0}.csv" -f (Get-Item -path $FilePath).BaseName)
        if (Test-Path -path $csvFile) { Remove-Item -path $csvFile }
    
        # convert Excel file to CSV file
        $xlCSVType = 6 # SEE: http://msdn.microsoft.com/en-us/library/bb241279.aspx
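        $excelObject = New-Object -ComObject Excel.Application
        $excelObject.Visible = $false
        $excelObject.DisplayAlerts = $false # suppress prompts that would block an unattended run
        # Excel resolves relative paths against its own working directory, so pass a full path
        $workbookObject = $excelObject.Workbooks.Open((Resolve-Path -Path $FilePath).ProviderPath)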
        SetActiveSheet $workbookObject $SheetName | Out-Null
        $workbookObject.SaveAs($csvFile,$xlCSVType) 
        $workbookObject.Saved = $true
        $workbookObject.Close()
    
        # cleanup
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbookObject) |
            Out-Null
        $excelObject.Quit()
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelObject) |
            Out-Null
        [System.GC]::Collect()
        [System.GC]::WaitForPendingFinalizers()
    
        # now import and return the data 
        Import-Csv -path $csvFile
    }
    

    These supplemental functions are used by Import-Excel:

    function FindSheet([Object]$workbook, [string]$name)
    {
        $sheetNumber = 0
        for ($i=1; $i -le $workbook.Sheets.Count; $i++) {
            if ($name -eq $workbook.Sheets.Item($i).Name) { $sheetNumber = $i; break }
        }
        return $sheetNumber
    }
    
    function SetActiveSheet([Object]$workbook, [string]$name)
    {
        if (!$name) { return }
        $sheetNumber = FindSheet $workbook $name
        if ($sheetNumber -gt 0) { $workbook.Worksheets.Item($sheetNumber).Activate() }
        return ($sheetNumber -gt 0)
    }
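
    Because Import-Excel simply returns whatever Import-Csv produces, every cell value arrives as a string. The sketch below illustrates that with a small stand-in CSV instead of a real Excel export; the file name and the Name and Quantity columns are made up for the example.

    ```powershell
    # Stand-in for the CSV that Excel would export; the column names are invented.
    $csvFile = Join-Path ([System.IO.Path]::GetTempPath()) 'import-excel-demo.csv'
    @(
        'Name,Quantity'
        'Widget,10'
        'Gadget,5'
    ) | Set-Content -Path $csvFile

    # Import-Csv turns each data line into an object with one property per column.
    $rows = @(Import-Csv -Path $csvFile)

    # Every property is a string, so cast before doing arithmetic.
    $total = ($rows | ForEach-Object { [int]$_.Quantity } | Measure-Object -Sum).Sum

    Remove-Item -Path $csvFile
    ```

    The same cast applies to rows returned by Import-Excel whenever a column holds numbers or dates.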
    