问题
I have the following code:
$html = New-Object -ComObject "HTMLFile"
$source = Get-Content -Path $FilePath -Raw
try
{
$html.IHTMLDocument2_write($source) 2> $null
}
catch
{
$encoded = [Text.Encoding]::Unicode.GetBytes($source)
$html.write($encoded)
}
$t = $html.getElementsByTagName("table") | Where-Object {
$cells = $_.tBodies[0].rows[0].cells
$cells[0].innerText -eq "Name" -and
$cells[1].innerText -eq "Description" -and
$cells[2].innerText -eq "Default Value" -and
$cells[3].innerText -eq "Release"
}
The code works fine on Windows Powershell 5.1, but on Powershell Core 7 $_.tBodies[0].rows
returns null.
So, how does one access the rows of an HTML table in PS 7?
回答1:
PowerShell [Core], as of 7.0, doesn't come with a built-in HTML parser.
You must rely on a third-party solution, such as the PowerHTML module that wraps the HTML Agility Pack.
The object model works differently than the Internet Explorer-based one available in Windows PowerShell; it is similar to the XML DOM provided by the standard System.Xml.XmlDocument type[1]; see the documentation and the sample code below.
# Install the module on demand
If (-not (Get-Module -ErrorAction Ignore -ListAvailable PowerHTML)) {
Write-Verbose "Installing PowerHTML module for the current user..."
Install-Module PowerHTML -ErrorAction Stop
}
Import-Module -ErrorAction Stop PowerHTML
# Create a sample HTML file with a table with 2 columns.
Get-Item $HOME | Select-Object Name, Mode | ConvertTo-Html > sample.html
# Parse the HTML file into an HTML DOM.
$htmlDom = ConvertFrom-Html -Path sample.html
# Find a specific table by its column names, using an XPath
# query to iterate over all tables.
$table = $htmlDom.SelectNodes('//table') | Where-Object {
$headerRow = $_.Element('tr') # or $tbl.Elements('tr')[0]
# Filter by column names
$headerRow.ChildNodes[0].InnerText -eq 'Name' -and
$headerRow.ChildNodes[1].InnerText -eq 'Mode'
}
# Print the table's HTML text.
$table.InnerHtml
# Extract the first data row's first column value.
# Note: @(...) is required around .Elements() for indexing to work.
@($table.Elements('tr'))[1].ChildNodes[0].InnerText
[1] Notably with respect to supporting XPath queries via the .SelectSingleNode()
and .SelectNodes()
methods, exposing child nodes via a .ChildNodes
collection, and providing .InnerHtml
/ .OuterHtml
/ .InnerText
properties. Instead of an indexer that supports child element names, methods .Element(<name>)
and .Elements(<name>)
are provided.
来源:https://stackoverflow.com/questions/60655737/how-to-parse-html-table-with-powershell-core-7