HtmlWebResponseObject.ParsedHtml replacement in PowerShell Core 6

青春壹個敷衍的年華 submitted on 2020-05-31 07:45:39

Question


My goal is to parse an HTML file retrieved with Invoke-WebRequest. If possible, I'd like to avoid any external libraries.

The problem I am facing is that, since PowerShell 6, Invoke-WebRequest returns a BasicHtmlWebResponseObject instead of an HtmlWebResponseObject. The Basic version lacks the ParsedHtml property. Is there a good alternative for parsing HTML in PowerShell Core 6?

I've tried to use Select-Xml, but my HTML is not entirely valid (e.g. a missing closing tag), so it fails to parse as XML.

Another alternative I've found is to use New-Object -ComObject "HTMLFile", but from my understanding this relies on Internet Explorer for parsing, which I'd like to avoid.

There is a very similar question here, but sadly it has had no answer or activity for 8 months.


Answer 1:


As mentioned in the comments, it is not really possible without a library. One very good library you could use is AngleSharp for .NET. It has great HTML parsing capabilities, and .NET code interoperates well with PowerShell; have a look at this link.

Here is an example from their website:

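using System.Linq;
using AngleSharp;

// Set up a browsing context whose default loader can fetch documents over HTTP.
var config = Configuration.Default.WithDefaultLoader();
var address = "https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes";
var context = BrowsingContext.New(config);
// Download and parse the page (await needs an async method or top-level statements).
var document = await context.OpenAsync(address);
// Select the third cell of every episode row and read its text.
var cellSelector = "tr.vevent td:nth-child(3)";
var cells = document.QuerySelectorAll(cellSelector);
var titles = cells.Select(m => m.TextContent);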
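
Since the question is about PowerShell, here is a rough sketch of how the same query might look there. It is not from the AngleSharp website. It assumes AngleSharp 0.10 or later (where the parser lives in the AngleSharp.Html.Parser namespace) and that AngleSharp.dll has already been extracted from the NuGet package; the ./lib path is hypothetical. Depending on the AngleSharp and PowerShell versions, additional dependency assemblies may need to be loaded as well.

```powershell
# Assumption: AngleSharp.dll (0.10 or later) sits at this hypothetical path.
Add-Type -Path './lib/AngleSharp.dll'

# Fetch the page; for HTML responses .Content is the body as a string.
$address  = 'https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes'
$response = Invoke-WebRequest -Uri $address

# Parse the markup with AngleSharp's HTML parser instead of an XML parser,
# so imperfect markup such as a missing closing tag does not abort parsing.
$parser   = [AngleSharp.Html.Parser.HtmlParser]::new()
$document = $parser.ParseDocument($response.Content)

# Same CSS selector as in the C# example: third cell of every episode row.
$cells  = $document.QuerySelectorAll('tr.vevent td:nth-child(3)')
$titles = $cells | ForEach-Object { $_.TextContent }

$titles
```

Here Invoke-WebRequest does the download and AngleSharp only parses the returned string, so the BasicHtmlWebResponseObject is sufficient and ParsedHtml is not needed.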


Source: https://stackoverflow.com/questions/58661025/htmlwebresponseobject-parsedhtml-replacement-in-powershell-core-6
