Getting links from a webpage in PowerShell using a regular expression

Submitted by …衆ロ難τιáo~ on 2019-12-11 15:23:44

Question


Below is my PowerShell code to fetch the links in a webpage. Intermittently, I get a "Cannot index into a null array" exception. Is there anything wrong with this code? Any help is appreciated.

$Download = $wc.DownloadString($Link) 
$List = $Download -split "<a\s+" | %{ [void]($_ -match "^href=[`'`"]([^`'`">\s]*)"); $matches[1] }

Answer 1:


You don't need to parse anything yourself (and, as was pointed out in the comments, you can't reliably parse HTML with a regex in the first place). Use Invoke-WebRequest to fetch the page; one of the properties of the object it returns, Links, is a collection of all the links on the page, already parsed out for you.

Example:

$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression";
Invoke-WebRequest -Uri $Link | Select-Object -ExpandProperty links;

Or, if you need just the URLs, you can do it a bit more concisely:

$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression";
(Invoke-WebRequest -Uri $Link).links.href;
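
As for the intermittent error itself, a likely cause is that -match only updates the automatic $Matches variable when it succeeds. The chunk of text before the first <a tag never matches, so $matches[1] reads either a stale value left over from an earlier successful match or, in a fresh session, $null, which raises the exception. If the regex approach has to stay, a minimal sketch of a guarded version could look like the following. The sample HTML string is a hypothetical stand-in for the downloaded page, and $Pattern is a name introduced here for illustration only.

```powershell
# Hypothetical sample input standing in for $wc.DownloadString($Link).
$Download = '<p>intro</p><a href="https://example.com/a">A</a> <a class="x" href="/b">B</a> <a href=''/c''>C</a>'

# Same expression as in the question, written as a single-quoted string
# ('' is an escaped single quote).
$Pattern = '^href=[''"]([^''">\s]*)'

$List = $Download -split '<a\s+' | ForEach-Object {
    # -match returns $true or $false; $Matches is only refreshed on success,
    # so read it only inside the success branch.
    if ($_ -match $Pattern) {
        $Matches[1]
    }
}

$List
```

With this sample input the guarded version returns https://example.com/a and /c, and it silently skips /b because href is not the first attribute of that tag. That kind of miss is one reason the Links property above is the more robust option.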


Source: https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression
