Question
Below is my PowerShell code for fetching the links in a webpage. Intermittently, I get a "Cannot index into a null array" exception. Is there anything wrong with this code?
$Download = $wc.DownloadString($Link)
$List = $Download -split "<a\s+" | %{ [void]($_ -match "^href=[`'`"]([^`'`">\s]*)"); $matches[1] }
Answer 1:
You don't need to parse anything yourself (and, as was pointed out in the comments, you can't reliably parse HTML with a regex in the first place). Use Invoke-WebRequest to fetch the page; the object it returns has a Links property that holds all the links on the page, already parsed out for you.
Example:
$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression"
Invoke-WebRequest -Uri $Link | Select-Object -ExpandProperty Links
Or, if you need just the URLs, you can do it a bit more concisely:
$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression"
(Invoke-WebRequest -Uri $Link).Links.href
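As for the error in the question's own code, a likely cause is that -match only refreshes the automatic $Matches variable when the match succeeds. The first piece produced by -split (the text before the first "<a ") normally does not start with href=, so if no earlier -match in the session has succeeded, $Matches is still $null and $Matches[1] throws "Cannot index into a null array". If an earlier match did succeed, $Matches[1] silently returns that stale value instead, which could explain why the error is intermittent. Below is a minimal sketch of the regex version with a guard added, assuming a hypothetical helper name Get-HrefFromHtml and an inline sample string in place of a downloaded page:

```powershell
# Hypothetical helper: same split-and-match idea as the question,
# but $Matches is only read after -match has returned $true.
function Get-HrefFromHtml {
    param([string]$Html)

    $Html -split '<a\s+' | ForEach-Object {
        # On a failed match, $Matches is not reset: it stays $null
        # (causing the exception) or keeps a stale earlier value.
        if ($_ -match "^href=[`'`"]([^`'`">\s]*)") {
            $Matches[1]
        }
    }
}

# Inside a single-quoted string, '' produces one literal ' character.
$sample = '<p>intro</p><a href="https://example.com/a">A</a> <a class="x" href="/b">B</a> <a href=''/c''>C</a>'
Get-HrefFromHtml -Html $sample
# → https://example.com/a
# → /c
```

The guard removes the exception, but the pattern still only sees links whose first attribute is href (the class="x" link is skipped), which is one more reason to prefer the Links property of Invoke-WebRequest.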
Source: https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression