Question
For a couple of days I have been working on a WebBrowser-based web scraper. After a couple of prototypes using threads and DocumentCompleted events, I decided to see whether I could make a simple, easy-to-understand web scraper.
The goal is to create a web scraper that doesn't involve actual Thread objects. I want it to work in sequential steps (i.e. go to url, perform action, go to other url, etc.).
This is what I got so far:
public static class Webscraper
{
    private static WebBrowser _wb;
    public static string URL;

    //WebBrowser objects have to run in a single-threaded apartment (STA) for some reason.
    [STAThread]
    public static void Init_Browser()
    {
        _wb = new WebBrowser();
    }

    public static void Navigate_And_Wait(string url)
    {
        //Navigate to a specific url.
        _wb.Navigate(url);

        //Wait till the url is loaded.
        while (_wb.IsBusy) ;

        //Loop until current url == target url. (In case a website loads urls in steps)
        while (!_wb.Url.ToString().Contains(url))
        {
            //Wait till next url is loaded
            while (_wb.IsBusy) ;
        }

        //Place URL
        URL = _wb.Url.ToString();
    }
}
I am a novice programmer, but I think this is pretty straightforward code. That's why it frustrates me that the program throws a NullReferenceException at this piece of code:
_wb.Url.ToString().Contains(url)
I just called the _wb.Navigate() method, so the null reference can't be in the _wb object itself. The only thing I can imagine is that _wb.Url is null. But the while (_wb.IsBusy) loop should prevent that.
So what is going on and how can I fix it?
Answer 1:
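Busy waiting (while (_wb.IsBusy) ;) on the UI thread isn't advisable. The WebBrowser control does its work through Windows messages, and a tight loop never lets the thread process them, so the navigation started by Navigate most likely hasn't even begun when the loop checks _wb.Url, which is still null at that point. If you use the new async/await features of .NET 4.5, you can get the sequential behavior you want (i.e. go to url, perform action, go to other url, etc.):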
using System.Threading.Tasks;
using System.Windows.Forms;

public static class SOExtensions
{
    // Starts a navigation and returns a Task that completes on the next DocumentCompleted event.
    public static Task NavigateAsync(this WebBrowser wb, string url)
    {
        TaskCompletionSource<object> tcs = new TaskCompletionSource<object>();
        WebBrowserDocumentCompletedEventHandler completedEvent = null;
        completedEvent = (sender, e) =>
        {
            // Unsubscribe so this handler runs only once.
            wb.DocumentCompleted -= completedEvent;
            tcs.SetResult(null);
        };
        wb.DocumentCompleted += completedEvent;
        wb.ScriptErrorsSuppressed = true;
        wb.Navigate(url);
        return tcs.Task;
    }
}
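The extension method turns a one-shot event into an awaitable Task with TaskCompletionSource. To show just that pattern in a form that runs without Windows Forms, here is a minimal sketch in which a hypothetical FakeLoader class stands in for WebBrowser and its Completed event stands in for DocumentCompleted (FakeLoader, Completed and StartAsync are made-up names for illustration only):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for WebBrowser: Start() kicks off asynchronous work
// and raises Completed when it finishes (playing the role of DocumentCompleted).
public class FakeLoader
{
    public event EventHandler Completed;

    public void Start()
    {
        // Simulate the work finishing ~50 ms later on a thread-pool thread.
        Task.Delay(50).ContinueWith(_ => OnCompleted());
    }

    private void OnCompleted()
    {
        EventHandler handler = Completed;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

public static class FakeLoaderExtensions
{
    // Same shape as NavigateAsync: subscribe, start the work, and complete
    // the returned Task from inside the event handler.
    public static Task StartAsync(this FakeLoader loader)
    {
        var tcs = new TaskCompletionSource<object>();
        EventHandler handler = null;
        handler = (sender, e) =>
        {
            // Unsubscribe first so later events cannot touch this Task again.
            loader.Completed -= handler;
            tcs.TrySetResult(null);
        };
        // Subscribe before starting, so a fast completion is not missed.
        loader.Completed += handler;
        loader.Start();
        return tcs.Task;
    }
}
```

In an async method you would write await loader.StartAsync(); and the code after that line runs only once Completed has fired, which is exactly how each await webBrowser1.NavigateAsync(...) below waits for its page.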
// For example, inside a Form that hosts a WebBrowser named webBrowser1:
async void ProcessButtonClick()
{
    await webBrowser1.NavigateAsync("http://www.stackoverflow.com");
    MessageBox.Show(webBrowser1.DocumentTitle);
    await webBrowser1.NavigateAsync("http://www.google.com");
    MessageBox.Show(webBrowser1.DocumentTitle);
}
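One limitation of NavigateAsync is that if DocumentCompleted never fires (for example, a page that never finishes loading), the await waits forever. A minimal sketch of a timeout wrapper, assuming .NET 4.5's Task.WhenAny and Task.Delay; WithTimeout is a hypothetical helper name, not part of the framework:

```csharp
using System;
using System.Threading.Tasks;

public static class TaskTimeoutExtensions
{
    // Hypothetical helper: finishes when `task` finishes, or throws
    // TimeoutException if `timeout` elapses first. It does NOT cancel `task`.
    public static async Task WithTimeout(this Task task, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(task, Task.Delay(timeout));
        if (finished != task)
            throw new TimeoutException("Did not complete within " + timeout + ".");
        // Re-await so an exception from the original task is propagated.
        await task;
    }
}
```

Usage would look like await webBrowser1.NavigateAsync("http://www.google.com").WithTimeout(TimeSpan.FromSeconds(30)); which throws a TimeoutException instead of hanging. Because the abandoned navigation keeps running and its handler stays subscribed, a late DocumentCompleted from it could complete the next NavigateAsync call early, so you may want to call webBrowser1.Stop() before moving on.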
Source: https://stackoverflow.com/questions/16193084/webbrowser-control-throws-seemingly-random-nullreferenceexception