Question
I've found that simply iterating over MSHTML elements from C# is horribly slow. Here is a small example that iterates over the document.all collection three times. We have a blank WPF application with a WebBrowser control named Browser:
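using System.Diagnostics;
using System.Windows.Navigation;
using mshtml;

public partial class MainWindow
{
    public MainWindow()
    {
        InitializeComponent();
        Browser.LoadCompleted += DocumentLoaded;
        Browser.Navigate("http://google.com");
    }

    // Cached document.all collection, filled once the page has loaded.
    private IHTMLElementCollection _items;

    private void DocumentLoaded(object sender, NavigationEventArgs e)
    {
        var dc = (HTMLDocument)Browser.Document;
        _items = dc.all;
        // Run the same traversal three times to compare first-pass and repeat timings.
        Test();
        Test();
        Test();
    }

    private void Test()
    {
        var sw = new Stopwatch();
        sw.Start();
        int i;
        for (i = 0; i < _items.length; i++)
        {
            // Fetch each element by index; the result is intentionally discarded.
            _items.item(i);
        }
        sw.Stop();
        Debug.WriteLine("Items: {0}, Time: {1}", i, sw.Elapsed);
    }
}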
The output is:
Items: 274, Time: 00:00:01.0573245
Items: 274, Time: 00:00:00.0011637
Items: 274, Time: 00:00:00.0006619
The performance difference between the first and second runs is horrible. I rewrote the same code in unmanaged C++ with COM and had no performance issues at all; the unmanaged code runs 1200 times faster. Unfortunately, going unmanaged is not an option because the real project is more complex than simple iteration.
I understand that on the first pass the runtime creates an RCW for each referenced HTML element, since each one is a COM object. But can it be THAT slow? That is about 300 items per second with one core of a 3.2 GHz CPU at 100% load.
[Figure: performance analysis of the code above]
Answer 1:
Enumerate the document.all collection with foreach instead of calling document.all.item(index) (use IHTMLElementCollection::get__newEnum if you switch to C++).
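A minimal sketch of that suggestion, assuming the same timing approach as the Test method in the question. The ElementEnumeration class and CountElements method are hypothetical names, not part of MSHTML. The helper takes a plain IEnumerable, which the interop IHTMLElementCollection implements, so it walks the collection through its enumerator rather than through item(index). Whether this removes the first-pass cost is not measured here.

```csharp
using System.Collections;
using System.Diagnostics;

// Hypothetical helper: enumerates a collection with foreach and times the pass.
public static class ElementEnumeration
{
    public static int CountElements(IEnumerable items)
    {
        var sw = Stopwatch.StartNew();
        int count = 0;

        // foreach goes through IEnumerable.GetEnumerator(), not item(index).
        foreach (object element in items)
        {
            count++;
        }

        sw.Stop();
        Debug.WriteLine("Items: {0}, Time: {1}", count, sw.Elapsed);
        return count;
    }
}
```

In the question's DocumentLoaded handler this could be called as ElementEnumeration.CountElements(_items) in place of Test().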
Suggested reading: IE + JavaScript Performance Recommendations - Part 1
Answer 2:
The source of the poor performance is that collection items are defined as dynamic objects in the MSHTML interop assembly.
public interface IHTMLElementCollection : IEnumerable
{
    ...
    [DispId(0)]
    dynamic item(object name = Type.Missing, object index = Type.Missing);
    ...
}
If we rewrite that interface so that item returns IDispatch objects instead, the lag disappears.
public interface IHTMLElementCollection : IEnumerable
{
    ...
    [DispId(0)]
    [return: MarshalAs(UnmanagedType.IDispatch)]
    object item(object name = Type.Missing, object index = Type.Missing);
    ...
}
New output:
Items: 246, Time: 00:00:00.0034520
Items: 246, Time: 00:00:00.0029398
Items: 246, Time: 00:00:00.0029968
Source: https://stackoverflow.com/questions/14666302/html-traversal-is-very-slow