Slow SelectSingleNode

Question

I have a simple structured XML file like this:

<ttest ID="ttest00001" NickName="map00001"/>
<ttest ID="ttest00002" NickName="map00002"/>
<ttest ID="ttest00003" NickName="map00003"/>
<ttest ID="ttest00004" NickName="map00004"/>
...

This XML file can be around 2.5 MB.

In my source code I have a loop that gets the nicknames.

In each iteration, I have something like this:

nickNameLoopNum = MyXmlDoc.SelectSingleNode("//ttest[@ID='" + testloopNum + "']").Attributes["NickName"].Value

This single line costs me 30 to 40 milliseconds.

I found some old articles (dating back to 2002) saying that some sort of compiled XPath can help, but that was 5 years ago. Is there a modern practice to make this faster? (I'm using .NET 3.5.)

Answer 1:

Using the "//" abbreviation in an XPath expression is very inefficient, because it causes the whole XML document to be searched. Using "//" repeatedly, once per lookup, multiplies this inefficiency.

One efficient solution is to obtain all "NickName" attribute nodes by evaluating a single XPath expression:

ttest/@NickName

where the context node is the parent of all "ttest" elements.

The C# code will look like the following:

    int n = 15;
    XmlDocument doc = new XmlDocument();
    doc.Load("MyFile.xml");

    XmlNodeList nodeList;
    XmlNode top = doc.DocumentElement;
    nodeList =
        top.SelectNodes("ttest/@NickName");

    // Get the N-th NickName, can be done in a loop for
    // all n in a range

    string nickName = nodeList[n].Value;

Here we assume that the "ttest" elements are children of the top element of the XML document.

To summarize: this solution evaluates an XPath expression only once and places all results in an XmlNodeList, which can be enumerated or indexed like an array, so any required item can then be accessed in O(1) time.
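A self-contained sketch of this approach, for trying it out without a file on disk. It assumes the `ttest` elements sit directly under a single root element; the root name `tests`, the inline sample string, and the class and method names are hypothetical stand-ins for `MyFile.xml`.

```csharp
using System;
using System.Xml;

public static class NickNameList
{
    // Hypothetical sample document; the root element name "tests" is an assumption.
    public const string SampleXml =
        "<tests>" +
        "<ttest ID=\"ttest00001\" NickName=\"map00001\"/>" +
        "<ttest ID=\"ttest00002\" NickName=\"map00002\"/>" +
        "<ttest ID=\"ttest00003\" NickName=\"map00003\"/>" +
        "<ttest ID=\"ttest00004\" NickName=\"map00004\"/>" +
        "</tests>";

    // Evaluates "ttest/@NickName" once, relative to the root element,
    // and returns every matching attribute node in document order.
    public static XmlNodeList LoadNickNames(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.SelectNodes("ttest/@NickName");
    }

    public static void Main()
    {
        XmlNodeList nickNames = LoadNickNames(SampleXml);

        // The list is zero-based: index 2 holds the third NickName.
        for (int n = 0; n < nickNames.Count; n++)
        {
            Console.WriteLine("{0}: {1}", n, nickNames[n].Value);
        }
    }
}
```

The XPath expression is evaluated once; every later access is an index into the returned list rather than a fresh search of the document.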

Answer 2:

You're already using XPath ("//ttest..."), and it's the slowest way to access the document's nodes, because the "//" syntax searches across the entire document.

Try something like this instead:

foreach (XmlNode node in MyXmlDoc.DocumentElement.ChildNodes) {
    // the ttest elements are children of the root element, not of the document node itself
    ...
}

No XPath is required, and it should be quicker. (This assumes a "flat" XML file with no nesting; if it isn't flat, you'll be recursing soon, my lad.)
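If the file is not flat, a recursive walk can find the `ttest` elements at any depth. A minimal sketch; the class name `TTestWalker`, the method `CollectTTests`, and the nested sample (with its `group` wrapper) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TTestWalker
{
    // Recursively visits the children of 'node' and adds every <ttest> element
    // found, at any depth, to 'found' in document order.
    public static void CollectTTests(XmlNode node, List<XmlElement> found)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            XmlElement element = child as XmlElement;
            if (element == null)
            {
                continue; // skip text, comments, the XML declaration, etc.
            }
            if (element.Name == "ttest")
            {
                found.Add(element);
            }
            CollectTTests(element, found); // descend into nested elements
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        // Hypothetical nested sample: the second ttest is wrapped in a <group> element.
        doc.LoadXml(
            "<tests>" +
            "<ttest ID=\"ttest00001\" NickName=\"map00001\"/>" +
            "<group><ttest ID=\"ttest00002\" NickName=\"map00002\"/></group>" +
            "</tests>");

        List<XmlElement> found = new List<XmlElement>();
        CollectTTests(doc, found);
        foreach (XmlElement e in found)
        {
            Console.WriteLine(e.GetAttribute("NickName"));
        }
    }
}
```

Like "//", this still visits every element, so it pays off mainly when the walk is done once and the results are kept.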

Answer 3:

In reply to Dimitre:

Actually... selecting the whole node is quicker than selecting just the attributes.

I have a unit test benchmarking the code below and (amazingly) selecting the full node and then processing the attribute is quicker than selecting the attributes and getting the value straight away.

Put this in a 10,000-iteration loop and swap the comments to test each way.

    // Variant 1 (attributes): uncomment these two lines and comment out the other two
    //XmlNodeList nodeList = document.DocumentElement.SelectNodes("ttest/@NickName");
    XmlNodeList nodeList = document.DocumentElement.SelectNodes("ttest");
    foreach (XmlNode node in nodeList)
    {
        //string nickName = node.Value;
        string nickName = ((XmlAttribute)node.Attributes.GetNamedItem("NickName")).Value;
    }

Counterintuitive, I know, but... you have to measure!
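A rough timing harness in the spirit of that benchmark, sketched under assumptions: the document is generated in memory with a hypothetical element count, the class and method names are illustrative, and `System.Diagnostics.Stopwatch` times the two selection styles compared above. Timings depend on machine and runtime, so none are quoted here.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Xml;

public static class SelectBenchmark
{
    // Builds a flat document with 'count' <ttest> children under a hypothetical <tests> root.
    public static XmlDocument BuildDocument(int count)
    {
        StringBuilder sb = new StringBuilder("<tests>");
        for (int i = 1; i <= count; i++)
        {
            sb.AppendFormat("<ttest ID=\"ttest{0:D5}\" NickName=\"map{0:D5}\"/>", i);
        }
        sb.Append("</tests>");
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(sb.ToString());
        return doc;
    }

    // Variant 1: select the attribute nodes and read their values directly.
    // Returns the total length of all values so the work is not optimized away.
    public static int ViaAttributes(XmlDocument doc)
    {
        int total = 0;
        foreach (XmlNode attr in doc.DocumentElement.SelectNodes("ttest/@NickName"))
        {
            total += attr.Value.Length;
        }
        return total;
    }

    // Variant 2: select the elements and look up the attribute on each one.
    public static int ViaElements(XmlDocument doc)
    {
        int total = 0;
        foreach (XmlNode node in doc.DocumentElement.SelectNodes("ttest"))
        {
            total += node.Attributes["NickName"].Value.Length;
        }
        return total;
    }

    public static void Main()
    {
        XmlDocument doc = BuildDocument(10000);
        const int iterations = 100;

        // Warm-up so JIT compilation is not included in the timings.
        ViaAttributes(doc);
        ViaElements(doc);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) ViaAttributes(doc);
        Console.WriteLine("attributes: {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) ViaElements(doc);
        Console.WriteLine("elements:   {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Both variants do the same logical work, so any difference in the printed times comes from the selection style rather than from what is read.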

Answer 4:

In this case you might want to consider reading the nicknames in the XML file into an array (if your test IDs are really just sequential integers) or a dictionary (if not) up front, then using that to locate each nickname, rather than trying to do a bunch of XPath queries. You would probably get much better performance on lookups that way.

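Edit: Something like this:

var nicknames = new Dictionary<string, string>();   // needs System.Collections.Generic

// Build the ID -> NickName lookup once, up front
foreach (XmlNode node in MyXmlDoc.DocumentElement.ChildNodes)
{
    if (node is XmlElement)
    {
        nicknames.Add(node.Attributes["ID"].Value, node.Attributes["NickName"].Value);
    }
}

...

// Inside the loop, each lookup is a hash-table access instead of an XPath query
nickNameLoopNum = nicknames[testloopNum];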
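If the IDs really are sequential (`ttest00001`, `ttest00002` and so on, with no gaps), the array option mentioned above could look like the sketch below. It assumes every ID is exactly the `ttest` prefix followed by a decimal number from 1 to the element count; `NickNameArray` and `BuildNickNameArray` are hypothetical names.

```csharp
using System;
using System.Xml;

public static class NickNameArray
{
    const string Prefix = "ttest";

    // Places each NickName at index (number in ID - 1), so "ttest00001" lands at index 0.
    // Assumes IDs are "ttest" followed by a decimal number in the range 1..count.
    public static string[] BuildNickNameArray(XmlDocument doc)
    {
        XmlNodeList elements = doc.DocumentElement.SelectNodes("ttest");
        string[] nickNames = new string[elements.Count];
        foreach (XmlElement e in elements)
        {
            int number = int.Parse(e.GetAttribute("ID").Substring(Prefix.Length));
            nickNames[number - 1] = e.GetAttribute("NickName");
        }
        return nickNames;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<tests>" +
            "<ttest ID=\"ttest00001\" NickName=\"map00001\"/>" +
            "<ttest ID=\"ttest00002\" NickName=\"map00002\"/>" +
            "</tests>");

        string[] nickNames = BuildNickNameArray(doc);
        // Each lookup is now a plain array index.
        Console.WriteLine(nickNames[1]); // prints map00002
    }
}
```

Compared with the dictionary, this avoids hashing but breaks as soon as the IDs stop following the assumed pattern, which is why the dictionary is the safer default.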

Source: https://stackoverflow.com/questions/385272/slow-selectsinglenode

Tags: .net, xml, performance, .net-3.5