Where's the bug in this tree traversal code?

雨燕双飞 提交于 2019-12-22 08:57:53

问题


There's a bug in Traverse() that's causing it to iterate nodes more than once.

Bugged Code

public IEnumerable<HtmlNode> Traverse()
{
    foreach (var node in _context)
    {
        yield return node;
        foreach (var child in Children().Traverse())
            yield return child;
    }
}

public SharpQuery Children()
{
    return new SharpQuery(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
}

public SharpQuery(IEnumerable<HtmlNode> nodes, SharpQuery previous = null)
{
    if (nodes == null) throw new ArgumentNullException("nodes");
    _previous = previous;
    _context = new List<HtmlNode>(nodes);
}

Test Code

    static void Main(string[] args)
    {
        var sq = new SharpQuery(@"
<a>
    <b>
        <c/>
        <d/>
        <e/>
        <f>
            <g/>
            <h/>
            <i/>
        </f>
    </b>
</a>");
        var nodes = sq.Traverse();
        Console.WriteLine("{0} nodes: {1}", nodes.Count(), string.Join(",", nodes.Select(n => n.Name)));
        Console.ReadLine();

Output

19 nodes: #document,a,b,c,g,h,i,d,g,h,i,e,g,h,i,f,g,h,i

Expected Output

Each letter a-i printed once.

Can't seem to figure out where's it going wrong... node.ChildNodes does return just direct children, right? (from HtmlAgilityPack)


Full class (condensed) if you want to try and run it yourself.

public class SQLite
{
    private readonly List<HtmlNode> _context = new List<HtmlNode>();
    private readonly SQLite _previous = null;

    public SQLite(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        _context.Add(doc.DocumentNode);
    }

    public SQLite(IEnumerable<HtmlNode> nodes, SQLite previous = null)
    {
        if (nodes == null) throw new ArgumentNullException("nodes");
        _previous = previous;
        _context = new List<HtmlNode>(nodes);
    }

    public IEnumerable<HtmlNode> Traverse()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in Children().Traverse())
                yield return child;
        }
    }

    public SQLite Children()
    {
        return new SQLite(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
    }
}

回答1:


First, note that everything goes according to plan as long as we're iterating over elements that don't have any sibling. As soon as we hit element <c>, things start going haywire. It's also interesting that <c>, <d> and <e> all think they contain <f>'s children.

Let's take a closer look at your call to SelectMany():

_context.SelectMany(n => n.ChildNodes)

That iterates through the items in _context and accumulates the child nodes of each item. Let's have a look at _context. Everything should be okay, since it's a List of length 1. Or is it?

I suspect your SharpQuery(string) constructor stores sibling elements inside the same list. In that case, _context may not be of length 1 anymore and, remember, SelectMany() accumulates the child nodes of each item in the list.

For example, if _context is a list containing <c>, <d>, <e> and <f>, only <f> has children, and SelectMany() is called once for each element, it will accumulate and return the children of <f> four times.

I think that's your bug.

EDIT: Since you've posted the full class, I don't have to guess anymore. Look at the sequence of operations when you iterate over <b> (iterators replaced by lists for better comprehension):

  1. Call Children() on <b>, which returns <c>, <d>, <e> and <f>,
  2. Call Traverse() on the resulting list:
    1. Call Children() on <c>, but _context actually contains <c>, <d>, <e> and <f>, not only <c>, so that returns <g>, <h> and <i>,
    2. Call Children() on <d>, same thing since _context is the same for both <c> and <d> (and <e>, and <f>),
    3. Lather, rinse, repeat.



回答2:


Children() is broken, it selects children of siblings as well. I'd rewrite to something like this:

public IEnumerable<HtmlNode> Traverse(IEnumerable<HtmlNode> nodes)
{
    foreach (var node in nodes)
    {
        yield return node;
        foreach (var child in Traverse(ChildNodes(node)))
            yield return child;
    }
}

private IEnumerable<HtmlNode> ChildNodes(HtmlNode node)
{
    return node.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element);
}


public SharpQuery(IEnumerable<HtmlNode> nodes, SharpQuery previous = null)
{
    if (nodes == null) throw new ArgumentNullException("nodes");
    _previous = previous; // Is this necessary?
    _context = new List<HtmlNode>(nodes);
}



回答3:


Shouldn't this be enough?

public IEnumerable<HtmlNode> Traverse()
{
    foreach (var node in _context)
    {
        Console.WriteLine(node.Name);
        yield return node;
        foreach (var child in Children().Traverse())
            {} //yield return child;
    }
}



回答4:


I can't tell exactly, but you can see the pattern is that once your stuff encounter the "/" of the "c", it start to consider the "g,h,i" to be part of every subsequent letter until it encounter the "/" of the "f".

Most likely you have an extra loop that you should get ride of.

Or for some reason, it doesnt compute correctly the "/>" parts.




回答5:


I would do something like this and see if it fixes the issue:

public IEnumerable<HtmlNode> Traverse()
{
foreach (var node in _context)
{
    Console.WriteLine(node.Name);
    if(!node.HasChildren) {
     yield return node;
    }
    else {
    foreach (var child in Children().Traverse())
        yield return child;
    }
    }
}
}



回答6:


public IEnumerable<HtmlNode> All()
{
    var queue = new Queue<HtmlNode>(_context);

    while (queue.Count > 0)
    {
        var node = queue.Dequeue();
        yield return node;
        foreach (var next in node.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element))
            queue.Enqueue(next);
    }
}

courtesy link



来源:https://stackoverflow.com/questions/4138803/wheres-the-bug-in-this-tree-traversal-code

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!