Using multiple find in foreach with QueryPath

Submitted by 南笙酒味 on 2019-12-06 08:17:13

Question


I'm using QueryPath and PHP.

This finds the .eventdate okay, but doesn't return anything for .dtstart:

$qp = htmlqp($url);
foreach ($qp->find('table#schedule')->find('tr') as $tr){
    echo 'date: ';
    echo $tr->find('.eventdate')->text();
    echo ' time: ';
    echo $tr->find('.dtstart')->text();
    echo '<br>';
}

If I swap the two, .dtstart works okay, but .eventdate doesn't return anything. So it seems that find() in QueryPath destroys the element and returns only the value it needs, which makes it impossible to search $tr for more than one item per iteration.

Here's example HTML for a TR I'm dealing with:

<tr class="event"><th class="date first" scope="row"><abbr class="eventdate" title="Thursday, February 01, 2011" >02/01</abbr><span class="eventtime" ><abbr class="dtstart" title="2012-02-01T19:00:00" >7:00 PM</abbr><abbr class="dtend" title="2012-02-01T21:00:00" >9:00 PM</abbr></span></th><td class="opponent summary"><ul><li class="first">@ <a class="team" href="/high-schools/ridge-wolves/basketball-winter-11-12/schedule.htm" >Ridge </a> <span class="game-note">*</span></li><li class="location" title="Details: Ridge High School">Details: Ridge High School</li><li class="last"><a class="" href="/local/stats/pregame.aspx?contestid=4255-4c6c-906d&amp;ssid=381d-49f5-9f6d" >Preview Game</a></li></ul></td><td class="result last"><a class="pregame" href="/local/stats/pregame.aspx?contestid=4255-4c6c-906d&amp;ssid=381d-49f5-9f6d">Preview</a></td></tr>

I tried copying the $tr before the first find and replacing it before the second, but that didn't work.

How can I search within each $tr for several different elements?

FYI, beyond .eventdate and .dtstart, I also want the .opponent cell, the href of the opponent's a element, and that link's anchor text.


Answer 1:


I'm just learning QueryPath myself, but I think you should branch the row object. Otherwise the $tr->find('.eventdate') will take you to the abbr element contained in the row, and each following find() will try to find elements beneath the abbr, resulting in no matches. branch() (see documentation) creates a copy of the QueryPath object, leaving the original object (in this case $tr) intact.

So your code would be:

$qp = htmlqp($url);
foreach ($qp->find('table#schedule')->find('tr') as $tr){
    echo 'date: ';
    echo $tr->branch()->find('.eventdate')->text();
    echo ' time: ';
    echo $tr->branch()->find('.dtstart')->text();
    echo '<br>';
}

I don't know if this is the preferred way to achieve what you want, but it seems to work.




Answer 2:


QueryPath maintains its state internally (unlike jQuery) for performance reasons. So branch() is the way to go.

As a modification to the proposed solution, though, I would suggest minimizing the number of find() calls by doing this:

$qp = htmlqp($url);
foreach ($qp->find('table#schedule tr') as $tr){
    echo 'date: ';
    echo $tr->branch('.eventdate')->text();
    echo ' time: ';
    echo $tr->branch('.dtstart')->text();
    echo '<br>';
}

Finally, any time you do a "destructive" action (like a find()), you can always go back one step using end(). So the above could also be done like this:

$qp = htmlqp($url);
foreach ($qp->find('table#schedule tr') as $tr){
    echo 'date: ';
    echo $tr->find('.eventdate')->text();
    echo ' time: ';
    echo $tr->end()->find('.dtstart')->text();
    echo '<br>';
}

Using end() instead of branch() is a VERY VERY minor performance improvement, but I prefer the branch() method unless I'm working with documents larger than 1M.

In QueryPath 3.x, which has a whole bunch of new performance enhancements, I am toying with the idea of going with the jQuery way of creating a new object for each function. Unfortunately, this method will use a LOT more memory, so I may not keep it. While branch() takes a little while to learn, it does have its advantages.
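
For the other fields the question mentions (.opponent, the link's href, and its anchor text), the same branch() pattern could be extended roughly as follows. This is only a sketch: it assumes QueryPath is installed through Composer (hence the vendor/autoload.php path, which may differ in your setup), the helper name extractSchedule is made up for illustration, and the selector '.opponent a.team' is based on the sample row in the question.

```php
<?php
// Assumption: QueryPath was installed with Composer; adjust this require otherwise.
require 'vendor/autoload.php';

/**
 * Hypothetical helper (not part of QueryPath): returns one array per table row.
 * $source is whatever htmlqp() accepts, e.g. a URL or a string of HTML.
 */
function extractSchedule($source): array
{
    $rows = [];
    foreach (htmlqp($source)->find('table#schedule tr') as $tr) {
        // Branch for every lookup so that $tr keeps pointing at the row itself.
        $link = $tr->branch('.opponent a.team');
        $rows[] = [
            'date'     => $tr->branch('.eventdate')->text(),
            'time'     => $tr->branch('.dtstart')->text(),
            'opponent' => trim($link->text()),   // anchor text, e.g. the team name
            'href'     => $link->attr('href'),   // link target of the opponent
        ];
    }
    return $rows;
}
```

Rows that lack one of these elements (a header row, for example) would presumably just yield empty values for the missing fields, so you may want to filter those out afterwards.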




Answer 3:


Yes, you're right; I actually ran into this problem today. In jQuery you can just query, query, query with no problems, but in QueryPath a query changes the internal "state" of the object, so a second query is applied against the current state.

So if you want to query multiple "separate" locations in the document, you have to branch first:

$q = qp("something.html");
$a = $q->branch()->find("tr");
$b = $q->branch()->find("a");

That seems to work in my code, so I suppose it will work in yours.
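
The state behaviour described in these answers can be pictured with a small toy model. To be clear, this is not QueryPath's code: Cursor is a made-up class that filters a flat list of strings rather than walking a DOM, and it only imitates the idea that find() narrows the object in place, branch() works on a copy, and end() steps back once. It assumes PHP 7.4 or later.

```php
<?php
// Toy model only; NOT QueryPath's real implementation.
final class Cursor
{
    private array $matches;            // what the object currently points at
    private ?array $previous = null;   // matches before the last find()

    public function __construct(array $matches)
    {
        $this->matches = $matches;
    }

    // "Destructive": narrows the matches in place and returns the same object.
    public function find(string $needle): self
    {
        $this->previous = $this->matches;
        $this->matches = array_values(array_filter(
            $this->matches,
            fn(string $m): bool => strpos($m, $needle) !== false
        ));
        return $this;
    }

    // Returns a copy, so searching the copy leaves the original untouched.
    public function branch(): self
    {
        return clone $this;
    }

    // Undoes the most recent find() on this object.
    public function end(): self
    {
        if ($this->previous !== null) {
            $this->matches = $this->previous;
            $this->previous = null;
        }
        return $this;
    }

    public function size(): int
    {
        return count($this->matches);
    }
}

// Two finds in a row: the second searches inside the first result and finds nothing.
$row = new Cursor(['eventdate', 'dtstart', 'dtend']);
$chained = $row->find('eventdate')->find('dtstart')->size();    // 0

// Branching first keeps the original row intact.
$row = new Cursor(['eventdate', 'dtstart', 'dtend']);
$branched = $row->branch()->find('dtstart')->size();            // 1
$untouched = $row->size();                                      // 3

// end() steps back before the next find().
$restored = $row->find('eventdate')->end()->find('dtstart')->size();  // 1
```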



Source: https://stackoverflow.com/questions/8394657/using-multiple-find-in-foreach-with-querypath
