Fellas!
I have one nasty page to parse, but I can't figure out how to extract the correct data blocks from it using Simple HTML DOM, because it has no CSS child selector support.
I had the same issue and used the children() method to grab just the first-level items. Given markup like this:
<ul class="my-list">
  <li>
    <a href="#">Some Text</a>
    <ul>
      <li><a href="#">Some Inner Text</a></li>
      <li><a href="#">Some Inner Text</a></li>
      <li><a href="#">Some Inner Text</a></li>
      <li><a href="#">Some Inner Text</a></li>
    </ul>
  </li>
  <li>
    <a href="#">Some Text</a>
    <ul>
      <li><a href="#">Some Inner Text</a></li>
      <li><a href="#">Some Inner Text</a></li>
      <li><a href="#">Some Inner Text</a></li>
      <li><a href="#">Some Inner Text</a></li>
    </ul>
  </li>
</ul>
And here's the Simple HTML DOM code to get just the first-level li items:
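require_once 'simple_html_dom.php'; // the Simple HTML DOM library file
$html = file_get_html( $url );
// children() returns only the direct children of ul.my-list, i.e. the first-level li items
$first_level_items = $html->find( '.my-list', 0 )->children();
foreach ( $first_level_items as $item ) {
    // ... do stuff with each first-level li, e.g. print the text of its own link:
    echo $item->first_child()->plaintext, "\n";
}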
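If you would rather not depend on Simple HTML DOM, the same "direct children only" idea can be sketched with PHP's built-in DOM extension: walk the list element's childNodes and keep only the li elements, so the nested inner lists are never visited. This is a swapped-in technique, not what children() does internally, and first_level_items() is a hypothetical helper name used only for illustration.

```php
<?php
// Sketch: collect only the first-level <li> children of a <ul> with a given class,
// using PHP's built-in DOM extension instead of Simple HTML DOM.
// first_level_items() is a hypothetical helper, not part of any library.

function first_level_items(string $html, string $listClass): array
{
    $dom = new DOMDocument();
    // Collect (instead of printing) warnings about imperfect real-world markup.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // First <ul> whose class attribute contains $listClass as a whole token.
    $list = $xpath->query(
        "//ul[contains(concat(' ', normalize-space(@class), ' '), ' $listClass ')]"
    )->item(0);
    if ($list === null) {
        return [];
    }

    $items = [];
    // childNodes holds only direct children (plus whitespace text nodes),
    // so the <li> elements inside the inner <ul> are skipped.
    foreach ($list->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === 'li') {
            $items[] = $child;
        }
    }
    return $items;
}

$html = '
<ul class="my-list">
  <li><a href="#">Some Text</a>
    <ul><li><a href="#">Some Inner Text</a></li></ul>
  </li>
  <li><a href="#">Some Text</a>
    <ul><li><a href="#">Some Inner Text</a></li></ul>
  </li>
</ul>';

foreach (first_level_items($html, 'my-list') as $item) {
    // The first <a> under each first-level <li> is that item's own link.
    echo trim($item->getElementsByTagName('a')->item(0)->textContent), "\n";
}
```

Matching the class as a whitespace-separated token (rather than @class="my-list") keeps the lookup working when the list carries extra classes.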
A simple example with PHP's built-in DOM extension, where the XPath step /li selects only the direct li children of the list:
$dom = new DOMDocument;
$dom->loadHTML('
<ul class="ul-block">
  <li>a</li>
  <li>b</li>
  <li>
    <ul>
      <li>c</li>
    </ul>
  </li>
</ul>
');
$xpath = new DOMXPath($dom);
// "/li" (child axis) matches only the li elements directly under ul.ul-block, not the nested one
foreach ($xpath->query('//ul[@class="ul-block"]/li') as $liNode) {
    echo $liNode->nodeValue, '<br />';
}
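To make the child-versus-descendant distinction concrete, here is a small sketch that runs both XPath forms against the same markup: the CSS child combinator "ul.ul-block > li" corresponds to the XPath step /li, while the descendant combinator "ul.ul-block li" corresponds to //li. The helper li_texts() is hypothetical and exists only for this illustration.

```php
<?php
// Sketch: compare the XPath child step (/li, like CSS "ul.ul-block > li")
// with the descendant step (//li, like CSS "ul.ul-block li").
// li_texts() is a hypothetical helper written for this illustration.

function li_texts(string $html, string $xpathExpr): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ((new DOMXPath($dom))->query($xpathExpr) as $li) {
        // nodeValue includes the whitespace left by nested markup; collapse it.
        $texts[] = trim(preg_replace('/\s+/', ' ', $li->nodeValue));
    }
    return $texts;
}

$html = '
<ul class="ul-block">
  <li>a</li>
  <li>b</li>
  <li>
    <ul>
      <li>c</li>
    </ul>
  </li>
</ul>';

// Direct children only: three items.
print_r(li_texts($html, '//ul[@class="ul-block"]/li'));
// All descendants: four items, because the nested li is matched too.
print_r(li_texts($html, '//ul[@class="ul-block"]//li'));
```

The text "c" shows up under both the outer third li and the nested li in the descendant query, since nodeValue concatenates all descendant text of a node.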