I'm using PHP Simple HTML DOM Parser to scrape some data from a webshop (running XAMPP 1.7.2 with PHP 5.3.0), and I'm running into problems with the tbody tag.

Make sure that the tbody is really there. Many browsers add a tbody to tables in the inspect panel even though it is not present in the response.

Check whether your tbody is coming from some JavaScript execution. I was facing the same problem with a span tag. Later I found that if any HTML gets into the page via jQuery or any other JavaScript execution, simple_html_dom simply fails, because it only parses the markup the server returned and does not run scripts.

In your simple_html_dom.php file, comment out or remove line #396:

    // if ($m[1]==='tbody') continue;

There is a bug report for this issue here:
http://sourceforge.net/p/simplehtmldom/bugs/79/
It is still open at the time of this writing.

There is an alternative fix if you do not wish to modify the source code. For example, in a loop to find <tr>'s:

    <?php
    // The *BROKEN* way to find the <tr>'s
    // below the <tbody> below the <table id="foo">
    foreach ($dom->find('table#foo tbody tr') as $tr) {
        /* you will get nothing */
    }

You can instead selectively check the parent tag name while iterating all <tr>'s, like so:

    <?php
    // A workaround to find the <tr>'s
    // below the <tbody> below the <table id="foo">
    foreach ($dom->find('table#foo tr') as $tr) { // note the lack of tbody selector
        /* you will get all trs, but let's only work with ones with the parent
           of a tbody! */
        if ($tr->parent->tag == 'tbody') { // our workaround
            /* this part will work as you would expect the above broken code to work */
        }
    }

Also note a slightly unrelated issue that I ran into: the Chrome and FF inspectors will correct tag soup regarding <tbody> and <thead>. Be careful: only look at the actual source, and stay away from the DOM inspectors if you run into unexplainable issues.
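To act on the advice about checking the actual source rather than the inspector, here is a minimal sketch that tests whether the raw markup contains a literal tbody tag. The helper name hasLiteralTbody and the sample strings are made up for illustration; in practice you would pass in the response body exactly as your scraper downloaded it. It uses only preg_match, so it does not depend on simple_html_dom at all.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns true when the
// raw HTML string contains an opening <tbody ...> tag, case-insensitively.
function hasLiteralTbody($html)
{
    // \b keeps the match from firing on longer tag names that merely start with "tbody".
    return preg_match('/<tbody\b/i', $html) === 1;
}

// Sample markup standing in for the raw HTTP response (assumed, not from a real shop).
$withoutTbody = '<table id="foo"><tr><td>1</td></tr></table>';
$withTbody    = '<table id="foo"><TBODY class="x"><tr><td>1</td></tr></TBODY></table>';

var_dump(hasLiteralTbody($withoutTbody)); // bool(false)
var_dump(hasLiteralTbody($withTbody));    // bool(true)
```

If this returns false for the page you are scraping, the tbody you see in the inspector was added by the browser or by a script, and a selector that includes tbody cannot match no matter how the parser is patched.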