Extracting site data with a web crawler outputs an error due to an array index mismatch


再見小時候 2021-01-11 13:10

I have been trying to extract the text of a table, along with its links, from a given table (which is on site1.com) into my PHP page using a web crawler.

But unfortunately,

4 answers

轻奢々 (original poster)

2021-01-11 13:28

Chopping at HTML with string functions or regex is not a reliable method. DOMDocument and XPath do a nice job.

Code: (Demo)

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = [];  // initialise so the check below is safe when nothing matches
foreach ($xpath->evaluate("//td[@class = 'FootNotes2']/a") as $node) {  // target <a> tags that have <td class="FootNotes2"> as parent
    $result[] = ['href' => $node->getAttribute('href'), 'text' => $node->nodeValue];  // extract/store the href and text values
    if (count($result) == 10) { break; }  // set a limit of 10 rows of data
}
if ($result) {
    echo "\n";
    foreach ($result as $data) {
        echo "\t{$data['text']}\n";
    }
}

Sample Input:

$html = <<<HTML

     
    Subject
     
    Last Update
     
    Replies
     
    Views


     
    Serious dedicated study partner for U World - step12013
     
    02/11/17 01:50
     
    10
     
    318


     
    some text - step12013
     
    02/11/17 01:50
     
    10
     
    318



HTML;

Output:


    Serious dedicated study partner for U World
    some text
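
The markup in the sample input above does not survive in readable form, so here is a minimal self-contained sketch of the same DOMDocument/XPath approach. The table structure and the URLs (/thread/1, /thread/2) are assumptions made up for illustration, and extractLinks() is a hypothetical helper name; only the FootNotes2 class and the XPath expression come from the answer.

```php
<?php
// Hypothetical markup: the FootNotes2 class comes from the XPath query in the
// answer; the URLs and the surrounding table layout are assumed.
$html = <<<HTML
<table>
  <tr>
    <td class="FootNotes2"><a href="/thread/1">Serious dedicated study partner for U World</a> - step12013</td>
    <td>02/11/17 01:50</td>
    <td>10</td>
    <td>318</td>
  </tr>
  <tr>
    <td class="FootNotes2"><a href="/thread/2">some text</a> - step12013</td>
    <td>02/11/17 01:50</td>
    <td>10</td>
    <td>318</td>
  </tr>
</table>
HTML;

/**
 * Collect href/text pairs from <a> elements that sit directly inside
 * <td class="FootNotes2">, stopping after $limit matches.
 */
function extractLinks(string $html, int $limit = 10): array
{
    if (trim($html) === '') {
        return []; // nothing to parse
    }

    // Collect parser warnings internally instead of printing them,
    // since real-world pages are rarely valid HTML.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query("//td[@class = 'FootNotes2']/a") as $node) {
        $links[] = [
            'href' => $node->getAttribute('href'),
            'text' => trim($node->nodeValue),
        ];
        if (count($links) >= $limit) {
            break; // cap the number of rows, as in the answer
        }
    }
    return $links;
}

// Usage: print each link text with its target.
foreach (extractLinks($html) as $link) {
    echo $link['text'], ' => ', $link['href'], "\n";
}
```

Because the query selects nodes by structure rather than by character position, a row without a link simply yields no match, so there is no array index to run past.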
