Extracting site data with a web crawler produces an error due to an array index mismatch

再見小時候 2021-01-11 13:10

I have been trying to extract the table text, along with its links, from the given table (which is on site1.com) into my PHP page using a web crawler.

But unfortunately,

4 answers
  •  轻奢々 (OP)
    2021-01-11 13:28

    Chopping at HTML with string functions or regex is not a reliable method. DOMDocument and XPath do a nice job.

    Code: (Demo)

    $dom=new DOMDocument; 
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    foreach ($xpath->evaluate("//td[@class = 'FootNotes2']/a") as $node) {  // target <a> tags that have <td class="FootNotes2"> as parent
        $result[]=['href' => $node->getAttribute('href'), 'text' => $node->nodeValue];  // extract/store the href and text values
        if (sizeof($result) == 10) { break; }  // set a limit of 10 rows of data
    }
    if (isset($result)) {
        echo "";
    }
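
As a possible variation on the 10-row cap in the loop above, the limit can also be pushed into the XPath expression with `position()`. This is only a sketch: the generated markup, the `/thread/N` hrefs and the "Thread N" texts are made up for illustration, and only the class name FootNotes2 comes from the query above.

```php
<?php
// Hypothetical markup: 12 rows, each with a link inside <td class="FootNotes2">.
$rows = '';
for ($i = 1; $i <= 12; $i++) {
    $rows .= "<tr><td class=\"FootNotes2\"><a href=\"/thread/$i\">Thread $i</a></td></tr>";
}
$html = "<table>$rows</table>";

$dom = new DOMDocument;
libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// The outer parentheses make position() count over the whole result set,
// not per parent <td>, so at most 10 nodes come back.
$nodes = $xpath->query("(//td[@class = 'FootNotes2']/a)[position() <= 10]");

$result = [];
foreach ($nodes as $node) {
    $result[] = ['href' => $node->getAttribute('href'), 'text' => $node->nodeValue];
}

echo count($result), PHP_EOL;  // prints 10
```

Without the outer parentheses, `position()` would be evaluated relative to each parent cell, so every link would pass the filter.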
    

    Sample Input:

    $html = <<<HTML
    
         
        Subject
         
        Last Update
         
        Replies
         
        Views
    
    
         
        Serious dedicated study partner for U World - step12013
         
        02/11/17 01:50
         
        10
         
        318
    
    
         
        some text - step12013
         
        02/11/17 01:50
         
        10
         
        318
    
    
    
    HTML;
    

    Output:

    
    
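
For anyone who wants to run the approach end to end, here is a minimal self-contained sketch. The markup is an assumption: only the class name FootNotes2 and the cell texts come from this post, while the href values, the third non-matching row and the exact table layout are hypothetical.

```php
<?php
// Hypothetical sample table. The hrefs and the cell layout are assumptions.
$html = <<<HTML
<table>
    <tr>
        <td class="FootNotes2"><a href="/thread/1">Serious dedicated study partner for U World - step12013</a></td>
        <td>02/11/17 01:50</td>
        <td>10</td>
        <td>318</td>
    </tr>
    <tr>
        <td class="FootNotes2"><a href="/thread/2">some text - step12013</a></td>
        <td>02/11/17 01:50</td>
        <td>10</td>
        <td>318</td>
    </tr>
    <tr>
        <td class="Other"><a href="/ignored">not matched by the query</a></td>
    </tr>
</table>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$result = [];
foreach ($xpath->query("//td[@class = 'FootNotes2']/a") as $node) {
    // store the link target and the visible link text for each match
    $result[] = ['href' => $node->getAttribute('href'), 'text' => trim($node->nodeValue)];
    if (count($result) === 10) { break; }  // same 10-row cap as in the answer
}

foreach ($result as $row) {
    echo $row['text'], ' => ', $row['href'], PHP_EOL;
}
// Serious dedicated study partner for U World - step12013 => /thread/1
// some text - step12013 => /thread/2
```

Because each row keeps its href and text together in one array entry, there are no parallel arrays whose indexes could drift out of sync.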
