Get data only from an HTML table using preg_match_all in PHP

暖寄归人 2021-01-03 16:07

I have an HTML table like this:

string...
2 answers
  • 2021-01-03 16:18

    PHP has a native extension to parse HTML and XML with DOM:

    $dom = new DOMDocument;
    $dom->loadHTML( $htmlContent );
    $rows = array();
    foreach( $dom->getElementsByTagName( 'tr' ) as $tr ) {
        $cells = array();
        foreach( $tr->getElementsByTagName( 'td' ) as $td ) {
            $cells[] = $td->nodeValue;
        }
        $rows[] = $cells;
    }
    

    Adjust to your liking. To learn more about its usage, search Stack Overflow, have a look at the PHP Manual, or go through some of my answers.
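
    As a minimal sketch of one such adjustment, the loop above can be wrapped in a function that silences libxml warnings on sloppy markup and trims the whitespace around each cell. The function name `extractTableRows` and the sample markup are assumptions made for illustration; the question's actual table is not shown in full.

    ```php
    <?php
    // Hypothetical helper: returns the trimmed text of every <td>, grouped by <tr>.
    function extractTableRows(string $html): array
    {
        $dom = new DOMDocument();

        // Collect parser warnings internally instead of emitting them,
        // then restore the previous setting.
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $rows = [];
        foreach ($dom->getElementsByTagName('tr') as $tr) {
            $cells = [];
            foreach ($tr->getElementsByTagName('td') as $td) {
                $cells[] = trim($td->textContent);
            }
            // Skip rows without any <td>, such as header rows made of <th>.
            if ($cells !== []) {
                $rows[] = $cells;
            }
        }

        return $rows;
    }

    // Sample markup, assumed to resemble the table in the question.
    $html = '<table border="1"><tbody>'
          . '<tr><td style="color:blue;"> data0 </td><td> data1 </td></tr>'
          . '<tr><td> data00 </td><td> data11 </td></tr>'
          . '</tbody></table>';

    $rows = extractTableRows($html);
    print_r($rows);
    ```

    Note that `getElementsByTagName` also returns cells of nested tables, so this sketch assumes a flat table.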

  • 2021-01-03 16:20

    You absolutely do NOT want to parse HTML with Regex.

    There are far too many variations in markup, for one, and more importantly, regex copes poorly with the hierarchical nature of HTML. It's best to use an XML parser or, better yet, an HTML-specific parser.
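
    To make the hierarchy problem concrete, here is a small sketch comparing a naive pattern with PHP's built-in DOM extension. The markup, with a table nested inside a cell, and the pattern are assumptions chosen for illustration and are not taken from the question.

    ```php
    <?php
    // Illustrative markup: the first cell contains a nested table.
    $html = '<table><tr>'
          . '<td>outer<table><tr><td>inner</td></tr></table></td>'
          . '<td>second</td>'
          . '</tr></table>';

    // Naive pattern: the lazy match stops at the first </td>,
    // which belongs to the inner table, not to the cell that was opened.
    preg_match_all('~<td[^>]*>(.*?)</td>~s', $html, $matches);
    $regexCells = $matches[1];

    // A DOM parser keeps the tree, so cells can be selected by nesting depth.
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Cells with exactly one <table> ancestor belong to the outer table.
    $outerCells = $xpath->query('//td[count(ancestor::table) = 1]');
    // Cells with two <table> ancestors belong to the nested table.
    $innerCells = $xpath->query('//td[count(ancestor::table) = 2]');

    printf("regex: %d matches, first = %s\n", count($regexCells), $regexCells[0]);
    printf("DOM: %d outer cells, %d inner cell(s)\n", $outerCells->length, $innerCells->length);
    ```

    The first regex match is a fragment that starts in the outer cell and ends inside the nested one, while the DOM queries separate the two levels cleanly.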

    Whenever I need to scrape HTML, I tend to use the Simple HTML DOM Parser library, which parses an HTML document into a traversable PHP object that you can query much like jQuery.

    <?php
        require 'simplehtmldom/simple_html_dom.php';
    
        $sHtml = <<<EOS
        <table border="1" >
          <tbody style="" >
               <tr style="" > 
                     <td style="color:blue;">
                          data0
                      </td>
                        <td style="font-size:15px;">
                         data1
                      </td>
                        <td style="font-size:15px;">
                          data2
                      </td>
                        <td style="color:blue;">
                          data3
                      </td>
                        <td style="color:blue;">
                          data4
                      </td>
               </tr>
               <tr style="" > 
                     <td style="color:blue;">
                          data00
                      </td>
                        <td style="font-size:15px;">
                         data11
                      </td>
                        <td style="font-size:15px;">
                          data22
                      </td>
                        <td style="color:blue;">
                          data33
                      </td>
                        <td style="color:blue;">
                          data44
                      </td>
               </tr>
               <tr style="color:black" > 
                     <td style="color:blue;">
                          data000
                      </td>
                        <td style="font-size:15px;">
                         data111
                      </td>
                        <td style="font-size:15px;">
                          data222
                      </td>
                        <td style="color:blue;">
                          data333
                      </td>
                        <td style="color:blue;">
                          data444
                      </td>
               </tr>
          </tbody>
        </table>
    EOS;
    
        $oHTML = str_get_html($sHtml);
        $oTRs = $oHTML->find('table tr');
        $aData = array();
        foreach($oTRs as $oTR) {
            $aRow = array();
            $oTDs = $oTR->find('td');
    
            foreach($oTDs as $oTD) {
                $aRow[] = trim($oTD->plaintext);
            }
    
            $aData[] = $aRow;
        }
    
        var_dump($aData);
    ?>
    

    And the output:

    array
      0 => 
        array
          0 => string 'data0' (length=5)
          1 => string 'data1' (length=5)
          2 => string 'data2' (length=5)
          3 => string 'data3' (length=5)
          4 => string 'data4' (length=5)
      1 => 
        array
          0 => string 'data00' (length=6)
          1 => string 'data11' (length=6)
          2 => string 'data22' (length=6)
          3 => string 'data33' (length=6)
          4 => string 'data44' (length=6)
      2 => 
        array
          0 => string 'data000' (length=7)
          1 => string 'data111' (length=7)
          2 => string 'data222' (length=7)
          3 => string 'data333' (length=7)
          4 => string 'data444' (length=7)
    