You absolutely do NOT want to parse HTML with regex.
For one, there are far too many ways to write the same markup (attribute order, quoting, whitespace, optional closing tags). More importantly, regular expressions are bad at handling the nested, hierarchical structure of HTML. You're better off using an XML parser, or better yet an HTML-specific parser.
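To make the hierarchy problem concrete, here is a small sketch. The nested-table input is made up for illustration, and the parser side uses PHP's bundled DOM extension rather than any particular third-party library. A lazy regex stops at the first closing tag it sees, while a parser builds the tree and finds both tables.

```php
<?php
// Hypothetical input: a table nested inside a cell of another table.
$html = '<table><tr><td>outer<table><tr><td>inner</td></tr></table></td></tr></table>';

// Naive regex: capture everything between <table> and the next </table>.
// The lazy .*? stops at the FIRST </table>, which belongs to the inner table,
// so the outer table's contents are cut off in the middle of a cell.
preg_match('~<table>(.*?)</table>~s', $html, $m);
$regexCapture = $m[1];
echo $regexCapture, "\n";

// A parser builds a tree instead, so nesting is handled correctly.
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName returns elements in document order: outer table first.
$tables = $doc->getElementsByTagName('table');
echo $tables->length, "\n";
echo $tables->item(1)->textContent, "\n";
```

The regex capture ends right after the inner table's last row, while the DOM sees two separate tables and can address the inner one directly.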
Whenever I need to scrape HTML, I tend to use the Simple HTML DOM Parser library. It parses an HTML string into a traversable PHP object that you can query with CSS-style selectors, much like jQuery.
<?php
require 'simplehtmldom/simple_html_dom.php';
$sHtml = <<<EOS
<table border="1" >
<tbody style="" >
<tr style="" >
<td style="color:blue;">
data0
</td>
<td style="font-size:15px;">
data1
</td>
<td style="font-size:15px;">
data2
</td>
<td style="color:blue;">
data3
</td>
<td style="color:blue;">
data4
</td>
</tr>
<tr style="" >
<td style="color:blue;">
data00
</td>
<td style="font-size:15px;">
data11
</td>
<td style="font-size:15px;">
data22
</td>
<td style="color:blue;">
data33
</td>
<td style="color:blue;">
data44
</td>
</tr>
<tr style="color:black" >
<td style="color:blue;">
data000
</td>
<td style="font-size:15px;">
data111
</td>
<td style="font-size:15px;">
data222
</td>
<td style="color:blue;">
data333
</td>
<td style="color:blue;">
data444
</td>
</tr>
</tbody>
</table>
EOS;
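// Parse the HTML string into a traversable DOM object
$oHTML = str_get_html($sHtml);
// Find every <tr> inside a <table>, using a CSS-style selector
$oTRs = $oHTML->find('table tr');
$aData = array();
foreach($oTRs as $oTR) {
    $aRow = array();
    // Collect the trimmed text of each cell in this row
    $oTDs = $oTR->find('td');
    foreach($oTDs as $oTD) {
        $aRow[] = trim($oTD->plaintext);
    }
    $aData[] = $aRow;
}
var_dump($aData);
?>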
And the output (this is Xdebug's formatted var_dump; plain PHP prints the same data in a different layout):
array
  0 =>
    array
      0 => string 'data0' (length=5)
      1 => string 'data1' (length=5)
      2 => string 'data2' (length=5)
      3 => string 'data3' (length=5)
      4 => string 'data4' (length=5)
  1 =>
    array
      0 => string 'data00' (length=6)
      1 => string 'data11' (length=6)
      2 => string 'data22' (length=6)
      3 => string 'data33' (length=6)
      4 => string 'data44' (length=6)
  2 =>
    array
      0 => string 'data000' (length=7)
      1 => string 'data111' (length=7)
      2 => string 'data222' (length=7)
      3 => string 'data333' (length=7)
      4 => string 'data444' (length=7)
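If you'd rather not pull in a third-party library, PHP's bundled DOM extension can do the same job. This is a minimal sketch that swaps in DOMDocument and DOMXPath for Simple HTML DOM; the input is a shortened, hypothetical version of the table above, and XPath expressions stand in for the 'table tr' and 'td' selectors.

```php
<?php
// Shortened, hypothetical version of the table above.
$sHtml = <<<EOS
<table border="1">
<tbody>
<tr><td> data0 </td><td> data1 </td></tr>
<tr><td> data00 </td><td> data11 </td></tr>
</tbody>
</table>
EOS;

// Collect libxml parse warnings instead of printing them (real-world HTML is rarely clean).
libxml_use_internal_errors(true);
$oDoc = new DOMDocument();
$oDoc->loadHTML($sHtml);
libxml_clear_errors();

// XPath plays the role of the CSS-style selectors used with Simple HTML DOM.
$oXPath = new DOMXPath($oDoc);
$aData = array();
foreach ($oXPath->query('//table//tr') as $oTR) {
    $aRow = array();
    // Relative query: only the <td> cells of the current row.
    foreach ($oXPath->query('./td', $oTR) as $oTD) {
        $aRow[] = trim($oTD->textContent);
    }
    $aData[] = $aRow;
}
var_dump($aData);
```

The trade-off is mostly ergonomic: XPath is more verbose than jQuery-style selectors, but the DOM extension ships with most PHP builds and needs no extra files.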