I'm attempting to run preg_match to extract the SRC attribute from the first IMG tag in an article (in this case, stored in $row->introtext).
preg_match('/
The regex I used was much simpler. My code assumes that the string being passed to it contains exactly one img tag with no other markup:
$pattern = '/src="([^"]*)"/';
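To make that assumption concrete, here is a minimal sketch that applies the pattern with preg_match(). Both input strings are hypothetical examples. The URL lands in $matches[1]. When another element with a double-quoted src appears before the image, that value is returned instead.

```php
<?php
$pattern = '/src="([^"]*)"/';

// Input that satisfies the assumption: one img tag and no other markup.
$single = '<img src="images/stories/otakuzoku1.jpg" alt="Inside Otakuzoku\'s store">';
preg_match($pattern, $single, $matches);
$fromSingle = $matches[1];
echo $fromSingle, "\n"; // images/stories/otakuzoku1.jpg

// Hypothetical input that breaks it: another element with a src attribute comes first.
$mixed = '<script src="app.js"></script><img src="photo.jpg">';
preg_match($pattern, $mixed, $matches);
$fromMixed = $matches[1];
echo $fromMixed, "\n"; // app.js, not the image URL
```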
See my answer here for more info: How to extract img src, title and alt from html using php?
If you need to use preg_match() itself, try this:
preg_match('/(?<!_)src=([\'"])?(.*?)\\1/',$content, $matches);
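Here is a quick sketch of how this exact pattern behaves. The sample markup is hypothetical. The first capture group holds the quote character, and \1 requires the same quote to close the value, so the URL itself ends up in $matches[2]. Because PCRE never matches a backreference to a group that did not participate, an unquoted value such as src=images/c.jpg is not matched at all.

```php
<?php
$pattern = '/(?<!_)src=([\'"])?(.*?)\\1/';

// Hypothetical markup: one single-quoted and one double-quoted src value.
$content = '<p>intro</p><img alt="x" src=\'images/a.jpg\'><img src="images/b.jpg">';
preg_match($pattern, $content, $matches);
// Group 1 is the quote character, so the URL is in group 2.
$first = $matches[2];
echo $first, "\n"; // images/a.jpg

// An unquoted value leaves group 1 unset, and PCRE never matches a
// backreference to an unset group, so nothing is found here.
$unquoted = preg_match($pattern, '<img src=images/c.jpg>', $none);
var_dump($unquoted); // int(0)
```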
Try:
include ("htmlparser.inc"); // from: http://php-html.sourceforge.net/
$html = 'bla <img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku\'s store" /> noise <img src="das" /> foo';
$parser = new HtmlParser($html);
while ($parser->parse()) {
    if ($parser->iNodeName == 'img') {
        echo $parser->iNodeAttributes['src'];
        break;
    }
}
which will produce:
images/stories/otakuzoku1.jpg
It should work with PHP 4.x.
Here's a way to do it with built-in functions (PHP 4 and later), as long as the markup is well-formed XML:
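$parser = xml_parser_create();
// Wrap the fragment in a single root element so the parser sees one document.
xml_parse_into_struct($parser, '<root>' . $html . '</root>', $values);
xml_parser_free($parser);
$first_src = null;
foreach ($values as $val) {
    // Tag and attribute names are upper-cased by default (case folding).
    if ($val['tag'] == 'IMG' && isset($val['attributes']['SRC'])) {
        $first_src = $val['attributes']['SRC'];
        break;
    }
}
echo $first_src; // images/stories/otakuzoku1.jpg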
This task should be handled by a DOM parser, because a regex is ignorant of the DOM: it cannot tell a real tag from one inside a comment or a script.
Code:
$row = (object)['introtext' => '<div>test</div><img src="source1"><p>text</p><img src="source2"><br>'];
$dom = new DOMDocument();
$dom->loadHTML($row->introtext);
echo $dom->getElementsByTagName('img')->item(0)->getAttribute('src');
Output:
source1
This approach is clean, appropriate, and easy to read and maintain.
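The one-liner above assumes there is at least one img and that the first one has a src. Here is a slightly more defensive sketch, and it is only one possible variant. The input string is hypothetical. It also shows in practice why a regex is the wrong tool: a naive pattern returns the URL from a commented-out tag, while DOMDocument treats the comment as a comment. It uses DOMXPath to pick the first src attribute on any img, and libxml_use_internal_errors() to keep parser warnings from being printed.

```php
<?php
// Hypothetical input: a commented-out image and an img without src precede the real one.
$introtext = '<div>test</div><!-- <img src="old.jpg"> --><img alt="no src"><img src="source2">';

// A naive regex does not know about comments, so it returns the commented-out URL.
preg_match('/<img[^>]*src="([^"]*)"/i', $introtext, $m);
echo $m[1], "\n"; // old.jpg

// Buffer libxml warnings (e.g. about HTML5 tags) instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($introtext);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// First src attribute on any <img>; string() yields '' instead of null when none exists.
$src = (new DOMXPath($dom))->evaluate('string((//img/@src)[1])');
echo $src === '' ? "no image found\n" : $src . "\n"; // source2
```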
Your expression is incorrect. Try:
preg_match('/< *img[^>]*src *= *["\']?([^"\']*)/i', $row->introtext, $matches);
Note the removal of brackets around img and src and some other cleanups.
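Assuming the brackets in question were square brackets, [img] is a character class that matches exactly one of i, m or g, not the word img. Here is a short sketch with hypothetical input. The first two calls show that single-character behaviour. The last call applies the corrected pattern to an upper-case IMG tag, which the i flag accepts, and the URL ends up in $matches[1].

```php
<?php
// A character class matches exactly one character from the set.
$single = preg_match('/^[img]$/', 'm');   // 1
$word   = preg_match('/^[img]$/', 'img'); // 0

// Hypothetical article text: upper-case tag, src after another attribute.
$introtext = '<p>Intro</p><IMG border="0" src="images/stories/otakuzoku1.jpg" alt="x"> <img src="das">';
preg_match('/< *img[^>]*src *= *["\']?([^"\']*)/i', $introtext, $matches);
$src = $matches[1];
echo $src, "\n"; // images/stories/otakuzoku1.jpg
```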