I'm trying to match the highlighted parts of this string:
You should use a DOM parser for that. Here's an example with DOMDocument:
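<?php
$document = new DOMDocument();
// Real-world HTML is rarely valid; keep libxml warnings out of the output
libxml_use_internal_errors(true);
$document->loadHTML(file_get_contents('yourFileNameHere.html'));
libxml_clear_errors();

foreach ($document->getElementsByTagName('iframe') as $iframe) {
    // Skip iframes without a src attribute instead of failing on a null node
    if ($iframe->hasAttribute('src')) {
        echo $iframe->getAttribute('src'), '<br />';
    }
}
?>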
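A minimal variant of the DOMDocument approach, assuming you already have the markup in a string rather than a file and want the URLs back as an array instead of echoed. The function name `iframeSources()` and the sample markup are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: collect the src of every <iframe> in an HTML string.
function iframeSources(string $html): array
{
    $document = new DOMDocument();
    // Silence libxml warnings about malformed HTML, then restore the old setting
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($document->getElementsByTagName('iframe') as $iframe) {
        // getAttribute() would return '' for a missing src, so check first
        if ($iframe->hasAttribute('src')) {
            $sources[] = $iframe->getAttribute('src');
        }
    }
    return $sources;
}

// Made-up sample: one iframe without src, one with single-quoted src
$html = '<p>Intro</p><iframe width="560" src="http://example.com/a"></iframe>'
      . '<iframe title="no source"></iframe>'
      . '<iframe src=\'http://example.com/b\'></iframe>';
print_r(iframeSources($html));
```

Because the DOM parser handles quoting and attribute order for you, the single-quoted `src` and the iframe with no `src` need no special-casing in a pattern.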
<?php
$html='<iframe maybe somethin gere src="http://some.random.url.com/" and blablabla';
preg_match('|<iframe [^>]*(src="[^"]+")[^>]*|', $html, $matches);
var_dump($matches);
Output:
array(2) {
[0]=>
string(75) "<iframe maybe somethin gere src="http://some.random.url.com/" and blablabla"
[1]=>
string(33) "src="http://some.random.url.com/""
}
This is a quick way to do it with a regular expression, but it may break on malformed HTML or cause other problems. For a robust solution, use a DOM parser.
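`preg_match()` stops after the first match, so the pattern above only finds the first iframe. A sketch assuming you want every `src` on the page: `preg_match_all()` with the same idea, capturing only the URL inside the quotes. The sample HTML is made up, and the same caveats about malformed markup still apply.

```php
<?php
// Made-up sample containing two iframes
$html = '<iframe width="560" src="http://example.com/a"></iframe>'
      . '<p>text</p>'
      . '<iframe src="http://example.com/b" frameborder="0"></iframe>';

// Same approach as above, but group 1 holds just the URL, without src="..."
preg_match_all('|<iframe [^>]*src="([^"]+)"|', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the captured URLs
print_r($matches[1]);
```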
A regular expression is going to be the cleanest way to do it:
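// $html holds the markup to search; the s modifier lets . match newlines inside the tag
preg_match('/<iframe.+?src="(.+?)".+?<\/iframe>/s', $html, $matches);
print_r($matches);
// $matches[0] => the whole regex match, $matches[1] => your src URL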
You should use a DOM parser, but this regex will get you started if for some reason you must use regular expressions:
.*(?<iframeOpening><iframe)\s[^>]*(?<iframeSrc>src=['"][^>'"]+['"]?).*
It uses named capture groups, by the way. Here's how to use them from PHP:
preg_match('/.*(?<iframeOpening><iframe)\s[^>]*src=[\'"](?<iframeSrc>[^>\'"]+)[\'"]?.*/', $searchText, $groups);
print_r($groups['iframeSrc']);
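A small sketch of what `$groups` contains, run against a made-up `$searchText`. PCRE stores each named group twice, under its name and under its number, so `$groups['iframeSrc']` and `$groups[2]` hold the same value. The leading and trailing `.*` are left out here because they only widen the full match in `$groups[0]`.

```php
<?php
// Made-up sample input
$searchText = '<p>before</p><iframe width="560" src="http://example.com/video" frameborder="0"></iframe>';

// The + sits inside the group, so the whole URL is captured, not just one character
preg_match('/(?<iframeOpening><iframe)\s[^>]*src=[\'"](?<iframeSrc>[^>\'"]+)[\'"]?/', $searchText, $groups);

echo $groups['iframeSrc'], "\n";     // same value as $groups[2]
echo $groups['iframeOpening'], "\n"; // the literal "<iframe"
```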
If your source is well-formed XML, you can also use XPath to find the string:
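<?php
$file = simplexml_load_file("file.html");
// xpath() returns an array of SimpleXMLElement objects, one per src attribute
foreach ($file->xpath("//iframe[@src]/@src") as $src) {
    echo (string) $src, "\n";
}
?>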
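If the source is ordinary HTML rather than well-formed XML, `simplexml_load_file()` will refuse to load it. A sketch of one workaround, assuming you still want XPath: parse with `DOMDocument::loadHTML()`, which tolerates HTML-isms such as an unclosed `<br>`, and query it with `DOMXPath`. This swaps in DOMDocument for SimpleXML; the sample markup is made up.

```php
<?php
// Made-up sample: the unclosed <br> makes this invalid as XML
$html = '<div>Videos:<br>'
      . '<iframe src="http://example.com/a"></iframe>'
      . '<iframe src="http://example.com/b"></iframe></div>';

$document = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about HTML-isms
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
$sources = [];
// Same expression as the SimpleXML version; each result is a DOMAttr node
foreach ($xpath->query('//iframe[@src]/@src') as $attr) {
    $sources[] = $attr->value;
}
print_r($sources);
```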
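See the well-known Stack Overflow question "RegEx match open tags except XHTML self-contained tags" for why regular expressions are a poor fit for parsing HTML.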
That said, your particular situation isn't really parsing, just string matching, and the answers above already cover the methods for that.