With PHP, how can I isolate the contents of the src attribute from $foo? The end result I'm looking for would give me just "http://example.com/img/image.jpg".
I'm extremely late to this, but I have a simple solution not yet mentioned. Load it with simplexml_load_string (if you have SimpleXML enabled) and then pass it through json_encode and json_decode.
$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
$parsedFoo = json_decode(json_encode(simplexml_load_string($foo)), true);
var_dump($parsedFoo['@attributes']['src']); // output: string(32) "http://example.com/img/image.jpg"
$parsedFoo comes through as:
array(1) {
["@attributes"]=>
array(6) {
["class"]=>
string(12) "foo bar test"
["title"]=>
string(10) "test image"
["src"]=>
string(32) "http://example.com/img/image.jpg"
["alt"]=>
string(10) "test image"
["width"]=>
string(3) "100"
["height"]=>
string(3) "100"
}
}
I've been using this for parsing XML and HTML for a few months now and it works pretty well. I've had no hiccups yet, though I haven't had to parse a large file with it (I imagine using json_encode and json_decode like that will get slower the larger the input gets). It's convoluted, but it's by far the easiest way I've found to read HTML attributes. Note that simplexml_load_string only accepts well-formed XML, so the markup has to be valid XML, as the self-closing img tag above is.
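For a single attribute, the JSON round-trip may not be needed, since SimpleXMLElement exposes attributes through array-style access. A minimal sketch, assuming the markup is well-formed XML; the variable names $element and $src are only illustrative:

```php
<?php
$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';

// simplexml_load_string() returns false when the input is not well-formed XML
$element = simplexml_load_string($foo);
if ($element === false) {
    throw new RuntimeException('Input is not well-formed XML');
}

// Attributes are read with array syntax; cast to string to get the plain value
$src = isset($element['src']) ? (string) $element['src'] : null;

echo $src, PHP_EOL;
```

This skips the encode/decode step, but it still fails on ordinary HTML that is not valid XML (an unclosed img tag, for example), so the DOMDocument approach is probably safer for arbitrary pages.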
I got this code:
$dom = new DOMDocument();
$dom->loadHTML($img);
echo $dom->getElementsByTagName('img')->item(0)->getAttribute('src');
Assuming there is only one img :P
// Assumes the Simple HTML DOM library file is available on the include path
include 'simple_html_dom.php';
// Create DOM from string
$html = str_get_html('<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />');
// echo the src attribute
echo $html->find('img', 0)->src;
http://simplehtmldom.sourceforge.net/
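You can get around this problem using this function:
function getTextBetween($start, $end, $text) {
    $start_from = strpos($text, $start);
    if ($start_from === false) {
        return false; // start marker not found
    }
    $start_pos = $start_from + strlen($start);
    $end_pos = strpos($text, $end, $start_pos);
    if ($end_pos === false) {
        return false; // end marker not found
    }
    // substr() takes a length, not an end position
    return substr($text, $start_pos, $end_pos - $start_pos);
}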
$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
$img_src = getTextBetween('src="', '"', $foo);
Here's what I ended up doing, although I'm not sure about how efficient this is:
$imgsplit = explode('"', $data);
foreach ($imgsplit as $item) {
    if (strpos($item, 'http') !== FALSE) {
        $image = $item;
        break;
    }
}
If you don't wish to use regex (or any non-standard PHP components), a reasonable solution using the built-in DOMDocument class would be as follows:
<?php
$doc = new DOMDocument();
$doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
$imageTags = $doc->getElementsByTagName('img');
foreach ($imageTags as $tag) {
    echo $tag->getAttribute('src');
}
?>
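If you need this in more than one place, the DOMDocument approach can be wrapped in a small function. This is only a sketch: getImageSources is a hypothetical helper name, not something from the answers above, and it assumes a non-empty HTML string. It also switches libxml to internal error handling so that parse warnings on imperfect markup are not printed.

```php
<?php
/**
 * Hypothetical helper: returns the src value of every <img> tag
 * found in a non-empty HTML fragment.
 */
function getImageSources(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $tag) {
        // Skip <img> tags that have no src attribute at all
        if ($tag->hasAttribute('src')) {
            $sources[] = $tag->getAttribute('src');
        }
    }
    return $sources;
}

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
print_r(getImageSources($foo));
```

Returning an array keeps the single-image and multi-image cases the same for the caller; an empty array means no img tag with a src was found.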