I've got a string with HTML attributes:
$attribs = ' id= "header " class = "foo bar" style ="background-color:#fff; color: red; "';
A simple and effective function to solve this:
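function attrString2Array($attr) {
    $atList = [];
    // First alternative: name="value" pairs. Second alternative: valueless attributes such as "disabled".
    if (preg_match_all('/\s*(?:([a-z0-9-]+)\s*=\s*"([^"]*)")|(?:\s+([a-z0-9-]+)(?=\s*|>|\s+[a-z0-9]+))/i', $attr, $m)) {
        for ($i = 0; $i < count($m[0]); $i++) {
            if ($m[3][$i]) {
                // Valueless attribute: store it with a null value.
                $atList[$m[3][$i]] = null;
            } else {
                $atList[$m[1][$i]] = $m[2][$i];
            }
        }
    }
    return $atList;
}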
print_r(attrString2Array('<li data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif">'));
print_r(attrString2Array('data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif"'));
//Array
//(
// [data-tpl-classname] => class
// [data-tpl-title] => innerHTML
// [disabled] =>
// [nowrap] =>
// [href] => #
// [hide] =>
// [src] => images/asas.gif
//)
You can't use a regular expression to parse HTML attributes. This is because the syntax is contextual: whether a space or an = sign ends a token depends on whether you are inside a quoted value. You can use regular expressions to tokenize the input, but you need a state machine to parse it.
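To make the state-machine idea concrete, here is a minimal sketch of a hand-written scanner. The function name scan_attributes is made up for illustration, and the rules it follows (names end at whitespace, = or >; values may be double-quoted, single-quoted or unquoted; a name without = gets null) are simplifying assumptions, not the full HTML specification.

```php
<?php
// Hypothetical helper: a small state machine over an attribute string.
// Simplified rules, not a full implementation of the HTML tokenizer.
function scan_attributes(string $input): array
{
    $ws = " \t\r\n\f";
    $attrs = [];
    $len = strlen($input);
    $i = 0;
    while ($i < $len) {
        // State 1: skip whitespace and stray slashes between attributes.
        $i += strspn($input, $ws . '/', $i);
        if ($i >= $len || $input[$i] === '>') {
            break;
        }
        // State 2: read the attribute name.
        $n = strcspn($input, $ws . '=/>', $i);
        if ($n === 0) { // a stray "=" with no name in front of it
            $i++;
            continue;
        }
        $name = strtolower(substr($input, $i, $n));
        $i += $n;
        // State 3: an "=" (after optional whitespace) means a value follows.
        $j = $i + strspn($input, $ws, $i);
        if ($j >= $len || $input[$j] !== '=') {
            $attrs[$name] = null; // valueless attribute, e.g. "disabled"
            continue;
        }
        $i = $j + 1;
        $i += strspn($input, $ws, $i);
        // State 4: read a quoted or an unquoted value.
        if ($i < $len && ($input[$i] === '"' || $input[$i] === "'")) {
            $end = strpos($input, $input[$i], $i + 1);
            if ($end === false) {
                $end = $len; // unterminated quote: take the rest of the input
            }
            $value = substr($input, $i + 1, $end - $i - 1);
            $i = $end + 1;
        } else {
            $n = strcspn($input, $ws . '>', $i);
            $value = substr($input, $i, $n);
            $i += $n;
        }
        $attrs[$name] = html_entity_decode($value);
    }
    return $attrs;
}
```

Calling scan_attributes($attribs) on the string from the question should give an array keyed by id, class and style; valueless attributes such as disabled come back with a null value.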
If performance isn't a big deal, the safest way to do it is probably to wrap the attributes in a tag and then send it through an HTML parser. E.g.:
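function parse_attributes($input) {
    $dom = new DOMDocument();
    // Wrap the attributes in a known tag; @ silences warnings about sloppy markup.
    @$dom->loadHTML("<div " . $input . "></div>");
    // loadHTML() wraps the fragment in <html><body>, so documentElement is <html>;
    // look the wrapper element up by name instead.
    $element = $dom->getElementsByTagName('div')->item(0);
    $attributes = array();
    foreach ($element->attributes as $name => $attr) {
        $attributes[$name] = $attr->value;
    }
    return $attributes;
}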
You could probably optimize the above by reusing the parser, or by using XMLReader or the SAX parser.
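As a rough sketch of the XMLReader variant: the function name xmlreader_attributes is hypothetical, and the code assumes the attribute string is well-formed as XML (every attribute has a quoted value), which is a stricter requirement than the HTML parser has.

```php
<?php
// Hypothetical helper; assumes $input is well-formed as XML attributes.
function xmlreader_attributes(string $input): array
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $reader = new XMLReader();
    $reader->XML("<element $input />");
    $attributes = [];
    // read() moves to the <element> node; then walk its attributes one by one.
    if ($reader->read() && $reader->moveToFirstAttribute()) {
        do {
            $attributes[$reader->name] = $reader->value;
        } while ($reader->moveToNextAttribute());
    }
    $reader->close();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $attributes;
}
```

Whether this is actually faster than DOMDocument for such short inputs would need measuring; the main difference is that no document tree is built.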
Use SimpleXML:
<?php
$attribs = ' id= "header " class = "foo bar" style ="background-color:#fff; color: red; "';
$x = new SimpleXMLElement("<element $attribs />");
print_r($x);
?>
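This assumes that the attributes are always name/value pairs with quoted values; a valueless attribute such as disabled is not well-formed XML and makes the constructor throw.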
An easy way could also be:
$atts_array = current((array) new SimpleXMLElement("<element $attribs />"));
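If you would rather not rely on the array cast, the attributes() method can be iterated directly. The sketch below uses a made-up function name, simplexml_attributes, and returns null when the string is not valid XML; that return convention is an assumption for illustration only.

```php
<?php
// Hypothetical helper: returns the attributes, or null if the input is not well-formed XML.
function simplexml_attributes(string $attribs): ?array
{
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    try {
        $element = new SimpleXMLElement("<element $attribs />");
    } catch (Exception $e) {
        return null; // e.g. a valueless attribute or an unquoted value
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
    $result = [];
    foreach ($element->attributes() as $name => $value) {
        $result[$name] = (string) $value; // each value is a SimpleXMLElement, so cast it
    }
    return $result;
}
```

The explicit string cast matters if the values are later compared or serialized, since each attribute is otherwise still a SimpleXMLElement object.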
Maybe this library helps:
http://simplehtmldom.sourceforge.net/
You could use a regular expression to extract that information:
$attribs = ' id= "header " class = "foo bar" style ="background-color:#fff; color: red; "';
// Name (hyphens allowed, e.g. data-foo), "=", then a double-quoted, single-quoted or unquoted value.
$pattern = '/([\w-]+)\s*=\s*("[^"]*"|\'[^\']*\'|[^"\'\s>]*)/';
preg_match_all($pattern, $attribs, $matches, PREG_SET_ORDER);
$attrs = array();
foreach ($matches as $match) {
    // Strip the surrounding quotes, if any (guard against an empty value first).
    if ($match[2] !== '' && ($match[2][0] == '"' || $match[2][0] == "'") && $match[2][0] == $match[2][strlen($match[2])-1]) {
        $match[2] = substr($match[2], 1, -1);
    }
    $name = strtolower($match[1]);
    $value = html_entity_decode($match[2]);
    switch ($name) {
        case 'class':
            $attrs[$name] = preg_split('/\s+/', trim($value));
            break;
        case 'style':
            // parse CSS property declarations
            break;
        default:
            $attrs[$name] = $value;
    }
}
var_dump($attrs);
Now you just need to parse the classes of class (split at whitespace) and the property declarations of style (a little bit harder, as it can contain comments and URLs with ; in them).
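For the style part, a naive split on ; breaks on values such as url(data:image/png;base64,...). One possible sketch is a single pass that tracks quotes and parentheses and skips comments. The function name parse_style_declarations is hypothetical, and this handles only the cases named above, not the full CSS grammar.

```php
<?php
// Hypothetical helper: splits a style attribute into property => value pairs.
// Semicolons inside quotes or parentheses are kept; /* ... */ comments are dropped.
function parse_style_declarations(string $style): array
{
    $declarations = [];
    $store = function (string $decl) use (&$declarations): void {
        $parts = explode(':', $decl, 2); // split at the first colon only
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $declarations[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    };
    $current = '';
    $quote = null; // the open quote character, if we are inside a string
    $depth = 0;    // nesting depth of parentheses, e.g. inside url(...)
    $len = strlen($style);
    for ($i = 0; $i < $len; $i++) {
        $ch = $style[$i];
        if ($quote !== null) {
            if ($ch === '\\' && $i + 1 < $len) {
                $current .= $ch . $style[++$i]; // keep escaped characters as they are
                continue;
            }
            if ($ch === $quote) {
                $quote = null;
            }
            $current .= $ch;
        } elseif ($ch === '"' || $ch === "'") {
            $quote = $ch;
            $current .= $ch;
        } elseif ($depth === 0 && substr($style, $i, 2) === '/*') {
            $end = strpos($style, '*/', $i + 2);
            $i = ($end === false) ? $len : $end + 1; // jump past the comment
        } elseif ($ch === '(') {
            $depth++;
            $current .= $ch;
        } elseif ($ch === ')') {
            $depth = max(0, $depth - 1);
            $current .= $ch;
        } elseif ($ch === ';' && $depth === 0) {
            $store($current); // end of one declaration
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    $store($current); // last declaration may lack a trailing semicolon
    return $declarations;
}
```

In the switch above, the style case could then assign $attrs[$name] = parse_style_declarations($value); if a property appears twice, the later declaration wins in this sketch.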