I'm creating a CSS editor and am trying to create a regular expression that can get data from a CSS document. This regex works if I have one property but I can't get it to
I wrote a piece of code that parses CSS easily. All it really takes is a couple of explodes. The $css variable is a string containing the CSS. Once the code below has run, do a print_r($css_array) to get a nice array of the CSS, fully parsed.
$css_array = array(); // master array to hold all values
$elements = explode('}', $css);
foreach ($elements as $element) {
    // split the selector from its declarations; skip fragments without a '{'
    $a_name = explode('{', $element, 2);
    if (count($a_name) < 2) {
        continue;
    }
    // get the name of the CSS element
    $name = trim($a_name[0]);
    // get all the key:value pair styles
    $a_styles = explode(';', $a_name[1]);
    // loop through each style and split apart the key from the value
    foreach ($a_styles as $style) {
        // a limit of 2 keeps colons inside values (e.g. URLs) intact
        $a_key_value = explode(':', $style, 2);
        if (count($a_key_value) == 2) {
            // build the master css array
            $css_array[$name][trim($a_key_value[0])] = trim($a_key_value[1]);
        }
    }
}
Gives you this:
Array
(
[body] => Array
(
[background] => #f00
[font] => 12px arial
)
)
Try this
function trimStringArray($stringArray){
    $result = array();
    foreach ($stringArray as $string) {
        $trimmed = trim($string);
        if ($trimmed != '') $result[] = $trimmed;
    }
    return $result;
}
// $style holds the CSS source as a string
$regExp = '/\{|\}/';
$rawCssData = preg_split($regExp, $style);
$cssArray = array();
$cssStyle = array();
for ($i = 0; $i < count($rawCssData); $i++) {
    if ($i % 2 == 0) {
        // even chunks are selector lists; split() was removed in PHP 7, so use explode()
        $selectors = explode(',', $rawCssData[$i]);
        $cssStyle['selectors'] = trimStringArray($selectors);
    }
    if ($i % 2 == 1) {
        // odd chunks are the declarations between { and }
        $attributes = explode(';', $rawCssData[$i]);
        $cssStyle['attributes'] = trimStringArray($attributes);
        $cssArray[] = $cssStyle;
    }
}
echo '<pre>'."\n";
print_r($cssArray);
echo '</pre>'."\n";
You are trying to pull structure out of the data, not just individual values. Regular expressions could perhaps be painfully stretched to do the job, but you are really entering parser territory and should be pulling out the big guns, namely parsers.
I have never used the PHP parser-generating tools, but they look okay after a light scan of the docs. Check out LexerGenerator and ParserGenerator. LexerGenerator will take a bunch of regular expressions describing the different types of tokens in a language (in this case, CSS) and spit out some code that recognizes the individual tokens. ParserGenerator will take a grammar (a description of what things in a language are made up of what other things) and spit out a parser: code that takes a bunch of tokens and returns a syntax tree (the data structure that you are after).
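To make the "tokens" idea concrete, here is a rough hand-written tokenizer. It is not what LexerGenerator produces and does not use its API; the rule names (comment, string, punct, text) and the function name tokenizeCss are made up for this sketch, and the rule set is far too small for real CSS.

```php
<?php
// Token rules, tried in order at the current offset.
// The "A" modifier anchors each pattern at that offset.
// All rule names here are hypothetical, chosen for illustration only.
const CSS_TOKEN_RULES = [
    'comment'    => '~/\*.*?\*/~As',
    'string'     => '~"[^"]*"|\'[^\']*\'~A',
    'whitespace' => '~\s+~A',
    'punct'      => '~[{}:;,]~A',
    'text'       => '~[^{}:;,\s"\']+~A',
];

function tokenizeCss(string $css): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($css);
    while ($offset < $length) {
        foreach (CSS_TOKEN_RULES as $type => $pattern) {
            if (preg_match($pattern, $css, $m, 0, $offset)) {
                // Whitespace and comments are recognized but not emitted.
                if ($type !== 'whitespace' && $type !== 'comment') {
                    $tokens[] = [$type, $m[0]];
                }
                $offset += strlen($m[0]);
                continue 2; // restart rule matching at the new offset
            }
        }
        throw new RuntimeException("Unexpected character at offset $offset");
    }
    return $tokens;
}
```

Calling tokenizeCss('body { color: red; }') should give a flat list of [type, text] pairs, which is the kind of input a parser would then turn into a tree.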
I would recommend against using regexes to parse CSS, especially a single regex!
If you insist on doing the parsing with regexes, split it up into sensible sections: use one regex to split out all the body{..} blocks, then another to parse the color:rgb(1,2,3); attributes.
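A minimal sketch of that two-pass idea, assuming flat CSS with no nested blocks (such as @media) and no semicolons or braces inside quoted values. The function name parseCssTwoPass is made up for the example.

```php
<?php
// Two-pass sketch: one regex finds "selector { body }" blocks,
// a second one pulls "property: value" pairs out of each body.
function parseCssTwoPass(string $css): array
{
    $rules = [];
    // Pass 1: capture the selector text and the raw declaration body.
    preg_match_all('/([^{}]+)\{([^{}]*)\}/', $css, $blocks, PREG_SET_ORDER);
    foreach ($blocks as $block) {
        $selector = trim($block[1]);
        // Pass 2: the value runs up to the next semicolon, so colons
        // inside a value (for example in a URL) stay part of the value.
        preg_match_all('/([^:;]+):([^;]+)/', $block[2], $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            $rules[$selector][trim($pair[1])] = trim($pair[2]);
        }
    }
    return $rules;
}
```

Running print_r(parseCssTwoPass($css)) on a simple stylesheet gives an array keyed by selector, then by property name.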
If you are actually trying to write something "useful" (not trying to learn regular expressions), look for a prewritten CSS parser.
I found this cssparser.php which seems to work very well:
$cssp = new cssparser;
$cssp -> ParseStr("body { background: #f00;font: 12px Arial; }");
print_r($cssp->css);
..which outputs the following:
Array
(
[body] => Array
(
[background] => #f00
[font] => 12px arial
)
)
The parser is pretty simple, so it should be easy to work out what it's doing. Oh, I had to remove the lines that read if($this->html) {$this->Add("VAR", "");} (it seems to be a debugging thing that was left in).
I've mirrored the script here, with the above changes in
I am using the regex below and it pretty much works... of course this question is old now and I see that you've abandoned your efforts... but in case someone else runs across it:
(?<selector>(?:(?:[^,{]+),?)*?)\{(?:(?<name>[^}:]+):?(?<value>[^};]+);?)*?\}
(You have to remove all of the /* comments */ from your CSS first to be safe.)
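For the comment-stripping step, something along these lines should do. It is only a sketch: the function name stripCssComments is made up, and it assumes that /* never appears inside a quoted string in your CSS.

```php
<?php
// Remove /* ... */ comments before running any other regex.
// The "s" modifier lets "." match newlines, so multi-line comments go too;
// the lazy ".*?" stops at the first closing "*/".
function stripCssComments(string $css): string
{
    return preg_replace('~/\*.*?\*/~s', '', $css);
}
```

Run the CSS through this first, then apply the matching regex to the result.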
That just seems too convoluted for a single regular expression. Well, I'm sure that with the right extensions, an advanced user could create the right regex. But then you'd need an even more advanced user to debug it.
Instead, I'd suggest using a regex to pull out the pieces, and then tokenising each piece separately. e.g.,
/([^{]+?)\s*\{\s*([^}]*?)\s*\}/
Then you end up with the selector and the attributes in separate fields, which you can then split up further. (Even the selector will be fun to parse.) Note that even this will cause pain if a } can appear inside quotes or something similar. You could, again, convolute the heck out of it to avoid that, but it's probably better to avoid regexes altogether here and parse one field at a time, perhaps with a recursive-descent parser, yacc/bison, or whatever.
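As a rough illustration of handling the quoting problem without regexes, here is a small hand-written character scanner. It is not a full recursive-descent parser, only a single-pass state machine; the function name parseCssRules is made up, and it assumes flat rules with no comments or nested blocks.

```php
<?php
// Walks the CSS one character at a time and ignores braces, colons and
// semicolons that appear inside quoted strings.
function parseCssRules(string $css): array
{
    $rules = [];
    $selector = null;   // null while outside a { ... } block
    $buffer = '';       // text collected since the last delimiter
    $quote = null;      // the open quote character, or null
    $length = strlen($css);

    for ($i = 0; $i < $length; $i++) {
        $ch = $css[$i];
        if ($quote !== null) {
            // Inside a string: copy everything verbatim.
            $buffer .= $ch;
            if ($ch === '\\' && $i + 1 < $length) {
                $buffer .= $css[++$i];  // keep the escaped character as is
            } elseif ($ch === $quote) {
                $quote = null;          // closing quote found
            }
        } elseif ($ch === '"' || $ch === "'") {
            $quote = $ch;
            $buffer .= $ch;
        } elseif ($ch === '{' && $selector === null) {
            $selector = trim($buffer);
            $rules[$selector] = $rules[$selector] ?? [];
            $buffer = '';
        } elseif (($ch === ';' || $ch === '}') && $selector !== null) {
            // End of a declaration: split on the first colon only.
            $parts = explode(':', $buffer, 2);
            if (count($parts) === 2) {
                $rules[$selector][trim($parts[0])] = trim($parts[1]);
            }
            $buffer = '';
            if ($ch === '}') {
                $selector = null;
            }
        } else {
            $buffer .= $ch;
        }
    }
    return $rules;
}
```

Because the quote state is tracked explicitly, a rule such as a::before { content: "}"; color: red } keeps the quoted brace as part of the value instead of ending the block early.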