Question
I am trying to parse an FDF file using PHP and regex, but I just can't get my head around regex. I am stuck parsing the file to generate an array.
%FDF-1.2
%âãÏÓ
1 0 obj
<<
/FDF
<<
/Fields [
<<
/V (email@email.com)
/T (field_email)
>>
<<
/V (John)
/T (field_name)
>>
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj
trailer
<<
/Root 1 0 R
>>
%%EOF
Current function (source: http://php.net/manual/en/ref.fdf.php)
function parse2($file) {
    if (!preg_match_all("/<<\s*\/V([^>]*)>>/x", $file, $out, PREG_SET_ORDER))
        return;
    for ($i = 0; $i < count($out); $i++) {
        $pattern = "<<.*/V\s*(.*)\s*/T\s*(.*)\s*>>";
        $thing = $out[$i][1];
        if (eregi($pattern, $out[$i][0], $regs)) {
            $key = $regs[2];
            $val = $regs[1];
            $key = preg_replace("/^\s*\(/", "", $key);
            $key = preg_replace("/\)$/", "", $key);
            $key = preg_replace("/\\\/", "", $key);
            $val = preg_replace("/^\s*\(/", "", $val);
            $val = preg_replace("/\)$/", "", $val);
            $matches[$key] = $val;
        }
    }
    return $matches;
}
Result:
Array
(
[field_email)
] => email@email.com)
[field_name)
] => John)
[field_reference)
] => )
)
Why does it include the ) and the new line? I know this problem is trivial for someone who understands regular expressions, so help would be appreciated.
Answer 1:
Description
Your initial expression simply finds the entire block of text that represents each key and value set. Then, in your clean-up section, you look for a close paren followed immediately by the end of the string, \)$, but I'm sure there are additional characters between the close paren and the end of the string, most likely the line-break characters that show up in your output.
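The anchoring problem described above can be reproduced in isolation. This is a small sketch that assumes the captured key still carries a Windows-style line ending ("\r\n"); the sample string and variable names are made up for illustration.

```php
<?php
// Assumption: the captured key ends with a CRLF line ending.
$key = "(field_email)\r\n";

// Without the m flag, $ matches only at the very end of the string or just
// before a final "\n". Here ")" is followed by "\r", so nothing is replaced.
$unchanged = preg_replace('/\)$/', '', $key);

// Allowing optional trailing whitespace before the end removes the paren.
$cleaned = preg_replace('/\)\s*$/', '', $key);

var_dump($unchanged === $key); // bool(true)
var_dump($cleaned);            // string(12) "(field_email"
```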
Instead I'd handle all this in one operation. This expression will:
- find the field value
  - trim the surrounding parens off
  - and place it into capture group 1
- find the name of the value
  - trim the field_ substring off
  - trim the surrounding parens off
  - and place it into capture group 2
- require the options: case insensitive and multi-line
^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)
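A minimal sketch of how this expression might be applied in PHP with preg_match_all, assuming the file contents are already loaded into a string. The function name parse_fdf_fields and the shortened sample string are hypothetical; the i and m flags correspond to the case-insensitive and multi-line options.

```php
<?php
// Hypothetical helper: returns an array of field name => field value.
function parse_fdf_fields(string $fdf): array
{
    // i = case insensitive, m = multi-line (^ matches at the start of each line).
    $pattern = '/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)/im';

    $fields = [];
    if (preg_match_all($pattern, $fdf, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $fields[$m[2]] = $m[1]; // group 2 = name, group 1 = value
        }
    }
    return $fields;
}

// Shortened sample input; in practice this could come from file_get_contents().
$fdf = "<<\r\n/V (email@email.com)\r\n/T (field_email)\r\n>>\r\n"
     . "<<\r\n/V (John)\r\n/T (field_name)\r\n>>\r\n"
     . "<<\r\n/V ()\r\n/T (field_reference)\r\n>>]\r\n";

print_r(parse_fdf_fields($fdf));
```

Because the value is captured with [^)]*, a value that itself contains a closing paren would be cut short, so this sketch only suits simple values like the ones in the sample.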
Example
Sample Text
%FDF-1.2
%âãÏÓ
1 0 obj
<<
/FDF
<<
/Fields [
<<
/V (email@email.com)
/T (field_email)
>>
<<
/V (John)
/T (field_name)
>>
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj
trailer
<<
/Root 1 0 R
>>
%%EOF
Matches
[0][0] = /V (email@email.com)
/T (field_email)
[0][1] = email@email.com
[0][2] = email
[1][0] = /V (John)
/T (field_name)
[1][1] = John
[1][2] = name
[2][0] = /V ()
/T (field_reference)
[2][1] =
[2][2] = reference
Or
If you wanted to retain the field_ substring, you can simply remove it from the expression like so:
^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)
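As a rough illustration of this variant, the sketch below keeps the full field name as the array key and pairs the two capture groups with array_combine. The function name parse_fdf_fields_full and the sample string are assumptions made for the example.

```php
<?php
// Hypothetical helper: keys keep the "field_" prefix.
function parse_fdf_fields_full(string $fdf): array
{
    $pattern = '/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)/im';

    // Default ordering: $matches[1] holds all values, $matches[2] all names.
    if (!preg_match_all($pattern, $fdf, $matches)) {
        return [];
    }
    return array_combine($matches[2], $matches[1]);
}

$sample = "<<\n/V (John)\n/T (field_name)\n>>\n"
        . "<<\n/V ()\n/T (field_reference)\n>>]\n";

print_r(parse_fdf_fields_full($sample));
```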
Source: https://stackoverflow.com/questions/18161984/php-regex-code-to-extract-fdf-data