PHP regex code to extract FDF data

别来无恙 submitted on 2019-12-08 02:15:21

Question


I am trying to parse an FDF file using PHP and regex, but I just can't get my head around regex. I am stuck parsing the file to generate an array.

%FDF-1.2
%âãÏÓ
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/V (email@email.com)
/T (field_email)
>> 
<<
/V (John)
/T (field_name)
>> 
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF

Current function (source: http://php.net/manual/en/ref.fdf.php)

function parse2($file) {
 if (!preg_match_all("/<<\s*\/V([^>]*)>>/x", $file,$out,PREG_SET_ORDER))
         return;
 for ($i=0;$i<count($out);$i++) {
         $pattern = "<<.*/V\s*(.*)\s*/T\s*(.*)\s*>>";
         $thing = $out[$i][1];
         if (eregi($pattern,$out[$i][0],$regs)) {
                 $key = $regs[2];
                 $val = $regs[1];
                 $key = preg_replace("/^\s*\(/","",$key);
                 $key = preg_replace("/\)$/","",$key);
                 $key = preg_replace("/\\\/","",$key);
                 $val = preg_replace("/^\s*\(/","",$val);
                 $val = preg_replace("/\)$/","",$val);
                 $matches[$key] = $val;
         }
 }
 return $matches;
}

Result:

Array
(
    [field_email)
    ] => email@email.com)

    [field_name)
    ] => John)

    [field_reference)
    ] => )

)

Why does it include the ) and the new line? I know this problem is trivial for someone who understands regular expressions, so help would be appreciated.


Answer 1:


Description

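Your initial expression simply finds the entire block of text that represents each key and value set. Then, in your clean-up section, you look for a closing paren that is immediately followed by the end of the string, \)$. That only matches when nothing else comes after the paren, and I'm sure there are additional characters (most likely whitespace such as a line break) between the closing paren and the end of the string, so the paren and the new line are left in place.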
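To illustrate the point about \)$, here is a minimal sketch. It assumes the captured key still ends in a Windows line ending (\r\n), which the question does not confirm; the $key string and both variable names are made up for the demonstration.

```php
<?php
// Assumed input: a captured key that still carries a "\r\n" line ending.
$key = "(field_email)\r\n";

// The clean-up from the question: ")" must sit directly before the end.
$anchored = preg_replace('/\)$/', '', $key);

// A more tolerant clean-up: allow trailing whitespace after the ")".
$tolerant = preg_replace('/\)\s*$/', '', $key);

var_dump($anchored === $key); // bool(true): nothing was removed
var_dump($tolerant);          // string(12) "(field_email"
```

Allowing \s* before the $ anchor is one way to make the original clean-up step tolerate trailing whitespace.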

Instead, I'd handle all of this in one operation. This expression will:

  • find the field value
    • trim the surrounding parens off
    • and place it into capture group 1
  • find the field name
    • trim the field_ prefix off
    • trim the surrounding parens off
    • and place it into capture group 2
  • require the options: case insensitive and multi-line

^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)
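As a rough sketch, the expression above could be applied in PHP with preg_match_all, using the i and m modifiers for the case-insensitive and multi-line options. The $fdf variable and its contents are assumptions for illustration: a fragment of the sample file from the question, pasted inline rather than read from disk.

```php
<?php
// Fragment of the sample FDF from the question (assumed to be loaded already).
$fdf = <<<'FDF'
<<
/V (email@email.com)
/T (field_email)
>> 
<<
/V (John)
/T (field_name)
>> 
<<
/V ()
/T (field_reference)
>>]
FDF;

// "i" = case insensitive, "m" = multi-line, so ^ matches at each line start.
$pattern = '/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(field_([^)]*)\)/im';

preg_match_all($pattern, $fdf, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the value, $m[2] is the field name without "field_".
    printf("%s => %s\n", $m[2], $m[1]);
}
```

PREG_SET_ORDER groups the results per match, so each $m holds the full match plus both capture groups.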

Example

Sample Text

%FDF-1.2
%âãÏÓ
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/V (email@email.com)
/T (field_email)
>> 
<<
/V (John)
/T (field_name)
>> 
<<
/V ()
/T (field_reference)
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF

Matches

[0][0] = /V (email@email.com)
/T (field_email)
[0][1] = email@email.com
[0][2] = email

[1][0] = /V (John)
/T (field_name)
[1][1] = John
[1][2] = name

[2][0] = /V ()
/T (field_reference)
[2][1] = 
[2][2] = reference



Or

If you want to retain the field_ substring, you can simply remove it from the expression, like so:

^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)
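A minimal sketch of how this second expression might be wrapped to build the array the question asks for. The function name parse_fdf_fields and the $sample string are hypothetical, and the sketch assumes that field values never contain a ")" character, since [^)]* stops at the first one.

```php
<?php
/**
 * Hypothetical helper: returns [field name => value] for simple FDF text.
 * Assumes values contain no ")" characters, escaped or otherwise.
 */
function parse_fdf_fields(string $fdf): array
{
    $pattern = '/^\/V\s\(([^)]*)\)[\r\n]*^\/T\s\(([^)]*)\)/im';
    $fields = [];
    if (preg_match_all($pattern, $fdf, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $fields[$m[2]] = $m[1]; // key = /T name, value = /V contents
        }
    }
    return $fields;
}

// Made-up input with "\r\n" line endings, mirroring two fields of the sample.
$sample = "<<\r\n/V (John)\r\n/T (field_name)\r\n>> \r\n"
        . "<<\r\n/V ()\r\n/T (field_reference)\r\n>>]";
print_r(parse_fdf_fields($sample));
```

This version relies only on preg_* functions; eregi, used in the question's code, was removed in PHP 7.0.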



Source: https://stackoverflow.com/questions/18161984/php-regex-code-to-extract-fdf-data
