Parsing e-mail-like headers (similar to RFC822)

Posted by ⅰ亾dé卋堺 on 2019-12-02 01:58:51

Assuming that $data contains the sample data you pasted above, here is the parser:

<?php

/* 
 * $data = <<<'DATA'
 * <put-sample-data-here>
 * DATA;
 *
 */

$parsed  = array();
$blocks  = preg_split('/\n\n/', $data);
$lines   = array();
$matches = array();
foreach ($blocks as $i => $block) {
    $parsed[$i] = array();
    $lines = preg_split('/\n(([\w.-]+)\: *((.*\n\s+.+)+|(.*(?:\n))|(.*))?)/',
                        $block, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($lines as $line) {
        if(preg_match('/^\n?([\w.-]+)\: *((.*\n\s+.+)+|(.*(?:\n))|(.*))?$/',
                      $line, $matches)) {
            $parsed[$i][$matches[1]] = preg_replace('/\n +/', ' ',
                                                    trim($matches[2]));
        }
    }
}

print_r($parsed);

This kind of message-style header format is pretty common. Plenty of parsers exist, but they tend to be hard to find with a search engine. Personally, I resort to regex here if the format is reasonably consistent.

For example, these two will do the trick:

  // matches a consecutive run of RFC 822-style "Key: value" lines
define("RX_RFC821_BLOCK", b"/(?:^\w[\w.-]*\w:.*\R(?:^[ \t].*\R)*)++\R*/m");

  // break up Key: value lines
define("RX_RFC821_SPLIT", b"/^(\w+(?:[-.]?\w+)*)\s*:\s*(.*\n(?:^[ \t].*\n)*)/m");

The first pattern breaks out coherent blocks of message/* lines, and the second can be used to split each such block into key/value pairs. The result needs post-processing to strip the leading indentation from continued value lines, though.
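As a rough sketch of how the two patterns might be combined, including the post-processing step for continuation lines: the patterns are copied from above, while the variable names, the helper name parse_header_blocks() and the sample input are made up for illustration and are not the data from the question.

```php
<?php
// Patterns copied from the answer; only the variable names are new.
$rxBlock = '/(?:^\w[\w.-]*\w:.*\R(?:^[ \t].*\R)*)++\R*/m';
$rxSplit = '/^(\w+(?:[-.]?\w+)*)\s*:\s*(.*\n(?:^[ \t].*\n)*)/m';

// Hypothetical sample input (assumed shape, LF line endings).
$data = "Package: foo\n"
      . "Version: 1.0\n"
      . "Description: first line\n"
      . " continued line\n"
      . "\n"
      . "Package: bar\n"
      . "Version: 2.0\n";

// Illustrative helper: returns one associative array per header block.
function parse_header_blocks(string $data, string $rxBlock, string $rxSplit): array
{
    $parsed = [];
    // Step 1: pull out each run of consecutive "Key: value" lines.
    preg_match_all($rxBlock, $data, $blocks);
    foreach ($blocks[0] as $block) {
        $fields = [];
        // Step 2: split the block into key/value pairs.
        preg_match_all($rxSplit, $block, $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            // Step 3: drop the trailing newline, then fold each continuation
            // line by replacing the newline and its indentation with a space.
            $fields[$pair[1]] = preg_replace('/\n[ \t]+/', ' ', rtrim($pair[2]));
        }
        $parsed[] = $fields;
    }
    return $parsed;
}

print_r(parse_header_blocks($data, $rxBlock, $rxSplit));
```

With this sample, the result should contain two blocks, and the Description value of the first one is folded into "first line continued line". Note that both patterns expect every line, including the last one, to end with a newline; input with CRLF line endings would probably need to be normalized first.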
