Parsing e-mail-like headers (similar to RFC822)

猫巷女王i 2021-01-20 14:51

Problem / Question

There is a database of bot information that I would like to parse. It is said to be similar to RFC822 messages.

Before I re-invent the

2 Answers
  •  清歌不尽
    2021-01-20 15:43

    The message MIME type is pretty common. Plenty of parsers exist, but they tend to be hard to find with a search engine. Personally, I resort to regular expressions here, provided the format is reasonably consistent.

    For example, these two patterns will do the trick:

    // Matches a consecutive block of RFC822-style "Key: value" lines
    define("RX_RFC822_BLOCK", b"/(?:^\w[\w.-]*\w:.*\R(?:^[ \t].*\R)*)++\R*/m");

    // Splits a block into key and value, keeping indented continuation lines with the value
    define("RX_RFC822_SPLIT", b"/^(\w+(?:[-.]?\w+)*)\s*:\s*(.*\n(?:^[ \t].*\n)*)/m");
    

    The first pattern extracts coherent blocks of message/* header lines, and the second splits each such block into key/value pairs. The values still need post-processing to strip the leading indentation from continuation lines, though.
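
    As a rough sketch of how the two patterns and the post-processing step might fit together: the helper name `parse_header_blocks`, the `robot-*` field names and the sample data below are made up for illustration. The sketch also assumes that line endings can be normalised to LF and that folding continuation lines into a single space is acceptable for your data.

```php
<?php
// Hypothetical helper: returns one associative array per header block.
function parse_header_blocks(string $text): array
{
    // The two patterns from the answer, single-quoted so that PCRE
    // (not PHP) interprets the backslash escapes.
    $rxBlock = '/(?:^\w[\w.-]*\w:.*\R(?:^[ \t].*\R)*)++\R*/m';
    $rxSplit = '/^(\w+(?:[-.]?\w+)*)\s*:\s*(.*\n(?:^[ \t].*\n)*)/m';

    // Assumption: normalise CRLF/CR to LF, and make sure the last line
    // is newline-terminated, since both patterns require a line break.
    $text = preg_replace('/\r\n?/', "\n", $text) . "\n";

    $records = [];
    preg_match_all($rxBlock, $text, $blocks);
    foreach ($blocks[0] as $block) {
        $fields = [];
        preg_match_all($rxSplit, $block, $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            // Post-processing: fold indented continuation lines into one line.
            $value = preg_replace('/\n[ \t]+/', ' ', $pair[2]);
            $fields[$pair[1]] = trim($value);
        }
        $records[] = $fields;
    }
    return $records;
}

// Made-up sample input with two blocks and one continued value.
$sample = "robot-id: examplebot\n"
        . "robot-name: Example Bot\n"
        . "robot-description: A crawler that\n"
        . "  spans two lines\n"
        . "\n"
        . "robot-id: otherbot\n"
        . "robot-name: Other\n";

$records = parse_header_blocks($sample);
print_r($records);
```

    Note that a repeated key within one block overwrites the earlier value in this sketch; collect the values into a list instead if the data can contain duplicates.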
