Question
I regularly receive a generated email message containing a text part and a text attachment. I want to test whether the attachment is base64-encoded and, if so, decode it, like this:
:0B
* ^(Content-Transfer-Encoding: *base64(($)[a-z0-9].*)*($))
{
msgID=`printf '%s' "$MATCH" | base64 -d`
}
But it always says "invalid input". Does anyone know what's wrong? Here is the procmail log:
procmail: Match on "^()\/[a-z]+[0-9]+[^\+]"
procmail: Assigning "msgID=PGh0b"
procmail: matched "^(Content-Disposition: *attachment.*(($)[a-z0-9].*)* |Content-Transfer-Encoding: *base64(($)[a-z0-9].*)*($)"
procmail: Executing "printf '%s' "$MATCH" | base64 -d"
base64: invalid input
procmail: Assigning "msgID=<ht"
procmail: Unexpected EOL
procmail: Assigning "msgID=PGh0b"
procmail: Match on "^(Content-Transfer-Encoding: *base64(($)[a-z0-9].*)*($))"
procmail: Executing "printf '%s' "$MATCH" | base64 -d"
base64: invalid input
procmail: Assigning "msgID=<ht"
procmail: Unexpected EOL
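The log points to a likely cause. Procmail only assigns MATCH when a condition contains the \/ token. The base64 recipe has none, so "$MATCH" still holds PGh0b, left over from the earlier recipe ^()\/[a-z]+[0-9]+[^\+]. Procmail matches case-insensitively by default, so PGh, 0 and b satisfy that pattern. A base64 string of five characters can never be complete. That fits the log: base64 -d emits <ht and then reports invalid input. The Python snippet below reproduces both results. Python is used only for illustration, and try_decode is a hypothetical helper:

```python
import base64
import binascii
from typing import Optional


def try_decode(text: str) -> Optional[bytes]:
    """Return the decoded bytes, or None if text is not valid base64."""
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        return None


# The value the log shows in MATCH, left over from the earlier recipe.
stale_match = "PGh0b"

print(try_decode(stale_match))      # None: 5 characters is never a valid base64 length
print(try_decode(stale_match[:4]))  # b'<ht', the fragment assigned to msgID in the log
```

The recipe would therefore need a \/ in front of the part to capture. Even then, a base64 body spread over several lines would have to be captured in full before decoding.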
Answer 1:
If your requirements are complex, it might be easier to write a dedicated script that extracts the information you want. A modern scripting language with proper MIME support is going to be a lot more versatile when it comes to the myriad possibilities for content encoding and body-part structure in modern MIME email.
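The answer does not name a language. As one possibility, Python's standard email package can do this kind of extraction with real MIME parsing. The sketch below assumes the whole raw message is available as bytes, and first_attachment_text is a hypothetical name. It returns the text of the first part whose Content-Disposition is attachment. get_payload(decode=True) undoes base64 (or quoted-printable) transfer encoding, so no separate base64 -d step is needed.

```python
import email
import email.policy
from typing import Optional


def first_attachment_text(raw: bytes) -> Optional[str]:
    """Return the decoded text of the first attachment part, or None."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    # walk() visits every MIME part, however deeply nested.
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            # decode=True reverses the Content-Transfer-Encoding
            # (base64 or quoted-printable) and returns bytes.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "us-ascii"
            return payload.decode(charset, errors="replace")
    return None
```

A small wrapper that reads the raw message from standard input could then be hooked into procmail. The script does not care about header order or line folding, because the email package handles both.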
The following finds the first occurrence of MIME headers with Content-Disposition: attachment and extracts the first token of the following body. This might do what you want if you are corresponding with a sender who uses a well-defined, static template. There is no real MIME parsing here, so (say) a forwarded message which happens to contain an embedded part matching the pattern will also trigger the conditions. (This can be a bug, or a feature.)
A useful but not frequently used feature of Procmail is the ability to write a regular expression which spans multiple lines. Within a regex, ($) always matches a literal newline. So with that, we can look for a Content-Disposition: attachment header followed by zero or more other headers, followed by an empty line, followed by the token you want to extract.
:0B
* ^Content-Disposition: *attachment.*(($)[A-Z].*)*($)($)\/[A-Z]+[0-9]+
{ msgid="$MATCH" }
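To make the parts of that condition concrete, here is a rough Python equivalent. It is an illustration, not something procmail runs. It assumes procmail's defaults: matching is case-insensitive, ^ anchors at the start of any line, and . never crosses a newline. ($) becomes \n, and the capture group plays the role of \/, yielding what procmail would assign to MATCH. ATTACHMENT_TOKEN and extract_token are hypothetical names.

```python
import re
from typing import Optional

ATTACHMENT_TOKEN = re.compile(
    r"^Content-Disposition: *attachment.*"  # the Content-Disposition header line
    r"(?:\n[A-Z].*)*"                       # (($)[A-Z].*)*: zero or more further headers
    r"\n\n"                                 # ($)($): end of last header plus the empty line
    r"([A-Z]+[0-9]+)",                      # the part after \/, i.e. what ends up in MATCH
    re.IGNORECASE | re.MULTILINE,
)


def extract_token(body: str) -> Optional[str]:
    """Return what procmail would assign to MATCH, or None if there is no match."""
    m = ATTACHMENT_TOKEN.search(body)
    return m.group(1) if m else None
```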
For simplicity, I have not attempted to cope with multi-line MIME headers. If you want to support that, the fix should be reasonably obvious, though not at all elegant.
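Here is one possible reading of that fix, sketched in Python for illustration. A line that begins with a space or tab is treated as a folded continuation of the previous header (RFC 5322 header folding). In a procmail condition, the same idea would mean adding an alternative for such lines inside the repeated (($)...) group. The sketch only covers folding after the first line of the Content-Disposition header. HEADER_LINES, FOLDED and extract_token are hypothetical names.

```python
import re
from typing import Optional

# A following header line either starts a new header (a letter) or is a
# folded continuation (leading space/tab, then some non-blank text).
HEADER_LINES = r"(?:\n(?:[A-Z].*|[ \t]+\S.*))*"

FOLDED = re.compile(
    r"^Content-Disposition: *attachment.*" + HEADER_LINES + r"\n\n([A-Z]+[0-9]+)",
    re.IGNORECASE | re.MULTILINE,
)


def extract_token(body: str) -> Optional[str]:
    """Like the unfolded version, but tolerates folded header lines."""
    m = FOLDED.search(body)
    return m.group(1) if m else None
```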
In the somewhat more general case, you might want to add a condition to check that the group of MIME headers in the condition also contains a Content-type: text/plain; you will need to set up two alternatives for having Content-type: before or after Content-disposition: (or somehow normalize the MIME headers before getting to this recipe, or trust that the sender always generates them in exactly the order in the sample message).
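Here is a sketch of that two-alternative condition, again as a Python regex. It uses the same assumptions as a procmail regex: case-insensitive matching, ^ at any line start, and . not crossing newlines. The first alternative covers Content-Disposition before Content-Type, and the second covers the reverse order. HDR, CT, CD and extract_text_attachment_token are hypothetical names.

```python
import re
from typing import Optional

HDR = r"(?:\n[A-Z].*)*"                     # zero or more other header lines
CT = r"Content-Type: *text/plain.*"         # required Content-Type header
CD = r"Content-Disposition: *attachment.*"  # required Content-Disposition header

# One alternative per header order, then any remaining headers,
# the empty line, and the token to extract.
TEXT_ATTACHMENT = re.compile(
    rf"^(?:{CD}{HDR}\n{CT}|{CT}{HDR}\n{CD}){HDR}\n\n([A-Z]+[0-9]+)",
    re.IGNORECASE | re.MULTILINE,
)


def extract_text_attachment_token(body: str) -> Optional[str]:
    """Return the token after a text/plain attachment's headers, or None."""
    m = TEXT_ATTACHMENT.search(body)
    return m.group(1) if m else None
```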
Source: https://stackoverflow.com/questions/32292295/extract-text-from-content-disposition-attachment-body-part