I have many emails coming in from different sources. They all have attachments, and many of them have attachment names in Chinese, so these names are converted to base64 by thei
Question: """Also I actually need to know what type of file it is ie .xls or .doc so I do need to decode the filename in order to correctly process the attachment, but as above, seems gb2312 is not supported in jython, know any roundabouts?"""
Data:
Content-Type: application/vnd.ms-excel;
name="=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?="
Observations:
(1) The first line indicates Microsoft Excel, so .xls is looking better than .doc
(2)
>>> import base64
>>> base64.b64decode("uLGxvmhlbrixsb5nLnhscw==")
'\xb8\xb1\xb1\xbehen\xb8\xb1\xb1\xbeg.xls'
>>>
(a) The extension appears to be .xls; it is plain ASCII once the base64 layer is removed, so there is no need for a gb2312 codec
(b) If you want a file-system-safe file name, you could use the "-_" (URL-safe) variant of base64 on the decoded bytes, or you could percent-encode them
(c) For what it's worth, the file name is XYhenXYg.xls, where X and Y are two Chinese characters that together mean "copy" and the remaining characters are literal ASCII.
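
Putting (a) and (b) together, here is a rough sketch of how the extension and a file-system-safe name could be obtained without any gb2312 codec. The helper names (raw_filename_bytes, extension_of, safe_names) are made up for illustration, and the regular expression only handles a single base64 ("B") encoded-word, not the "Q" encoding or names split across several encoded-words. It relies on the fact that both bytes of a GB2312 character are 0xA1 or higher, so an ASCII "." can never be part of a Chinese character. The import fallback is there on the assumption that you are on a Python 2 level Jython, where quote lives in urllib.

```python
import base64
import re

try:
    from urllib.parse import quote   # Python 3
except ImportError:
    from urllib import quote         # Python 2 / Jython 2.x (assumed)

# One RFC 2047 base64 encoded-word: =?charset?B?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")


def raw_filename_bytes(header_value):
    """Return (charset, raw bytes) without decoding the charset itself."""
    match = ENCODED_WORD.search(header_value)
    if match is None:
        raise ValueError("no base64 encoded-word found")
    charset, payload = match.groups()
    return charset.lower(), base64.b64decode(payload)


def extension_of(raw):
    """Return the extension including the dot, or '' if there is no dot."""
    dot = raw.rfind(b".")
    if dot == -1:
        return ""
    return raw[dot:].decode("ascii", "replace")


def safe_names(raw):
    """Return two file-system-safe renderings of the raw name bytes."""
    b64_name = base64.urlsafe_b64encode(raw).decode("ascii")  # "-_" alphabet
    pct_name = quote(raw, safe="")                            # percent-encoding
    return b64_name, pct_name


charset, raw = raw_filename_bytes('name="=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?="')
print(charset, extension_of(raw))
print(safe_names(raw))
```

The percent-encoded form keeps the ASCII parts (including the extension) readable, which may be more convenient than the base64 form if the name is only needed for dispatching on file type. On an interpreter that does ship the gb2312 codec, raw.decode("gb2312") should recover the full name.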