Question
The official documentation for the MSG format states:
- PidTagStoreSupportMask: indicates whether string properties within the .msg file are Unicode-encoded or not. STORE_UNICODE_OK: set if the string properties are Unicode-encoded.
- PidTagMessageCodepage: specifies the code page used to encode the non-Unicode string properties on this Message object.
- PidTagInternetCodepage: indicates the code page used for the PidTagBody property or the PidTagBodyHtml property.
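All three properties hold 32-bit integers. As a minimal sketch, the STORE_UNICODE_OK test could look like this, assuming the integer property values have already been read from the .msg property stream into a mapping keyed by property ID (that mapping and the helper `is_unicode_store` are hypothetical). The flag value 0x00040000 and the property IDs are taken from the MS-OXMSG/MS-OXPROPS specifications and should be double-checked there; treating a missing mask as "not Unicode" is also an assumption.

```python
from typing import Mapping

# Property IDs (upper 16 bits of the property tag), per MS-OXPROPS.
PID_TAG_STORE_SUPPORT_MASK = 0x340D
PID_TAG_MESSAGE_CODEPAGE = 0x3FFD
PID_TAG_INTERNET_CODEPAGE = 0x3FDE

# Flag bit inside PidTagStoreSupportMask.
STORE_UNICODE_OK = 0x00040000


def is_unicode_store(props: Mapping[int, int]) -> bool:
    """Return True if PidTagStoreSupportMask has STORE_UNICODE_OK set.

    `props` is a hypothetical mapping of property ID -> integer value.
    A missing mask is treated as "not Unicode" (assumption).
    """
    mask = props.get(PID_TAG_STORE_SUPPORT_MASK, 0)
    return bool(mask & STORE_UNICODE_OK)


props = {PID_TAG_STORE_SUPPORT_MASK: STORE_UNICODE_OK | 0x1, PID_TAG_MESSAGE_CODEPAGE: 1252}
print(is_unicode_store(props))
```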
Based on the above, my understanding is that if the Unicode mask is set, all string properties are Unicode-encoded (i.e. UTF-16LE). If the mask is not set, PidTagMessageCodepage is used to decode all string properties in the message. According to the documentation, non-Unicode and Unicode string properties cannot coexist in the same message.
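That reading of the spec can be sketched as follows. It only turns the interpretation above into code and is not a confirmed rule; `decode_string_prop` and `codepage_to_codec` are hypothetical helpers, and mapping a Windows code page to the Python codec name `cp<N>` is an assumption that holds for common pages such as 1252 or 932 but not for every code page.

```python
STORE_UNICODE_OK = 0x00040000


def codepage_to_codec(codepage: int) -> str:
    """Map a Windows code page number to a Python codec name.

    A few pages are special-cased; everything else is assumed to have a
    matching "cpNNNN" codec (unknown ones raise LookupError on decode).
    """
    special = {65001: "utf-8", 1200: "utf-16-le", 20127: "ascii", 28591: "latin-1"}
    return special.get(codepage, f"cp{codepage}")


def decode_string_prop(raw: bytes, store_mask: int, message_codepage: int) -> str:
    """UTF-16LE when STORE_UNICODE_OK is set, otherwise PidTagMessageCodepage."""
    if store_mask & STORE_UNICODE_OK:
        text = raw.decode("utf-16-le")
    else:
        text = raw.decode(codepage_to_codec(message_codepage))
    # Defensively drop trailing NULs in case the value includes a terminator.
    return text.rstrip("\x00")
```

For example, `decode_string_prop(b"Caf\xe9", 0, 1252)` returns `"Café"`.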
So what is the purpose of PidTagInternetCodepage? It is used to decode PidTagBody or PidTagBodyHtml, which are of type PtypString.
If a message has the Unicode store mask set:
Q1. Do we decode PidTagBody/PidTagBodyHtml as Unicode or using PidTagInternetCodepage?
If a message is non-Unicode:
Q2. Do we decode PidTagBody/PidTagBodyHtml using PidTagMessageCodepage or PidTagInternetCodepage?
Q3. For PidTagBody/PidTagBodyHtml, do we use Unicode when the store mask is set, and otherwise try PidTagInternetCodepage first and then PidTagMessageCodepage?
Q4. What do we do if neither is present? Default to code page 1252?
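The strategy proposed in Q3, with the Q4 fallback, amounts to the lookup order below. This is a sketch of the question's proposal, not confirmed behavior; `pick_body_encoding` is hypothetical, and the 1252 default is exactly what Q4 is asking about.

```python
from typing import Optional

STORE_UNICODE_OK = 0x00040000
DEFAULT_CODEPAGE = 1252  # Q4 fallback: an assumption, not mandated by the spec


def pick_body_encoding(store_mask: int,
                       internet_codepage: Optional[int],
                       message_codepage: Optional[int]) -> str:
    """Return the Python codec the Q3/Q4 strategy would use for the body."""
    if store_mask & STORE_UNICODE_OK:
        return "utf-16-le"
    # Try PidTagInternetCodepage first, then PidTagMessageCodepage.
    for cp in (internet_codepage, message_codepage):
        if cp:
            return "utf-8" if cp == 65001 else f"cp{cp}"
    return f"cp{DEFAULT_CODEPAGE}"
```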
Answer 1:
PR_BODY is no different from any other string property (such as PR_SUBJECT): it comes in both PT_STRING8 and PT_UNICODE flavors.
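Because the property type sits in the low 16 bits of the property tag (0x001E for PT_STRING8, 0x001F for PT_UNICODE), a reader can decide how to decode each string property from its own tag. A minimal sketch, assuming the tag is parsed from a .msg substorage stream name such as `__substg1.0_1000001F`; the helper names and the default code page 1252 are hypothetical, and for PT_STRING8 the code page would in practice come from PR_MESSAGE_CODEPAGE.

```python
PT_STRING8 = 0x001E
PT_UNICODE = 0x001F


def tag_from_stream_name(name: str) -> int:
    """Parse the property tag from a stream name like '__substg1.0_1000001F'."""
    return int(name.rsplit("_", 1)[1], 16)


def decode_by_tag(prop_tag: int, raw: bytes, codepage: int = 1252) -> str:
    """Decode a string property according to the type in its tag."""
    prop_type = prop_tag & 0xFFFF
    if prop_type == PT_UNICODE:
        return raw.decode("utf-16-le")
    if prop_type == PT_STRING8:
        # 8-bit string: the code page must come from elsewhere (hypothetical default).
        return raw.decode(f"cp{codepage}")
    raise ValueError(f"not a string property type: 0x{prop_type:04X}")
```

So PR_BODY stored as `__substg1.0_1000001F` is decoded as UTF-16LE, while `__substg1.0_1000001E` needs a code page.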
PR_HTML, on the other hand, is PT_BINARY and stores the data in a binary byte blob. Most HTML bodies include the charset as part of the HTML headers, but if it is not present, you will need to use PR_INTERNET_CODEPAGE.
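The advice for PR_HTML can be sketched as: look for a charset declared in a `<meta>` tag and fall back to PR_INTERNET_CODEPAGE when there is none. This is a simplified regex-based sniff, not a full HTML encoding-detection algorithm; `decode_html_body` is hypothetical, and the final cp1252 default and the use of `errors="replace"` are assumptions.

```python
import re
from typing import Optional

# Matches <meta charset="..."> as well as the charset=... inside
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_CHARSET_RE = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
                         re.IGNORECASE)


def decode_html_body(raw: bytes, internet_codepage: Optional[int]) -> str:
    """Decode PR_HTML bytes: declared charset first, then PR_INTERNET_CODEPAGE."""
    m = _CHARSET_RE.search(raw)
    if m:
        try:
            return raw.decode(m.group(1).decode("ascii"))
        except (LookupError, UnicodeDecodeError):
            pass  # unknown or wrong declared charset: fall through
    if internet_codepage:
        codec = "utf-8" if internet_codepage == 65001 else f"cp{internet_codepage}"
        return raw.decode(codec, errors="replace")
    return raw.decode("cp1252", errors="replace")
```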
Source: https://stackoverflow.com/questions/52998979/msg-clarification-on-pidtaginternetcodepage-pidtagmessagecodepage-pidtagstores