COM methods, Char type and CharSet


This is a follow-up to my previous question: Does .NET interop copy array data back and forth, or does it pin the array?

My method is a COM interface method (rather than a P/Invoke'd DLL export). How is the char type marshaled in this case, and does CharSet have any effect on it?

1 Answer

    I think this is a good question, and the char (System.Char) interop behavior does deserve some attention.

    In managed code, sizeof(char) is always 2 (two bytes), because in .NET characters are always Unicode (UTF-16 code units).
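
    A trivial way to confirm this (plain C#; sizeof(char) is a compile-time constant for the built-in type, so no unsafe context is needed):

    using System;

    class CharSize
    {
        static void Main()
        {
            // .NET chars are UTF-16 code units, so this always prints 2.
            Console.WriteLine(sizeof(char));
        }
    }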

    Nevertheless, the marshaling rules for char differ between P/Invoke (calling an exported DLL API) and COM (calling a COM interface method).

    For P/Invoke, CharSet can be set explicitly on any [DllImport] attribute, or implicitly via [module: DefaultCharSet(CharSet.Auto|Ansi|Unicode)], which changes the default for all [DllImport] declarations in the module.

    The default value is CharSet.Ansi, which means there will be a Unicode-to-ANSI conversion. I usually change the default to Unicode with [module: DefaultCharSet(CharSet.Unicode)], and then selectively use [DllImport(CharSet = CharSet.Ansi)] in those rare cases where I need to call an ANSI API, as in the sketch below.
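
    A minimal sketch of that pattern (GetModuleHandleW and lstrlenA are real kernel32 exports, chosen here purely for illustration):

    using System;
    using System.Runtime.InteropServices;

    [module: DefaultCharSet(CharSet.Unicode)]

    static class NativeMethods
    {
        // Inherits CharSet.Unicode from the module-level default:
        // lpModuleName is marshaled as a wide (LPWSTR) string.
        [DllImport("kernel32.dll", ExactSpelling = true)]
        public static extern IntPtr GetModuleHandleW(string lpModuleName);

        // The rare explicit-ANSI case: lpString is marshaled as LPSTR.
        [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true)]
        public static extern int lstrlenA(string lpString);
    }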

    It is also possible to override the marshaling of any specific char-typed parameter with MarshalAs(UnmanagedType.U1|U2), or with MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1|U2) for a char[] parameter. E.g., you may have something like this:

    [DllImport("Test.dll", ExactSpelling = true, CharSet = CharSet.Unicode)]
    static extern bool TestApi(
        int length,
        [In, Out, MarshalAs(UnmanagedType.LPArray] char[] buff1,
        [In, Out, MarshalAs(UnmanagedType.LPArray,
            ArraySubType = UnmanagedType.U1)] char[] buff2); 
    

    In this case, buff1 will be passed as an array of double-byte values (as is), but buff2 will be converted to and from an array of single-byte values. Note that this is still a smart, Unicode-to-current-OS-code-page (and back) conversion for buff2. E.g., the Unicode character '\x20AC' (€) will become \x80 in the unmanaged code (provided the OS code page is Windows-1252). This is how marshaling of [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1)] char[] buff differs from [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.U1)] ushort[] buff: for ushort, 0x20AC would simply be truncated to 0xAC, as the sketch below demonstrates.
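
    Here is a minimal managed-code sketch reproducing both conversions (it assumes the Windows-1252 code page; on .NET Core/.NET 5+ the code-page encodings also require the System.Text.Encoding.CodePages package):

    using System;
    using System.Text;

    class CharVsUshortDemo
    {
        static void Main()
        {
            // Required on .NET Core/.NET 5+ to make code-page encodings
            // available; can be omitted on .NET Framework.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            char euro = '\u20AC'; // '€'

            // What char[] + ArraySubType.U1 marshaling does: a genuine
            // Unicode-to-code-page conversion.
            byte[] ansi = Encoding.GetEncoding(1252).GetBytes(new[] { euro });
            Console.WriteLine($"char   -> 0x{ansi[0]:X2}");            // 0x80

            // What ushort[] + ArraySubType.U1 marshaling does: a plain
            // numeric narrowing, i.e. truncation to the low byte.
            Console.WriteLine($"ushort -> 0x{(byte)(ushort)euro:X2}"); // 0xAC
        }
    }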

    For calling a COM interface method, the story is quite different: there, char is always treated as a double-byte value representing a Unicode character. Perhaps the reason for this design decision can be inferred from Don Box's "Essential COM" (quoting a footnote from this page):

    The OLECHAR type was chosen in favor of the common TCHAR data type used by the Win32 API to alleviate the need to support two versions of each interface (CHAR and WCHAR). By supporting only one character type, object developers are decoupled from the state of the UNICODE preprocessor symbol used by their clients.

    Apparently, the same concept made its way to .NET. I'm pretty confident this is true even for legacy ANSI platforms (like Windows 95, where Marshal.SystemDefaultCharSize == 1).

    Note that DefaultCharSet has no effect on char when it is part of a COM interface method signature, nor is there a way to apply CharSet explicitly. However, you still have full control over the marshaling behavior of each individual parameter with MarshalAs, in exactly the same way as for P/Invoke above. E.g., your Next method might look like the one below, in case the unmanaged COM code expects a buffer of ANSI characters:

    void Next(ref int pcch,
        // SizeParamIndex = 0: the marshaled array length is taken from pcch.
        [In, Out, MarshalAs(UnmanagedType.LPArray,
            ArraySubType = UnmanagedType.U1, SizeParamIndex = 0)] char[] pchText);
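
    For context, a minimal sketch of where such a declaration lives (the interface name and IID below are hypothetical placeholders, not from the original question):

    using System.Runtime.InteropServices;

    [ComImport]
    [Guid("00000000-0000-0000-0000-000000000001")] // hypothetical placeholder IID
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    interface ITextEnum // hypothetical interface name
    {
        void Next(ref int pcch,
            [In, Out, MarshalAs(UnmanagedType.LPArray,
                ArraySubType = UnmanagedType.U1, SizeParamIndex = 0)] char[] pchText);
    }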
    
