How can I convert a string encoded with Windows code page 1251 to a Unicode string?

耶瑟儿~ 2020-12-20 01:14

The Cyrillic string my app receives uses (I believe) the table below: [image]

said I bel

3 Answers
  • 2020-12-20 01:38

    If you are using Delphi 2009 or later, this is done automatically:

    type
      CyrillicString = type AnsiString(1251);
    
    procedure TForm1.FormCreate(Sender: TObject);
    var
      UnicodeStr: string;
      CyrillicStr: CyrillicString;
    begin
      UnicodeStr := 'This is a test.'; // Unicode string
      CyrillicStr := UnicodeStr; // ...converted to 1251
    
      CyrillicStr := 'This is a test.'; // Cyrillic (1251) string
      UnicodeStr := CyrillicStr; // ...converted to Unicode
    end;
    
  • 2020-12-20 01:38

    The Windows API functions MultiByteToWideChar() and WideCharToMultiByte() can be used to convert to and from any code page that Windows supports. Of course, if you use Delphi 2009 or later, it is easier to use the native Unicode support.
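
    As a rough illustration of the API route, here is a minimal sketch of a console program that calls MultiByteToWideChar() twice: once to get the required length and once to convert. The function name Cp1251ToUnicode and the sample bytes are assumptions made for this example, not part of the question. The unit names assume Delphi XE2 or later; on older versions they would be Windows and SysUtils.

    ```pascal
    program Cp1251ViaWinApi;

    {$APPTYPE CONSOLE}

    uses
      Winapi.Windows, System.SysUtils;

    // Hypothetical helper: decodes raw code page 1251 bytes to a UTF-16 string.
    function Cp1251ToUnicode(const Bytes: TBytes): string;
    var
      Len: Integer;
    begin
      Result := '';
      if Length(Bytes) = 0 then
        Exit;
      // First call: no output buffer, so the API returns the number of
      // UTF-16 code units that the converted text needs.
      Len := MultiByteToWideChar(1251, 0, PAnsiChar(@Bytes[0]), Length(Bytes), nil, 0);
      if Len = 0 then
        RaiseLastOSError;
      SetLength(Result, Len);
      // Second call: convert into the buffer we just sized.
      if MultiByteToWideChar(1251, 0, PAnsiChar(@Bytes[0]), Length(Bytes),
        PWideChar(Result), Len) = 0 then
        RaiseLastOSError;
    end;

    var
      S: string;
      Ch: Char;
    begin
      // Assumed sample input: six bytes of Cyrillic text in code page 1251.
      S := Cp1251ToUnicode(TBytes.Create($CF, $F0, $E8, $E2, $E5, $F2));
      // Print the UTF-16 code units in hex so the result does not depend
      // on the console's own code page.
      for Ch in S do
        Write(IntToHex(Ord(Ch), 4), ' ');
      Writeln;
    end.
    ```

    Printing the code units rather than the text itself sidesteps console encoding problems when checking the result.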

  • 2020-12-20 01:42

    First of all I recommend you read Marco Cantù's whitepaper on Unicode in Delphi. I am also assuming from your question (and previous questions), that you are using a Unicode version of Delphi, i.e. D2009 or later.


    You can first of all define an AnsiString with codepage 1251 to match your input data.

    type
      CyrillicString = type AnsiString(1251);
    

    This is an important step. It says that any data contained inside a variable of this type is to be interpreted as having been encoded using the 1251 codepage. This allows Delphi to perform correct conversions to other string types, as we will see later.

    Next, copy your input data into a variable of this type.

    function GetCyrillicString(const Input: array of Byte): CyrillicString;
    begin
      SetLength(Result, Length(Input));
      if Length(Result)>0 then
        Move(Input[0], Result[1], Length(Input));
    end;
    

    Of course, there may be other, more convenient ways to get the data in. Perhaps it comes from a stream. Whatever the case, make sure you do it with something equivalent to a memory copy so that you don't invoke code page conversions and thus lose the 1251 encoding.
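
    For the stream case, a minimal sketch might look like the following. The function name ReadCyrillicFromStream, the Count parameter, and the sample bytes are assumptions for illustration. The bytes go straight into the string's buffer with ReadBuffer, so no code page conversion takes place on the way in. The unit names assume Delphi XE2 or later.

    ```pascal
    program Cp1251FromStream;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Classes;

    type
      CyrillicString = type AnsiString(1251);

    // Hypothetical helper: reads Count raw bytes from the stream directly
    // into a 1251-tagged string, without any code page conversion.
    function ReadCyrillicFromStream(Stream: TStream; Count: Integer): CyrillicString;
    begin
      SetLength(Result, Count);
      if Count > 0 then
        Stream.ReadBuffer(Result[1], Count);
    end;

    var
      Stream: TBytesStream;
      UnicodeStr: string;
      Ch: Char;
    begin
      // Assumed sample input: six bytes of Cyrillic text in code page 1251.
      Stream := TBytesStream.Create(TBytes.Create($CF, $F0, $E8, $E2, $E5, $F2));
      try
        // The assignment to a plain string is where the runtime converts
        // from code page 1251 to UTF-16.
        UnicodeStr := ReadCyrillicFromStream(Stream, Integer(Stream.Size));
      finally
        Stream.Free;
      end;
      for Ch in UnicodeStr do
        Write(IntToHex(Ord(Ch), 4), ' ');
      Writeln;
    end.
    ```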

    Finally you can simply assign a CyrillicString to a plain Unicode string variable and the Delphi runtime performs the necessary conversion automatically.

    function ConvertCyrillicToUnicode(const Input: array of Byte): string;
    begin
      Result := GetCyrillicString(Input);
    end;
    

    The runtime is able to perform this conversion because you specified the codepage when defining CyrillicString and because string maps to UnicodeString which is encoded with UTF-16.
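
    Putting the pieces from this answer together, a small console program can check the conversion end to end. The sample bytes are an assumption for illustration: they spell a six-letter Cyrillic word in code page 1251. The unit name assumes Delphi XE2 or later.

    ```pascal
    program Cp1251RoundTrip;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils;

    type
      CyrillicString = type AnsiString(1251);

    // Copies the raw bytes into a 1251-tagged string without converting them.
    function GetCyrillicString(const Input: array of Byte): CyrillicString;
    begin
      SetLength(Result, Length(Input));
      if Length(Result) > 0 then
        Move(Input[0], Result[1], Length(Input));
    end;

    // The assignment to string triggers the 1251 -> UTF-16 conversion.
    function ConvertCyrillicToUnicode(const Input: array of Byte): string;
    begin
      Result := GetCyrillicString(Input);
    end;

    var
      S: string;
      Ch: Char;
    begin
      // Assumed sample input in code page 1251.
      S := ConvertCyrillicToUnicode([$CF, $F0, $E8, $E2, $E5, $F2]);
      // Print each UTF-16 code unit in hex to avoid console encoding issues.
      for Ch in S do
        Write(IntToHex(Ord(Ch), 4), ' ');
      Writeln;
    end.
    ```

    With this input, the code units should come out as 041F 0440 0438 0432 0435 0442, which are the Unicode code points of the corresponding Cyrillic letters.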
