Exception with German Umlaut characters in TMemIniFile.Create

栀梦 2021-01-25 13:52

I have a .URL file containing the following text, which includes a German umlaut character:

[InternetShortcut]
URL=http://edn.embarcadero.com/ar

3 answers
  • 2021-01-25 13:58

    It is not possible, in general, to auto-detect the encoding of a file from its contents.

    A clear demonstration of this is given by this article from Raymond Chen: The Notepad file encoding problem, redux. Raymond uses the example of a file containing these two bytes:

    D0 AE
    

    Raymond goes on to show that this is a well-formed file in each of the following four encodings: ANSI 1252, UTF-8, UTF-16BE and UTF-16LE.

    The take-home lesson is that you have to know the encoding of your file: either agree on it by convention with whoever writes the file, or enforce the presence of a BOM.
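
    To make the ambiguity concrete, here is a minimal console sketch that decodes the same two bytes with each of the four encodings and prints the resulting UTF-16 code units. The program name and the Dump helper are made up for this illustration, and it assumes a Windows system where code page 1252 is available.

    ```delphi
    program AmbiguousBytes;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils;

    // Hypothetical helper: prints the UTF-16 code units of S as hex, so the
    // output does not depend on the console's own code page.
    procedure Dump(const EncodingName, S: string);
    var
      C: Char;
    begin
      Write(EncodingName, ':');
      for C in S do
        Write(' U+', IntToHex(Ord(C), 4));
      Writeln;
    end;

    var
      Bytes: TBytes;
      Ansi1252: TEncoding;
    begin
      Bytes := TBytes.Create($D0, $AE);

      // GetEncoding returns a new instance that the caller must free.
      Ansi1252 := TEncoding.GetEncoding(1252);
      try
        Dump('ANSI 1252', Ansi1252.GetString(Bytes));
      finally
        Ansi1252.Free;
      end;

      // The standard encodings are shared instances and must not be freed.
      Dump('UTF-8', TEncoding.UTF8.GetString(Bytes));
      Dump('UTF-16BE', TEncoding.BigEndianUnicode.GetString(Bytes));
      Dump('UTF-16LE', TEncoding.Unicode.GetString(Bytes));
    end.
    ```

    On Windows this should report U+00D0 U+00AE for ANSI 1252, U+042E for UTF-8, U+D0AE for UTF-16BE and U+AED0 for UTF-16LE: four different, equally valid readings of the same file.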

  • 2021-01-25 14:07

    You need to decide what the encoding of the file is, once and for all. There's no foolproof way to auto-detect it, so you'll have to enforce it in the code that creates these files.
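
    As a rough sketch of what "enforcing it" could look like on the writing side, the program below always creates the file through TMemIniFile with an explicit TEncoding.UTF8. The program name, the path and the key values are placeholders for this example, and the target folder is assumed to exist. As far as I know, TMemIniFile uses the encoding given to its constructor for saving as well as loading, and should by default also write a UTF-8 BOM; it is worth checking the result in a hex viewer on your Delphi version.

    ```delphi
    program WriteUtf8Ini;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.IniFiles;

    const
      // Hypothetical path, chosen only for this sketch.
      FileName = 'C:\Temp\Example.url';

    var
      Ini: TMemIniFile;
    begin
      // The explicit encoding is the whole point: every file written this
      // way has a known encoding that the reading code can rely on.
      Ini := TMemIniFile.Create(FileName, TEncoding.UTF8);
      try
        Ini.WriteString('InternetShortcut', 'URL',
          'http://edn.embarcadero.com/article/44358');
        // #$00E4 is the German a umlaut as a Unicode character.
        Ini.WriteString('MyApp', 'Notes', 'German a umlaut: ' + #$00E4);
        // TMemIniFile keeps changes in memory until UpdateFile is called.
        Ini.UpdateFile;
      finally
        Ini.Free;
      end;
    end.
    ```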

    If the creation of this file is outside your control, then you are more or less out of luck. You can try to rely on the BOM (byte order mark) at the beginning of the file (which is often, but not always, present in a UTF-8 file). I can't tell from the TMemIniFile documentation what the Create constructor without an encoding parameter assumes about the file's encoding (my guess is that it honours the BOM and, if there is none, assumes ANSI, i.e. the system code page).

    One thing you can do - if you decide to stick to your current method - is to change your code to:

    procedure TForm1.Button1Click(Sender: TObject);
    var
      BookmarkIni: TCustomIniFile;
    begin
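      // Try UTF-8 first (this is where the error occurred); fall back to
      // the ANSI-based TIniFile if the file cannot be decoded: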
      try
        BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url',
                                        TEncoding.UTF8);
      except
        BookmarkIni := TIniFile.Create('F:\Bug fix list for RAD Studio XE8.url');
      end;
      try
        // Some code here
      finally
        BookmarkIni.Free;
      end;
    end;
    

    You don't need two separate variables, because TIniFile and TMemIniFile (as well as TRegistryIniFile) share a common ancestor: TCustomIniFile. By declaring your variable as this common ancestor, you can instantiate it as any class that inherits from TCustomIniFile. The actual (run-time) type is determined by which constructor you call.

    But first, you should try to use

    BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url');
    

    i.e. without any encoding specified, and see if it works with both ANSI and UTF-8 files.

    EDIT: Here's a test program to verify my claim made in the comments:

    program Project21;
    
    {$APPTYPE CONSOLE}
    
    uses
      IniFiles, System.SysUtils;
    
    const
      FileName = 'F:\Bug fix list for RAD Studio XE8.url';
    
    var
      TXT : TextFile;
    
    procedure Test;
    var
      BookmarkIni: TCustomIniFile;
    begin
      try
        BookmarkIni := TMemIniFile.Create(FileName,TEncoding.UTF8);
      except
        BookmarkIni := TIniFile.Create(FileName);
      end;
      try
        Writeln(BookmarkIni.ReadString('MyApp','Notes','xxx'))
      finally
        BookmarkIni.Free;
      end;
    end;
    
    begin
      try
        AssignFile(TXT, FileName);
        Rewrite(TXT);
        try
          Writeln(TXT, '[InternetShortcut]');
          Writeln(TXT, 'URL=http://edn.embarcadero.com/article/44358');
          Writeln(TXT, '[MyApp]');
          Writeln(TXT, 'Notes=The German a umlaut consists of the following two ANSI characters: '#$C3#$A4);
          Writeln(TXT, 'Icon=default');
          Writeln(TXT, 'Title=Bug fix list for RAD Studio XE8');
        finally
          CloseFile(TXT)
        end;
        Test;
        ReadLn
      except
        on E: Exception do
          Writeln(E.ClassName, ': ', E.Message);
      end;
    end.
    
  • 2021-01-25 14:10

    The rule of thumb: to read data (a file, a stream, whatever) correctly, you must know its encoding. The best solution is to let the user choose the encoding, or to enforce one, e.g. UTF-8.

    Moreover, knowing only that a file is "ANSI" does not help much unless you also know the code page.

    A must-read: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

    Another approach is to try to detect the encoding (as browsers do for sites that specify none). Detecting a UTF encoding is relatively easy if a BOM exists, but the BOM is often omitted. Take a look at Mozilla's universalchardet or at chsdet.
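
    For the easy case, where a BOM is present, the Delphi RTL can do the check itself. The sketch below uses TEncoding.GetBufferEncoding to look for a BOM and otherwise falls back to a caller-supplied guess; the program name and the EncodingFromBom helper are invented for this example, and the fallback is only an assumption, not a detection result. A BOM-less UTF-8 file would still be opened with the fallback encoding.

    ```delphi
    program DetectBom;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.IOUtils, System.IniFiles;

    // Hypothetical helper: returns the encoding indicated by the file's BOM,
    // or Fallback when there is no BOM. Reads the whole file for simplicity;
    // the first few bytes would be enough.
    function EncodingFromBom(const FileName: string;
      Fallback: TEncoding): TEncoding;
    var
      Bytes: TBytes;
    begin
      Bytes := TFile.ReadAllBytes(FileName);
      Result := nil; // nil asks GetBufferEncoding to look for a BOM
      TEncoding.GetBufferEncoding(Bytes, Result, Fallback);
    end;

    var
      Ini: TMemIniFile;
      Enc: TEncoding;
    begin
      if ParamCount < 1 then
      begin
        Writeln('Usage: DetectBom <ini-or-url-file>');
        Exit;
      end;

      // TEncoding.Default (the system ANSI code page on Windows) is the guess
      // used when no BOM is found.
      Enc := EncodingFromBom(ParamStr(1), TEncoding.Default);
      Writeln('Opening as: ', Enc.EncodingName);

      Ini := TMemIniFile.Create(ParamStr(1), Enc);
      try
        Writeln(Ini.ReadString('InternetShortcut', 'URL', '(not found)'));
      finally
        Ini.Free;
      end;
    end.
    ```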
