case insensitive Pos

一个人的身影 2021-02-02 00:56

Is there any comparable function like Pos that is not case-sensitive in D2010 (unicode)?

I know I can use Pos(AnsiUpperCase(FindString), AnsiUpperCase(SourceString)) but

9 answers
  • 2021-02-02 01:01

    Instead of 'AnsiUpperCase' you can use a lookup table, which is much faster. I have reworked my old code. It is very simple and also very fast. Check it:

    // Requires Windows in the uses clause (CharUpperBuffA).
    type
      TAnsiUpCaseTable = array [AnsiChar] of AnsiChar;
    
    var
      AnsiTable: TAnsiUpCaseTable;
    
    // Fills Table so that Table[c] is the upper-case form of c in the current locale.
    procedure InitAnsiUpCaseTable(var Table: TAnsiUpCaseTable);
    var
      n: cardinal;
    begin
      for n := 0 to SizeOf(TAnsiUpCaseTable) -1 do
      begin
        Table[AnsiChar(n)] := AnsiChar(n);
        // Explicit ANSI call: plain CharUpperBuff maps to the wide version in D2009+.
        CharUpperBuffA(@Table[AnsiChar(n)], 1);
      end;
    end;
    
    // AnsiString parameters, so the AnsiChar-indexed table also applies in D2009+.
    function UpCasePosEx(const SubStr, S: AnsiString; Offset: Integer = 1): Integer;
    var
      n              :integer;
      SubStrLength   :integer;
      SLength        :integer;
    label
      Fail;
    begin
      SLength := length(s);
      SubStrLength := length(SubStr);
      // Like Pos, report 0 for an empty substring.
      if (SLength > 0) and (SubStrLength > 0) and (Offset > 0) then begin
        result := Offset;
        while SubStrLength <= SLength - result + 1 do begin
          for n := 1 to SubStrLength do
            if AnsiTable[SubStr[n]] <> AnsiTable[s[result + n -1]] then
              goto Fail;
          exit;  // every character matched; result holds the position
    Fail:
          inc(result);
        end;
      end;
      result := 0;
    end;
    
    initialization
      InitAnsiUpCaseTable(AnsiTable);
    end.
    
  • 2021-02-02 01:03

    I have also faced the problem of converting FastStrings, which used a Boyer-Moore (BM) search to gain some speed, for D2009 and D2010. Since many of my searches look for a single character only, and most of these look for non-alphabetic characters, my D2010 version of SmartPos has an overloaded version that takes a WideChar as the first argument and does a simple loop through the string to find it. I uppercase both arguments to handle the few case-insensitive cases. For my applications, I believe the speed of this solution is comparable to FastStrings.
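
    The single-character loop described above could look roughly like the sketch below. The name CharPos and its Offset parameter are hypothetical stand-ins, not the actual SmartPos overload, and the comparison is deliberately case-sensitive, which is enough for non-alphabetic characters.

    ```pascal
    // Hypothetical stand-in for a single-character search overload.
    // Returns the 1-based index of Ch in S at or after Offset, or 0 if absent.
    function CharPos(Ch: Char; const S: string; Offset: Integer = 1): Integer;
    var
      i, Start: Integer;
    begin
      Result := 0;
      Start := Offset;
      if Start < 1 then
        Start := 1;              // treat invalid offsets as "from the start"
      for i := Start to Length(S) do
        if S[i] = Ch then
        begin
          Result := i;
          Exit;
        end;
    end;
    ```

    For example, CharPos(';', 'a;b;c') finds the first semicolon and CharPos(';', 'a;b;c', 3) finds the second.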

    For the 'string find' case, my first pass was to use SearchBuf and do the uppercasing and accept the penalty, but I have recently been looking into the possibility of using a Unicode BM implementation. As you may be aware, BM does not scale well or easily to charsets of Unicode proportions, but there is a Unicode BM implementation at Soft Gems. This pre-dates D2009 and D2010, but looks as if it would convert fairly easily. The author, Mike Lischke, solves the uppercasing issue by including a 67kb Unicode uppercasing table, and this may be a step too far for my modest requirements. Since my search strings are usually short (though not as short as your single three-character example) the overhead for Unicode BM may also be a price not worth paying: the BM advantage increases with the length of the string being searched for.

    This is definitely a situation where benchmarking with some real-world application-specific examples will be needed before incorporating that Unicode BM into my own applications.

    Edit: some basic benchmarking shows that I was right to be wary of the "Unicode Tuned Boyer-Moore" solution. In my environment, UTBM results in bigger code, longer time. I might consider using it if I needed some of the extras this implementation provides (handling surrogates and whole-words only searches).

  • 2021-02-02 01:13

    I think converting to upper or lower case before calling Pos is the best way, but you should try to call the AnsiUpperCase/AnsiLowerCase functions as rarely as possible.
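
    As a minimal sketch of calling the conversion as rarely as possible: when the same search string is tested against many lines, it can be upper-cased once before the loop. The function name CountMatchingLines and the use of TStrings are assumptions made for illustration only.

    ```pascal
    uses
      SysUtils, Classes;

    // Hypothetical helper: counts the lines that contain Needle, ignoring case.
    function CountMatchingLines(const Needle: string; Lines: TStrings): Integer;
    var
      UpperNeedle: string;
      i: Integer;
    begin
      Result := 0;
      UpperNeedle := AnsiUpperCase(Needle);   // converted once, outside the loop
      for i := 0 to Lines.Count - 1 do
        // Each line still has to be converted, but the needle does not.
        if Pos(UpperNeedle, AnsiUpperCase(Lines[i])) > 0 then
          Inc(Result);
    end;
    ```

    This saves one conversion per iteration; the conversion of each searched line remains.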

  • 2021-02-02 01:15

    This version of my previous answer works in both D2007 and D2010.

    • In Delphi 2007 the CharUpCaseTable is 256 bytes.
    • In Delphi 2010 it is 128 KB (65536*2 bytes).

    The reason is the Char size. In older versions of Delphi, my original code only supported the current locale's character set at initialization. My InsensPosEx is about 4 times faster than your code. It is certainly possible to go even faster, but we would lose simplicity.

    type
      TCharUpCaseTable = array [Char] of Char;
    
    var
      CharUpCaseTable: TCharUpCaseTable;
    
    // Fills Table with every Char value, then upper-cases the whole buffer in one call.
    procedure InitCharUpCaseTable(var Table: TCharUpCaseTable);
    var
      n: cardinal;
    begin
      for n := 0 to Length(Table) - 1 do
        Table[Char(n)] := Char(n);
      CharUpperBuff(@Table, Length(Table));
    end;
    
    function InsensPosEx(const SubStr, S: string; Offset: Integer = 1): Integer;
    var
      n:            Integer;
      SubStrLength: Integer;
      SLength:      Integer;
    label
      Fail;
    begin
      Result := 0;
      if S = '' then Exit;
      if Offset <= 0 then Exit;
    
      SubStrLength := Length(SubStr);
      SLength := Length(s);
    
      // Like Pos, report 0 for an empty substring.
      if SubStrLength = 0 then Exit;
      if SubStrLength > SLength then Exit;
    
      Result := Offset;
      while SubStrLength <= (SLength-Result+1) do
      begin
        for n := 1 to SubStrLength do
          if CharUpCaseTable[SubStr[n]] <> CharUpCaseTable[s[Result+n-1]] then
            goto Fail;
        Exit;  // every character matched; Result holds the position
    Fail:
        Inc(Result);
      end;
      Result := 0;
    end;
    
    //...
    
    initialization
      InitCharUpCaseTable({var}CharUpCaseTable);
    
  • 2021-02-02 01:15

    The built-in Delphi functions for that are AnsiStrings.ContainsText for AnsiStrings and StrUtils.ContainsText for Unicode strings.

    In the background however, they use logic very similar to your logic.

    No matter which library they come from, functions like that will always be slow: to be as compatible with Unicode as possible, they need quite a lot of overhead. And since they are called inside the loop, that costs a lot.

    The only way to circumvent that overhead is to do those conversions outside the loop as much as possible.

    So: follow your own suggestion, and you have a really good solution.
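
    A minimal sketch of the built-in route, assuming Delphi 2009 or later, where StrUtils.ContainsText takes Unicode strings. The wrapper name HasText is hypothetical. Note that ContainsText only reports whether the text is present and does not return a position as Pos does.

    ```pascal
    uses
      SysUtils, StrUtils;

    // Hypothetical wrapper: True when SubStr occurs in S, ignoring case.
    function HasText(const SubStr, S: string): Boolean;
    begin
      // Argument order differs from Pos: the text to search comes first.
      Result := ContainsText(S, SubStr);
    end;
    ```

    When the position itself is needed, the upper-cased Pos call from the question is still required.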

    --jeroen

  • 2021-02-02 01:19

    Why not just convert both the substring and the source string to lower or upper case within the regular Pos call? The result will effectively be case-insensitive because both arguments are in the same case. Simple and light.
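
    Wrapped in a function, that suggestion might look like the sketch below. The name PosText is hypothetical, and AnsiUpperCase is used because it is the conversion named in the question.

    ```pascal
    uses
      SysUtils;

    // Hypothetical helper: 1-based index of SubStr in S, ignoring case; 0 if absent.
    function PosText(const SubStr, S: string): Integer;
    begin
      // Both arguments are brought to the same case, so Pos compares them equally.
      Result := Pos(AnsiUpperCase(SubStr), AnsiUpperCase(S));
    end;
    ```

    For example, PosText('WORLD', 'Hello world') returns 7.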
