Wildcard string matching

醉话见心 2021-01-02 01:54

What is the most efficient wildcard string matching algorithm? I am asking only about an idea, it is not necessary to provide actual code.

I'm thinking that such al

9 Answers
  • 2021-01-02 02:40

    This is a simple implementation in C that uses pointers and tries to traverse the string and the wildcard pattern only once where possible, to be as efficient as it can in the common simple case. Note that it also avoids string functions such as length() (the name differs between languages), which may traverse the whole string and add unwanted computation time.

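    #include <stdbool.h>
    #include <stdio.h>

    /* Returns true if line matches pattern. '?' matches any single
       character and '*' matches any run of characters, including none. */
    bool wildcard_strcmp(const char *line, const char *pattern)
    {
        const char *star = NULL;   /* pattern position just past the last '*' */
        const char *resume = NULL; /* line position to retry from on mismatch */

        while (*line)
        {
            if (*pattern == '*')
            {
                /* Remember the backtrack point; '*' matches nothing at first. */
                star = ++pattern;
                resume = line;
                if (*pattern == '\0')
                {
                    return true; /* a trailing '*' matches the rest of line */
                }
            }
            else if ((*pattern == *line) || (*pattern == '?'))
            {
                line++;
                pattern++;
            }
            else if (star)
            {
                /* Mismatch: let the last '*' absorb one more character. */
                pattern = star;
                line = ++resume;
            }
            else
            {
                return false;
            }
        }

        /* line is exhausted; whatever is left of pattern must be all '*'. */
        while (*pattern == '*')
        {
            pattern++;
        }
        return *pattern == '\0';
    }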
    
    int main()
    {
        char string[200] = "foobarfoobar";
        char pattern[200] = "fo?*barfoo*";
    
        if (wildcard_strcmp(string, pattern))
        {
            printf("Match\n");
        }
        else
        {
            printf("No Match\n");
        }
    
        return 0;
    }
    
  • 2021-01-02 02:44

    The performance depends not only on the length of the string being searched but also on the number (and type) of wildcards in the query string. If a * may match any number of characters, up to and including the entire document, and the query may contain any number of stars, this limits how fast matching can be.

    If you can find a match for some string foo in O(f(n)) time, then the query foo_0*foo_1*foo_2*...*foo_m will take O(m*f(n)) time, where m is the number of * wildcards.
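
    The bound above can be illustrated with a minimal sketch that handles only the * wildcard: the pattern is split at each star, and each literal segment is searched for in turn, so the total cost is roughly one substring search per segment. The names star_match and find_segment are hypothetical, and the naive search used here stands in for whatever O(f(n)) search you prefer.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper: leftmost occurrence of the first len characters
       of seg in text, or NULL. Naive search; any faster one could be used. */
    static const char *find_segment(const char *text, const char *seg, size_t len)
    {
        for (;; text++)
        {
            if (strncmp(text, seg, len) == 0)
            {
                return text;
            }
            if (*text == '\0')
            {
                return NULL;
            }
        }
    }

    /* Hypothetical matcher supporting only '*': true if text matches pattern. */
    bool star_match(const char *text, const char *pattern)
    {
        const char *star = strchr(pattern, '*');
        size_t len, rest;

        if (star == NULL)
        {
            return strcmp(text, pattern) == 0; /* no wildcard: exact compare */
        }

        /* The segment before the first '*' must be a prefix of text. */
        len = (size_t)(star - pattern);
        if (strncmp(text, pattern, len) != 0)
        {
            return false;
        }
        text += len;
        pattern = star + 1;

        /* Each middle segment is taken at its leftmost occurrence. */
        while ((star = strchr(pattern, '*')) != NULL)
        {
            const char *hit;

            len = (size_t)(star - pattern);
            hit = find_segment(text, pattern, len);
            if (hit == NULL)
            {
                return false;
            }
            text = hit + len;
            pattern = star + 1;
        }

        /* The segment after the last '*' must be a suffix of what remains. */
        rest = strlen(text);
        len = strlen(pattern);
        return rest >= len && strcmp(text + rest - len, pattern) == 0;
    }
    ```

    For example, star_match("foobarfoobar", "foo*bar*") is true, while star_match("foobar", "foo*foo") is false. Taking the leftmost occurrence of each middle segment is safe because it leaves the most text available for the segments that follow.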

  • 2021-01-02 02:46

    You could convert your wildcard query into a regular expression and use that to match; an RE can always be transformed into a DFA (deterministic finite automaton), and those are efficient (linear time) with a small constant.
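
    As a rough sketch of the conversion step, the code below translates a wildcard pattern into an anchored POSIX extended regular expression and matches it with regcomp() and regexec() from <regex.h>. The helper names wildcard_to_regex and wildcard_regex_match are hypothetical, and whether the library builds a DFA internally depends on the implementation, so the linear-time behaviour is not guaranteed by this code.

    ```c
    #include <regex.h>
    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper: build an anchored ERE from a wildcard pattern.
       The caller frees the returned string. Returns NULL on allocation failure. */
    static char *wildcard_to_regex(const char *pattern)
    {
        /* Worst case: each character becomes two, plus '^', '$' and '\0'. */
        char *re = malloc(2 * strlen(pattern) + 3);
        char *out = re;

        if (re == NULL)
        {
            return NULL;
        }
        *out++ = '^';
        for (; *pattern; pattern++)
        {
            if (*pattern == '*')
            {
                *out++ = '.'; /* '*' becomes ".*" */
                *out++ = '*';
            }
            else if (*pattern == '?')
            {
                *out++ = '.'; /* '?' becomes "." */
            }
            else
            {
                /* Escape the remaining characters that are special in an ERE. */
                if (strchr(".[\\()+{|^$", *pattern) != NULL)
                {
                    *out++ = '\\';
                }
                *out++ = *pattern;
            }
        }
        *out++ = '$';
        *out = '\0';
        return re;
    }

    /* Hypothetical wrapper: true if line matches the wildcard pattern. */
    bool wildcard_regex_match(const char *line, const char *pattern)
    {
        regex_t compiled;
        bool matched = false;
        char *re = wildcard_to_regex(pattern);

        if (re == NULL)
        {
            return false;
        }
        if (regcomp(&compiled, re, REG_EXTENDED | REG_NOSUB) == 0)
        {
            matched = (regexec(&compiled, line, 0, NULL, 0) == 0);
            regfree(&compiled);
        }
        free(re);
        return matched;
    }
    ```

    With this sketch, the pattern fo?*barfoo* becomes ^fo..*barfoo.*$, which matches foobarfoobar. If the same pattern is applied to many strings, compiling it once and reusing the regex_t would avoid repeating the conversion cost.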
