Find closest match to input string in a list of strings

后端 未结 2 1059

I have problems finding an implementation of closest match strings for .net

I would like to match a list of strings, example:

input string: \"Publiczna Szkoła Po

相关标签:
2条回答
  • 2021-02-04 03:05

    .NET does not supply anything out of the box - you need to implement a an Edit Distance algorithm yourself. For example, you can use Levenshtein Distance, like this:

    // This code is an implementation of the pseudocode from the Wikipedia,
    // showing a naive implementation.
    // You should research an algorithm with better space complexity.
    public static int LevenshteinDistance(string s, string t) {
        int n = s.Length;
        int m = t.Length;
        int[,] d = new int[n + 1, m + 1];
        if (n == 0) {
            return m;
        }
        if (m == 0) {
            return n;
        }
        for (int i = 0; i <= n; d[i, 0] = i++)
            ;
        for (int j = 0; j <= m; d[0, j] = j++)
           ;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        return d[n, m];
    }
    

    Call LevenshteinDistance(targetString, possible[i]) for each i, then pick the string possible[i] for which LevenshteinDistance returns the smallest value.

    0 讨论(0)
  • 2021-02-04 03:17

    Edit distance

    Edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.

    Levenshtein distance

    Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.

    Fast, memory efficient Levenshtein algorithm

    C# Levenshtein

    using System;
    
    /// <summary>
    /// Contains approximate string matching
    /// </summary>
    static class LevenshteinDistance
    {
        /// <summary>
        /// Compute the distance between two strings.
        /// </summary>
        public static int Compute(string s, string t)
        {
        int n = s.Length;
        int m = t.Length;
        int[,] d = new int[n + 1, m + 1];
    
        // Step 1
        if (n == 0)
        {
            return m;
        }
    
        if (m == 0)
        {
            return n;
        }
    
        // Step 2
        for (int i = 0; i <= n; d[i, 0] = i++)
        {
        }
    
        for (int j = 0; j <= m; d[0, j] = j++)
        {
        }
    
        // Step 3
        for (int i = 1; i <= n; i++)
        {
            //Step 4
            for (int j = 1; j <= m; j++)
            {
            // Step 5
            int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
    
            // Step 6
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                d[i - 1, j - 1] + cost);
            }
        }
        // Step 7
        return d[n, m];
        }
    }
    
    class Program
    {
        static void Main()
        {
        Console.WriteLine(LevenshteinDistance.Compute("aunt", "ant"));
        Console.WriteLine(LevenshteinDistance.Compute("Sam", "Samantha"));
        Console.WriteLine(LevenshteinDistance.Compute("flomax", "volmax"));
        }
    }
    
    0 讨论(0)
提交回复
热议问题