C# code or algorithm to quickly calculate distance between large strings? [closed]

十年热恋 提交于 2019-12-10 17:44:34

问题


Hi and thanks for looking!

Background

I have an XML file that contains 1900 nodes which themselves contain a string of encoded data of about 3400 characters.

As part of a use case for an application I am developing, I need to be able to take a "benchmark" string at runtime, and find the closest match from the XML file.

Please note that XML is not germane to the app and that I may go with SQL moving forward, but for today, I just needed an easy place to store the data and prove the concept.

I am using .NET 4.0, C#, forms app, LINQ, etc.

Question

How do I find the closest match? Hamming? Levenshtein? There are plenty of code samples online, but most are geared towards small string comparisons ("ant" vs. "aunt") or exact match. I will rarely have exact matches; I just need closest match.

Thanks in advance!

Matt


回答1:


You mentioned using Levenhstein's Edit Distance and that your strings were about 3400 characters long.

I did a quick try and using the dynamic programming version of Levenhstein's Edit Distance it seems to be quite fast and cause no issue.

I did this:

        final StringBuilder sb1 = new StringBuilder();
        final StringBuilder sb2 = new StringBuilder();
        final Random r = new Random(42);
        final int n = 3400;
        for (int i = 0; i < n; i++) {
            sb1.append( (char) ('a' + r.nextInt(26)) );
            sb2.append( (char) ('a' + r.nextInt(26)) );
        }
        final long t0 = System.currentTimeMillis();
        System.out.println("LED: " + getLevenshteinDistance(sb1.toString(), sb2.toString()) );
        final long te = System.currentTimeMillis() - t0;
        System.out.println("Took: " + te + " ms");

And it's finding the distance in 215 ms on a Core 2 Duo from 2006 or so.

Would that work for you?

(btw I'm not sure I can paste the code for the DP LED implementation I've got here so you probably should search the Internet for one Java implementation)



来源:https://stackoverflow.com/questions/8780215/c-sharp-code-or-algorithm-to-quickly-calculate-distance-between-large-strings

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!