Perl module for text comparison

问题

Can anyone suggest a Perl module which can compare two strings and return a degree to which they match? I searched CPAN extensively, and although there are similar modules like String::Approx and Data::Compare, they are not what I am looking for. Suppose I have two strings : I love you, and I boht you. I want functionality which will compare these two strings, taking into account numerous parameters, the matching of words in correct order (love as the first word in a string should not "match" love as the 4th word in the 2nd string, even though both strings have that word), words not matching but spelt almost similarly (like say love and loge), number of words, etc and return an index, say a number from 0 to 1 on a scale of 1, representing the degree of similarity between the two strings. Is there any such Perl module?

回答1:

There are many such modules. Often, though, you'll have to make use of them in some special way to account for your own assumptions. Most of the string comparison tools like this just implement some algorithm for comparing one string to another. Most assume that if you have specific policy decisions to make, you'll code them yourself.

Personally, I am not sure I'd recommend Text::Levenshtein because of bugs and lack of ut8 support. I don't have a better recommendation either, though.

However, these searches will reveal lots of potential modules you could look into and determine what works best for your purpose (based on the names of common algorithms for doing this sort of thing):

https://metacpan.org/search?q=levenshtein
https://metacpan.org/search?q=wagner+fischer
https://metacpan.org/search?q=edit+distance

If you're interested in spoken similarities, you can also look into phonetic comparisons:

https://metacpan.org/search?q=phonetic
https://metacpan.org/search?q=soundex
https://metacpan.org/search?q=metaphone

来源：https://stackoverflow.com/questions/11763875/perl-module-for-text-comparison

标签

perl

cpan