Perl module for text comparison

Submitted by 泪湿孤枕 on 2019-12-10 14:38:51

Question


Can anyone suggest a Perl module that compares two strings and returns the degree to which they match? I searched CPAN extensively, and although there are similar modules such as String::Approx and Data::Compare, they are not what I am looking for. Suppose I have two strings: "I love you" and "I boht you". I want functionality that compares these two strings while taking several parameters into account: whether words match in the correct order (love as the first word of one string should not "match" love as the fourth word of the other, even though both strings contain it), words that do not match but are spelled almost alike (say, love and loge), the number of words, and so on. It should return an index, say a number from 0 to 1, representing the degree of similarity between the two strings. Is there any such Perl module?


Answer 1:


There are many such modules. Often, though, you'll have to use them in some special way to account for your own assumptions. Most string comparison tools just implement some algorithm for comparing one string to another, and assume that if you have specific policy decisions to make, you'll code them yourself.

Personally, I am not sure I'd recommend Text::Levenshtein because of bugs and lack of UTF-8 support. I don't have a better recommendation either, though.

However, these searches, based on the names of common algorithms for this sort of thing, will turn up lots of candidate modules that you can look into to determine what works best for your purpose:

  • https://metacpan.org/search?q=levenshtein
  • https://metacpan.org/search?q=wagner+fischer
  • https://metacpan.org/search?q=edit+distance
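
To illustrate what "coding the policy yourself" might look like, here is a minimal sketch in plain Perl, using only the core List::Util module. It is not the API of any CPAN module: the names edit_distance, word_similarity and sentence_similarity are made up for this example. The policy is also an assumption: words are compared position by position (so word order matters), each pair is scored by a Levenshtein distance normalised to 0..1, and missing words count as 0.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Wagner-Fischer dynamic programme: the number of single-character
# insertions, deletions and substitutions needed to turn $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);          # distances from the empty prefix of $s
    for my $i (1 .. scalar @s) {
        my @cur = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $cur[$j] = min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Similarity of two words in 0..1 (1 means identical).
sub word_similarity {
    my ($x, $y) = @_;
    my $longest = max(length($x), length($y));
    return 1 if $longest == 0;
    return 1 - edit_distance($x, $y) / $longest;
}

# Assumed policy: compare the words at the same position in each
# sentence, ignore case, and score a missing word as 0.
sub sentence_similarity {
    my ($p, $q) = @_;
    my @p = split ' ', lc $p;
    my @q = split ' ', lc $q;
    my $slots = max(scalar @p, scalar @q);
    return 1 if $slots == 0;
    my $total = 0;
    for my $i (0 .. $slots - 1) {
        next unless defined($p[$i]) && defined($q[$i]);
        $total += word_similarity($p[$i], $q[$i]);
    }
    return $total / $slots;
}

printf "%.2f\n", sentence_similarity('I love you', 'I boht you');   # prints 0.75
```

With this policy, "love" and "loge" score 0.75 as a word pair, while "I love you" against "you I love" scores low because no position holds a similar word. Different weights or an alignment step between the word lists would be equally reasonable choices.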

If you're interested in spoken similarities, you can also look into phonetic comparisons:

  • https://metacpan.org/search?q=phonetic
  • https://metacpan.org/search?q=soundex
  • https://metacpan.org/search?q=metaphone
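
As a rough idea of what a phonetic comparison does, the following is a hand-written sketch of the classic American Soundex coding in plain Perl. The function name soundex_code is hypothetical and this is not the interface of any of the modules those searches return; it only shows how words that sound alike can collapse to the same four-character key.

```perl
use strict;
use warnings;

# Consonant groups of American Soundex; vowels and Y carry no code.
my %SOUNDEX_CODE;
@SOUNDEX_CODE{qw(B F P V)}         = (1) x 4;
@SOUNDEX_CODE{qw(C G J K Q S X Z)} = (2) x 8;
@SOUNDEX_CODE{qw(D T)}             = (3) x 2;
$SOUNDEX_CODE{L}                   = 4;
@SOUNDEX_CODE{qw(M N)}             = (5) x 2;
$SOUNDEX_CODE{R}                   = 6;

sub soundex_code {
    my ($word) = @_;
    my $w = uc $word;
    $w =~ s/[^A-Z]//g;                    # this sketch handles ASCII letters only
    return '' if $w eq '';

    my @letters = split //, $w;
    my $first   = shift @letters;
    my $out     = $first;                 # the first letter is kept as-is
    my $prev    = $SOUNDEX_CODE{$first} // 0;

    for my $ch (@letters) {
        next if $ch eq 'H' || $ch eq 'W'; # H and W do not separate equal codes
        my $c = $SOUNDEX_CODE{$ch} // 0;  # vowels reset the previous code
        $out .= $c if $c && $c != $prev;  # skip vowels and repeated codes
        $prev = $c;
        last if length($out) == 4;
    }
    return substr($out . '000', 0, 4);    # pad short keys with zeros
}

print soundex_code('Robert'), "\n";       # prints R163
print soundex_code('Rupert'), "\n";       # prints R163
```

Two words can then be treated as phonetically similar when their keys are equal. This is a much coarser test than an edit distance, so it would more likely serve as one signal among several than as the whole score.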


Source: https://stackoverflow.com/questions/11763875/perl-module-for-text-comparison
