What is the third parameter to Text::JaroWinkler::strcmp95 for?

Submitted by 随声附和 on 2019-12-12 02:25:45

Question


I am interested in the Jaro-Winkler module written in Perl to compute the distance (or similarity) between two strings:

http://search.cpan.org/~scw/Text-JaroWinkler-0.1/JaroWinkler.pm

The syntax of the function is not clear to me; I could not find any clear documentation of it.

Here is the sample code:

#!/usr/bin/perl

use 5.10.0;
use Text::JaroWinkler qw( strcmp95 );
print strcmp95("it is a dog","i am a dog.",11);

What exactly does the 11 represent? I gather it is a length, but which length? Is it the number of characters I want compared? Is the argument required?


Answer 1:


See the source for an answer to your question. It contains this line:

$ying = sprintf("%*.*s", -$y_length, $y_length, $ying);

So $y_length is used to reformat the strings: each one is right-padded with spaces if it is shorter than $y_length and truncated if it is longer, so that they end up with an identical length. These equal-length strings are then fed into the actual comparison function. This suggests that Alex is correct and that passing a length of max(length $ying, length $yang) will give the best results under most circumstances.
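
To see what that sprintf format does in isolation, here is a minimal sketch. The helper name pad_or_trim is hypothetical and is not part of Text::JaroWinkler; it only wraps the format string quoted above, and List::Util's max is used to pick the length as suggested.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not part of Text::JaroWinkler): applies the same
# sprintf format as the line quoted from the module source.
sub pad_or_trim {
    my ($str, $len) = @_;
    # A negative width means left-justify (pad on the right with spaces);
    # the precision caps the number of characters kept.
    return sprintf("%*.*s", -$len, $len, $str);
}

my ($ying, $yang) = ("it is a dog", "i am a dog.");
my $len = max(length $ying, length $yang);

print "len = $len\n";
printf "[%s]\n", pad_or_trim($ying, $len);           # unchanged, already $len long
printf "[%s]\n", pad_or_trim("dog", 5);              # padded on the right
printf "[%s]\n", pad_or_trim("it is a dog", 5);      # truncated to 5 characters
```

Both sample strings happen to be 11 characters long, which would explain the 11 in the question's example.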

Reading the source also reveals that if you fail to supply $y_length, no default is supplied. So you'll be comparing the empty string to the empty string. Those should have a pretty short JW distance.
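
The effect of a missing length can be reproduced with the quoted sprintf line alone. This is a sketch that assumes the module passes an undefined $y_length straight into that format, as described above; the variable names follow the quoted source line.

```perl
use strict;
use warnings;

my $ying = "it is a dog";
my $y_length;    # deliberately left undefined, as when the third argument is omitted

my $reformatted;
{
    # undef is treated as 0 here; silence the resulting warnings for the demo.
    no warnings 'uninitialized';
    $reformatted = sprintf("%*.*s", -$y_length, $y_length, $ying);
}

# A precision of 0 keeps no characters, so the result is the empty string.
printf "length after reformatting: %d\n", length $reformatted;
```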



Source: https://stackoverflow.com/questions/15015280/what-is-the-third-parameter-to-textjarowinklerstrcmp95-for
