Question
I am interested in the Jaro-Winkler module written in Perl to compute the distance (or similarity) between two strings:
http://search.cpan.org/~scw/Text-JaroWinkler-0.1/JaroWinkler.pm
The syntax of the function is not clear to me; I could not find any clear documentation of it.
Here is the sample code:
#!/usr/bin/perl
use 5.10.0;
use Text::JaroWinkler qw( strcmp95 );
print strcmp95("it is a dog","i am a dog.",11);
What exactly does the 11 represent? I gather it is a length, but which length? Is it the number of characters I want compared? Is the argument required?
Answer 1:
See the source for an answer to your question. It contains this line:
$ying = sprintf("%*.*s", -$y_length, $y_length, $ying);
So $y_length is being used to reformat the strings, padding them if necessary and trimming them to an identical length. These equal-length strings are then fed into the actual comparison function. This suggests that Alex is correct and giving a length of max(length $ying, length $yang) is going to give the best results under most circumstances.
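To see what that sprintf format does on its own, here is a minimal sketch that applies it outside the module. The helper name fit_to_length is made up for this illustration and is not part of Text::JaroWinkler; only the format string is taken from the source line quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: pad or trim a string to exactly $len characters,
# using the same sprintf format as the line quoted from the module source.
sub fit_to_length {
    my ($str, $len) = @_;
    # A negative width (-$len) left-justifies, so short strings are padded
    # on the right with spaces. The precision ($len) cuts long strings off.
    return sprintf("%*.*s", -$len, $len, $str);
}

# "it is a dog" is 11 characters long.
for my $len (5, 11, 14) {
    printf "%2d: [%s]\n", $len, fit_to_length("it is a dog", $len);
}
```

With a length of 5 the string is cut to "it is", with 11 it passes through unchanged, and with 14 it gets three trailing spaces. Both strings in the question happen to be 11 characters long, so the 11 in the sample call covers each of them in full.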
Reading the source also reveals that if you fail to supply $y_length, no default is supplied. Both strings are then trimmed to zero length, so you'll be comparing the empty string to the empty string. Those should have a pretty short JW distance.
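Since the length is effectively required, one way to avoid hard-coding it is to compute it from the two strings. This is only a sketch: common_length is a hypothetical helper, not something the module provides, and the strcmp95 call assumes the three-argument form shown in the question. The module is loaded lazily so the script still runs where it is not installed.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: the longer of the two string lengths, so that
# neither string is trimmed before the comparison.
sub common_length {
    my ($ying, $yang) = @_;
    return max(length $ying, length $yang);
}

my ($ying, $yang) = ("it is a dog", "i am a dog.");
my $len = common_length($ying, $yang);
print "length argument: $len\n";

# Call strcmp95 only if Text::JaroWinkler is available on this system.
if (eval { require Text::JaroWinkler; 1 }) {
    print Text::JaroWinkler::strcmp95($ying, $yang, $len), "\n";
} else {
    print "Text::JaroWinkler is not installed; skipping the comparison\n";
}
```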
Source: https://stackoverflow.com/questions/15015280/what-is-the-third-parameter-to-textjarowinklerstrcmp95-for