How can I find the similarity between sentences? [closed]

Submitted by 依然范特西╮ on 2021-02-11 14:54:40

Question


I'm trying to compute the similarity between two sentences in a shell script.

I have two sentences that share some words. For example, the input data in the file my_text.txt is:

Shell Script.
Linux Shell Script.
  • The intersection of both sentences: Shell + Script

  • The size of the union of both sentences: 3

The correct output for the similarity of the sentences:

 0.30000000000000000000

The similarity is defined as the number of words common to both sentences (the intersection) divided by the number of distinct words across both sentences (the size of the union).
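The definition above can be sketched as a small shell function. This is only an illustration under stated assumptions: the names word_set and jaccard are hypothetical, the two example sentences are made up for this sketch, words are taken to be runs of ASCII letters compared case-insensitively, and no stop words are filtered.

```shell
#!/bin/sh
# word_set (hypothetical helper): print the distinct lowercase words of one sentence.
word_set() {
  printf '%s\n' "$1" | tr 'A-Z' 'a-z' | grep -Eo '[a-z]+' | sort -u
}

# jaccard (hypothetical helper): size of the intersection divided by size of the union.
jaccard() {
  # Each sentence contributes a word at most once, so in the merged, sorted
  # list a word appears twice exactly when both sentences contain it.
  merged=$( { word_set "$1"; word_set "$2"; } | sort )
  common=$(printf '%s\n' "$merged" | uniq -d | grep -c .)
  total=$(printf '%s\n' "$merged" | uniq | grep -c .)
  # Guard against two empty sentences to avoid dividing by zero.
  if [ "$total" -eq 0 ]; then echo 0; return; fi
  awk -v c="$common" -v t="$total" 'BEGIN { printf "%.4f\n", c / t }'
}

# Made-up sentences: 2 shared words (the, brown) out of 6 distinct words.
jaccard "the quick brown fox" "the lazy brown dog"   # prints 0.3333
```

The variables merged, common and total are global here because plain sh has no local keyword; in bash they could be declared with local.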

The problem: I have searched extensively for a shell script that does this, but I have not found a solution.


Answer 1:


The following script should do the trick. It also ignores duplicated words per sentence, filler words, and non-alphabetical characters as described by you in the comment section.

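# Requires bash (process substitution and here-strings).
# Lowercase the input, emit one "line:word" pair per word, drop stop words,
# de-duplicate words within each sentence, then strip the line numbers.
words=$(
  < my_text.txt tr 'A-Z' 'a-z' |
  grep -Eon '\b[a-z]*\b' |
  grep -Fwvf <(printf %s\\n is a to be by the and for) |
  sort -u | cut -d: -f2 | sort
)
# Every distinct word counts toward the union.
union=$(uniq <<< "$words" | wc -l)
# A word listed twice occurs in both sentences.
intersection=$(uniq -d <<< "$words" | wc -l)
echo "similarity is $(bc -l <<< "$intersection/$union")"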

The output for your example input is .30000000000000000000 (= 0.3).
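The per-sentence de-duplication in the script above depends on grep -n tagging every word with its line number before sort -u runs. A minimal sketch of just that stage follows, assuming one sentence per line. The function name tag_words and the input with a repeated word are hypothetical, the stop-word filter is left out, and the simpler pattern [a-z]+ is swapped in for the script's \b[a-z]*\b.

```shell
#!/bin/sh
# tag_words (hypothetical name): read sentences on stdin, one per line, and
# print each distinct word once per sentence as "line:word".
# grep -n prefixes every match with its line number, so a word repeated
# within one sentence yields identical lines that sort -u collapses, while
# the same word in another sentence keeps a different prefix and survives.
tag_words() {
  tr 'A-Z' 'a-z' |
  grep -Eon '[a-z]+' |
  LC_ALL=C sort -u
}

# Hypothetical input: "shell" occurs twice in the first sentence.
printf '%s\n' 'Shell shell Script.' 'Linux Shell Script.' | tag_words
# prints:
# 1:script
# 1:shell
# 2:linux
# 2:script
# 2:shell
```

Once cut -d: -f2 strips the prefixes and the list is sorted again, a word shows up once per sentence that contains it, which is why uniq -d picks out exactly the words shared by both sentences.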




Answer 2:


Is this what you're trying to do (using GNU awk for FPAT and arrays of arrays)?

$ cat tst.awk
BEGIN {
    split("is a to be by the and for",tmp)
    for (i in tmp) {
        stopwords[tmp[i]]
    }
    FPAT="[[:alnum:]_]+"
}
{
    for (i=1; i<=NF; i++) {
        word = tolower($i)
        if ( !(word in stopwords) ) {
            words[NR][word]
        }
    }
}
END {
    for (word in words[1]) {
        if (word in words[2]) {
            numCommon++
        }
    }
    totWords = length(words[1]) + length(words[2]) - numCommon
    print (totWords ? numCommon / totWords : 0)
}

$ awk -f tst.awk file
0.666667


Source: https://stackoverflow.com/questions/65365496/how-can-find-similarity-between-sentences
