Question
Consider the examples below:
Example 1:
str1 = "wow...it looks amazing" str2 = "looks amazi"
Here amazi is close to amazing, so str2 is mistyped. I want to write a program that tells me amazi is close to amazing and then replaces amazi with amazing in str2.
Example 2:
str1 = "is looking good" str2 = "looks goo"
In this case the updated str2 will be "looking good".
Example 3:
str1 = "you are really looking good" str2 = "lok goo"
In this case str2 will be "good", as lok is not close to looking (or, if the program can convert lok to looking in this case, that is just fine for my problem too).
Example 4:
str1 = "Stu is actually SEVERLY sunburnt....it hurts!!!" str2 = "hurts!!"
The updated str2 will be "hurts!!!".
Example 5:
str1 = "you guys were absolutely amazing tonight, a..." str2 = "ly amazin"
The updated str2 will be "amazing"; "ly" should be removed or replaced by absolutely.
What would be the algorithm and code for this?
Maybe we can do it by comparing the characters in order and setting a threshold such as 0.8 (80%): if word2 from str2 matches 80% of the sequential characters of word1 from str1, then we replace word2 in str2 with that word of str1?
Is there any other efficient solution, with Python code please?
Answer 1:
There are a lot of ways to approach this. This one solves all of your examples. I added a minimum similarity filter so that only the higher-quality matches are returned. This is what allows the 'ly' to be dropped in the last sample, as it is not all that close to any of the words.
Documentation
You can install levenshtein with pip install python-Levenshtein
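import Levenshtein

def find_match(str1, str2):
    min_similarity = .75
    output = []
    words1 = str1.split()
    # One row per word of str2: its similarity to every word of str1.
    results = [[Levenshtein.jaro_winkler(x, y) for x in words1] for y in str2.split()]
    for x in results:
        # Keep the best-scoring word of str1 only if it is similar enough.
        if max(x) >= min_similarity:
            output.append(words1[x.index(max(x))])
    return output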
Results for the samples you proposed:
find_match("is looking good", "looks goo")
['looking','good']
find_match("you are really looking good", "lok goo")
['looking','good']
find_match("Stu is actually SEVERLY sunburnt....it hurts!!!", "hurts!!")
['hurts!!!']
find_match("you guys were absolutely amazing tonight, a...", "ly amazin")
['amazing']
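If installing python-Levenshtein is not an option, roughly the same filtering idea can be sketched with the standard library's difflib. This is only an illustration, not the code from the answer above: the function name find_close_words is hypothetical, and the cutoff of 0.6 is difflib's own default rather than the .75 used with Jaro-Winkler, since the two scores are not on the same scale and the threshold would need tuning on real data.

```python
import difflib

def find_close_words(str1, str2, cutoff=0.6):
    """For each word of str2, return the closest word of str1, if any.

    Hypothetical helper; `cutoff` is an assumed threshold to be tuned.
    """
    candidates = str1.split()
    output = []
    for word in str2.split():
        # get_close_matches ranks candidates by SequenceMatcher.ratio()
        # and discards those scoring below `cutoff`.
        matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        if matches:
            output.append(matches[0])
    return output

print(find_close_words("you guys were absolutely amazing tonight, a...", "ly amazin"))
```

Words of str2 with no sufficiently close counterpart in str1 are simply dropped, which mirrors the minimum similarity filter described above.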
Answer 2:
Like this:
str1 = "wow...it looks amazing"
str2 = "looks amazi"
str3 = []
# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():
        if m in n:
            str3.append(n)
# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))
elif len(str3) == 1:
    print(str3[0])
Output:
looks amazing
UPDATE with condition given by the OP:
str1 = "good..."
str2 = "god.."
str3 = []
# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():
        # Collecting the characters of m that also occur in n:
        c = ''
        for i in m:
            if i in n:
                c += i
        # If the number of matching characters is at least 50% of the length of the str1 word
        # or the str2 word is contained in the str1 word:
        if len(c) >= len(n)*0.50 or m in n:
            str3.append(n)
# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))
elif len(str3) == 1:
    print(str3[0])
Answer 3:
I solved it with regular expressions:
import re

def check_regex(str1, str2):
    # New list to store the updated value
    str_new = []
    for i in str2:
        # Regular expressions for comparing the strings.
        # Note: i is used as a raw pattern, so any regex metacharacters in it are interpreted as such.
        x = ['['+i+']', '^'+i, i+'$', '('+i+')']
        for k in x:
            h = 0
            for j in str1:
                found = "".join(re.findall(k, j))
                # Conditions to make sure the word is close enough to the particular word
                if found == i or (found in i and abs(len(found)-len(i)) == 1 and len(i) != 2):
                    str_new.append(j)
                    h = 1
                    break
            if h == 1:
                break
    return str_new

str1 = input().split()
str2 = input().split()
print(" ".join(check_regex(str1, str2)))
Answer 4:
You can use the Jaccard coefficient in this case. First, split the first and second strings on spaces. Then, for every word in str2, compute the Jaccard coefficient with every word in str1, and replace it with the word that gives the highest coefficient.
You can use sklearn.metrics.jaccard_score.
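The steps described in this answer can be sketched in plain Python. Note the assumptions: sklearn.metrics.jaccard_score works on arrays of labels rather than on raw strings, so this sketch computes the coefficient directly on the sets of characters of each word instead; the names jaccard and replace_with_best and the min_score parameter are hypothetical and not part of the answer.

```python
def jaccard(a, b):
    """Jaccard coefficient of the character sets of two words."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)

def replace_with_best(str1, str2, min_score=0.0):
    """Replace each word of str2 with its highest-scoring word of str1.

    `min_score` is an assumed, optional filter: words whose best score
    falls below it are dropped instead of replaced.
    """
    candidates = str1.split()
    output = []
    if not candidates:
        return output
    for word in str2.split():
        best = max(candidates, key=lambda c: jaccard(word, c))
        if jaccard(word, best) >= min_score:
            output.append(best)
    return output

print(replace_with_best("is looking good", "looks goo"))
```

Because character sets ignore both order and repetition, unrelated words that happen to share letters can score surprisingly high, so a threshold such as min_score=0.5 may be needed to drop fragments like "ly".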
Source: https://stackoverflow.com/questions/62106645/what-is-efficient-way-to-check-if-current-word-is-close-to-a-word-in-string