Matching incorrectly spelt words with correct ones in Python

佛祖请我去吃肉 2021-01-01 05:36

I'm building an app that gets incoming SMSs, then based on a keyword, it looks to see if that keyword is associated with any campaigns that it is running. The way I'm doin

5 answers
  •  时光说笑
    2021-01-01 06:32

    You could use fuzzy matching and a named list with the regex library, e.g., to find any phrase from a list with at most one error (insertion, deletion, or substitution):

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import regex as re  # third-party module: pip install regex
    
    words = ["first word", "second word", "third"]
    sms = u"junk Furst Word second Third"
    
    # (?fie): full case-folding, ignore case, enhance match; {e<=1}: at most one error
    for m in re.finditer(r"(?fie)\L<words>{e<=1}", sms, words=words):
        print(m[0])  # the matched text
        print(m.span())  # start and end indexes of the match in the sms
        # to find out which of the words matched:
        print(next(w for w in words
                   if re.match(r"(?fi)(?:%s){e<=1}" % re.escape(w), m[0])))
    

    Output

    Furst Word
    (5, 14)
    first word
    Third
    (22, 27)
    third
    

    Or you could iterate over the words directly:

    for w in words:
        # build one fuzzy pattern per word; re.escape protects special characters
        for m in re.finditer(r"(?fie)(?:%s){e<=1}" % re.escape(w), sms):
            print(m[0])
            print(m.span())
            print(w)
    

    It produces the same output as the first example.
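
    To make the "at most one error" rule concrete, here is a minimal standard-library sketch of the same idea. It is not how the regex module implements `{e<=1}`; the helper name `within_one_edit` and the `candidates` list are hypothetical, introduced only for illustration. Unlike the regex examples, it compares whole strings and does not search inside a longer message.

```python
def within_one_edit(a, b):
    """Return True if a becomes b with at most one insertion,
    deletion, or substitution, ignoring case."""
    a, b = a.casefold(), b.casefold()
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    # skip the common prefix
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # zero edits, or one substitution at position i
        return a[i + 1:] == b[i + 1:]
    # one extra character in b at position i
    return a[i:] == b[i + 1:]


keywords = ["first word", "second word", "third"]
candidates = ["Furst Word", "second", "Third", "junk"]  # hypothetical inputs

# map each candidate to the keywords it matches within one edit
matches = {c: [k for k in keywords if within_one_edit(c, k)]
           for c in candidates}
print(matches)
```

    Here "Furst Word" and "Third" each map to one keyword, while "second" and "junk" map to none, because "second" is more than one edit away from "second word".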
