How do I find duplicate addresses in a database, or better stop people already when filling in the form ? I guess the earlier the better?
Is there any good way of abstra
Machine learning and AI has algorithms to find string similarities and duplicate measures.
Record linkage or the task of matching equivalent records that differ syntactically—was first explored in the late 1950s and 1960s.
You can represent every pair of records using a vector of features that describe the similarity between individual record fields.
For example, Adaptive Duplicate Detection Using Learnable String Similarity Measures. for example, read this doc
You can use generic or manually tuned distance metrics for estimating the similarity of potential duplicates.
You can use adaptive name matching algorithms, like Jaro metric, which is based on the number and order of common characters between two strings.
Token-based and hybrid distance. In such cases, we can convert the strings s and t to token multisets (where each token is a word) and consider similarity metrics on these multisets.