Identifying near-duplicate entries using synonyms in R
I am trying to identify near-duplicate entries of names in a database. I am new to databases, but I am familiar with R. I can already get clusters of near duplicates in R using fuzzy matching and Soundex. However, several of the names are synonyms of each other, and I would like to cluster names by synonymy as well as by the criteria above. I want to do what is suggested in "Techniques for finding near duplicate records", but with synonyms. I understand there is a sort of database of synonyms
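One way to combine the three criteria is to map every name to a canonical synonym first, then run the existing phonetic and fuzzy tests on the canonical forms. Any pair that passes at least one test is merged with union-find, so a cluster can form through a chain of matches (A matches B, B matches C). The sketch below is in Python only because it is language-agnostic. Several parts are assumptions and not from the question: the `SYNONYMS` table is a hypothetical stand-in for whatever synonym database you end up using, `difflib.SequenceMatcher` stands in for the fuzzy matcher you use in R (for example `agrep`), and the 0.85 threshold is arbitrary.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical synonym table: maps each variant to a canonical form.
# In practice this would be loaded from a synonym/nickname database.
SYNONYMS = {"bill": "william", "will": "william", "bob": "robert", "rob": "robert"}

# American Soundex digit for each consonant; vowels, h, w, y have no code.
_SOUNDEX_CODES = {ch: digit
                  for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                         ("l", "4"), ("mn", "5"), ("r", "6"))
                  for ch in letters}

def canonical(name):
    """Lower-case a name and replace it with its canonical synonym, if any."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)

def soundex(name):
    """American Soundex code: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = digit
    return (result + "000")[:4]

def cluster_names(names, threshold=0.85):
    """Group names that are synonyms, sound alike, or are textually similar."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    canon = [canonical(n) for n in names]
    codes = [soundex(c) for c in canon]
    # Compare every pair: O(n^2), fine for a sketch.
    for i, j in combinations(range(len(names)), 2):
        same_synonym = canon[i] == canon[j]
        same_sound = codes[i] == codes[j]
        similar = SequenceMatcher(None, canon[i], canon[j]).ratio() >= threshold
        if same_synonym or same_sound or similar:
            parent[find(i)] = find(j)

    # Collect clusters in order of first appearance.
    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())

if __name__ == "__main__":
    print(cluster_names(["William", "Bill", "Wiliam", "Robert", "Bob", "Alice", "Alicia"]))
```

With this toy table, "Bill" joins "William" only through the synonym step, while "Wiliam" joins through Soundex. Because every pair is compared, a large table would usually be split into blocks first (for example, by Soundex code of the canonical name) so that only names within a block are compared.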