stringdist

R fuzzy string match to return specific column based on matched string

我怕爱的太早我们不能终老 posted on 2019-11-26 21:58:09
Question: I have two large datasets, one with around half a million records and the other with around 70K. Both datasets contain addresses. I want to check whether any of the addresses in the smaller dataset are present in the larger one. As you would imagine, an address can be written in different ways, with different cases, spellings, etc. On top of that, an address can be duplicated if it is written only down to the building level, so different flats share the same address. I did some research and came across the package stringdist
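For illustration only, here is a minimal sketch of the kind of matching described above, using `amatch()` from stringdist. The toy data frames, the column names (`id`, `address`, `matched_id`), the `normalize()` helper, the Jaro-Winkler method, and the `maxDist = 0.15` cutoff are all assumptions made for this example. They are not taken from the question and would need tuning on real data.

```r
library(stringdist)

# Hypothetical toy data standing in for the two address datasets
large <- data.frame(
  id = 1:3,
  address = c("12 Baker Street, London", "45 High St Flat 2", "7 Park Lane"),
  stringsAsFactors = FALSE
)
small <- data.frame(
  address = c("12 baker street london", "7 Park Ln", "99 Unknown Road"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case, replace punctuation with spaces,
# collapse repeated whitespace, and trim the ends
normalize <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

# For each small address, amatch() returns the index of the closest
# large address within maxDist, or NA if nothing is close enough
idx <- amatch(
  normalize(small$address),
  normalize(large$address),
  method = "jw",   # Jaro-Winkler distance (an assumed choice)
  maxDist = 0.15   # assumed cutoff
)

# Pull a specific column from the matched row of the larger dataset
small$matched_id <- large$id[idx]
print(small)
```

Note that `amatch()` returns only a single best match per input. When several flats share one building-level address, it will not list all of them, so a join on the normalized address may be needed as a follow-up step.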