A good SQL strategy for fuzzy matching possible duplicates using SQL Server 2005

走远了吗. Submitted on 2019-12-06 12:24:10

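It will of course depend on your exact requirements, but the CONTAINS predicate (which requires a full-text index on the columns being searched) gives you the ability to carry out proximity searches with NEAR, as well as looser matches such as prefix terms and inflectional or thesaurus forms via FORMSOF.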

http://www.developer.com/db/article.php/3446891/Understanding-SQL-Server-Full-Text-Indexing.htm

http://msdn.microsoft.com/en-us/library/ms187787(SQL.90).aspx
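
As a rough illustration of the CONTAINS approach described above, the following T-SQL sketch shows the kinds of queries it allows. The table dbo.Customers, its CompanyName column, the key index PK_Customers, and the catalog name ftDedupe are all hypothetical names chosen for the example, and it assumes full-text search is installed and enabled for the database. Treat it as a starting point for finding candidate duplicates, not a complete de-duplication routine.

```sql
-- Hypothetical setup: a catalog and a full-text index on the column to search.
-- PK_Customers is assumed to be a unique, single-column, non-nullable index.
CREATE FULLTEXT CATALOG ftDedupe;
CREATE FULLTEXT INDEX ON dbo.Customers (CompanyName)
    KEY INDEX PK_Customers ON ftDedupe;

-- Proximity search: rows where the two words appear close to each other.
SELECT CustomerID, CompanyName
FROM dbo.Customers
WHERE CONTAINS(CompanyName, 'acme NEAR trading');

-- Prefix search: any word starting with "interna" (international, internal, ...).
SELECT CustomerID, CompanyName
FROM dbo.Customers
WHERE CONTAINS(CompanyName, '"interna*"');

-- Inflectional forms: supply, supplies, supplied, supplying.
SELECT CustomerID, CompanyName
FROM dbo.Customers
WHERE CONTAINS(CompanyName, 'FORMSOF(INFLECTIONAL, supply)');
```

Note that these predicates match on whole words and their linguistic variants; they will not, on their own, catch misspellings such as transposed letters.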

I would recommend using an SSIS task to periodically clean up the data. SSIS has fuzzy matching transformations (Fuzzy Lookup and Fuzzy Grouping), and there are third-party providers that offer more powerful components.

If the budget permits and the size of the operation justifies it, you can even consider an MDS server: SQL Server 2008 R2 Master Data Services.

Also a new SSIS Data Quality Toolkit is available at http://www.melissadata.com/dqt/total-data-quality-integration.htm
