Best machine learning technique for matching product strings


说谎 2020-12-23 12:50

Here's a puzzle...

I have two databases of the same 50000+ electronic products and I want to match products in one database to those in the other. However, the prod

3 Answers

礼貌的吻别 (OP)

2020-12-23 13:52

I don't know that much about machine learning, but I do know Levenshtein distance is not the best approach for this type of problem.

I am currently working on a very similar problem and have found much more accurate matches using Longest Consecutive Subsequence (https://www.geeksforgeeks.org/longest-consecutive-subsequence).

You may also find Longest Common Substring helpful (https://www.geeksforgeeks.org/longest-common-substring-dp-29/).

... Or maybe even a combination of both!
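As a rough illustration of what "a combination of both" could look like, here is a minimal Python sketch. It assumes the second measure is the longest common subsequence between the two strings (that reading is an assumption), and the names `combined_score` and `best_match`, the equal 0.5 weighting, and the normalization by the shorter string are all hypothetical choices, not something taken from the linked articles.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters found in both strings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the run that ended at the previous character pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def longest_common_subsequence(a: str, b: str) -> int:
    """Length of the longest sequence of characters appearing in order (gaps allowed) in both."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def combined_score(a: str, b: str, weight: float = 0.5) -> float:
    """Hypothetical blend of both measures, each normalized by the shorter string."""
    if not a or not b:
        return 0.0
    shorter = min(len(a), len(b))
    substring_ratio = longest_common_substring(a, b) / shorter
    subsequence_ratio = longest_common_subsequence(a, b) / shorter
    return weight * substring_ratio + (1 - weight) * subsequence_ratio


def best_match(query: str, candidates: list[str]) -> str:
    """Return the candidate with the highest combined score for the query."""
    return max(candidates, key=lambda c: combined_score(query, c))


print(best_match("Hello", ["Hello AAAAAA", "BBBBB"]))  # Hello AAAAAA
```

Comparing every product against every other product this way is quadratic, so for 50000+ rows you would probably want to narrow the candidates first (for example by brand or by shared tokens) before scoring.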

Levenshtein is not great because it allows substitutions, so a string that merely has extra characters can score worse than a completely unrelated string of similar length. For example, take "Hello AAAAAA", "Hello", and "BBBBB".

"Hello" and "BBBBB" are closer by Levenshtein distance (5 substitutions versus 7 insertions), even though you would probably want "Hello" to match "Hello AAAAAA".

LCS and LSS (the two methods linked above) do not allow substitutions, so with either of them "Hello" would match "Hello AAAAAA" rather than "BBBBB".
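To check the Levenshtein side of this example yourself, here is a small sketch of the standard dynamic-programming edit distance in plain Python. The function name `levenshtein` is just an illustrative choice; a library implementation would work equally well.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Hello", "BBBBB"))         # 5
print(levenshtein("Hello", "Hello AAAAAA"))  # 7
```

The unrelated string wins (5 < 7) purely because it has the same length, which is exactly the failure mode to watch for when product names in one database carry extra descriptive text.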
