What is the best algorithm for matching two strings containing fewer than 10 words in Latin script

刺人心 2021-02-04 09:35

I'm comparing song titles, using Latin script (although not always). My aim is an algorithm that gives a high score if the two song titles seem to be the same title and a

5 Answers
  •  别跟我提以往
    2021-02-04 10:08

    Each algorithm focuses on a similar but slightly different aspect of the two strings, so the right choice depends entirely on what you are trying to accomplish. You say that the algorithm needs to understand words, but should it also understand interactions between those words? If not, you can split each string on spaces and compare each word in the first string to each word in the second. Every word they share should increase the commonality score of the two strings.
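    As a rough illustration of that word-by-word comparison, here is a minimal Python sketch. The function names `tokenize` and `word_overlap` are hypothetical, and two choices are my own assumptions rather than anything stated above: the words are lowercased before comparing, and the score is normalized by the number of distinct words across both titles (a Jaccard-style ratio) so that it falls between 0 and 1.

```python
def tokenize(title):
    """Lowercase a title and split it on whitespace into a set of words.

    Lowercasing is an assumption; drop it if case should matter.
    """
    return set(title.lower().split())


def word_overlap(a, b):
    """Score two titles by the share of distinct words they have in common.

    Returns a float in [0, 1]: shared words divided by all distinct words.
    This normalization is an assumption; a raw shared-word count would
    also satisfy "more shared words means a higher score".
    """
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a or not words_b:
        # An empty title has nothing to compare, so treat it as no match.
        return 0.0
    shared = words_a & words_b
    return len(shared) / len(words_a | words_b)


if __name__ == "__main__":
    print(word_overlap("Let It Be", "let it be"))
    print(word_overlap("Let It Be", "Let It Bleed"))
```

    This ignores word order and misspellings entirely, so "Let It Be" and "Be It Let" score as identical, while "Yesterday" and "Yesturday" score as unrelated. Whether that is acceptable depends on the data.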

    In this way, you could create your own algorithm that focuses only on what you are concerned with. If you want to test an algorithm that someone else made, you can find examples online and run your data through each one to see how accurate its estimated commonality is.

    I think http://jtmt.sourceforge.net/ would be a good place to start.
