Is there an edit distance algorithm that takes “chunk transposition” into account?

后端 未结 6 2016
被撕碎了的回忆
被撕碎了的回忆 2021-02-04 14:54

I put \"chunk transposition\" in quotes because I don\'t know whether or what the technical term should be. Just knowing if there is a technical term for the process would be ve

6条回答
  •  太阳男子
    2021-02-04 15:13

    In the case of your application you should probably think about adapting some algorithms from bioinformatics.

    For example you could firstly unify your strings by making sure, that all separators are spaces or anything else you like, such that you would compare "Alan Turing" with "Turing Alan". And then split one of the strings and do an exact string matching algorithm ( like the Horspool-Algorithm ) with the pieces against the other string, counting the number of matching substrings.

    If you would like to find matches that are merely similar but not equal, something along the lines of a local alignment might be more suitable since it provides a score that describes the similarity, but the referenced Smith-Waterman-Algorithm is probably a bit overkill for your application and not even the best local alignment algorithm available.

    Depending on your programming environment there is a possibility that an implementation is already available. I personally have worked with SeqAn lately, which is a bioinformatics library for C++ and definitely provides the desired functionality.

    Well, that was a rather abstract answer, but I hope it points you in the right direction, but sadly it doesn't provide you with a simple formula to solve your problem.

提交回复
热议问题