I've had some success comparing strings using the PHP levenshtein function.
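However, for two strings which contain substrings that have swapped positions, the algorithm counts each moved substring as a series of separate character edits, so the two strings come out as very different.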
Use N-grams, which support multiple-character transpositions across the whole text.
The general idea is that you split the two strings in question into all possible 2- or 3-character substrings (n-grams) and treat the number of n-grams the two strings share as their similarity metric. This can then be normalized by dividing the shared number by the total number of n-grams in the longer string. This is trivial to calculate, but fairly powerful.
For the example sentences:
A. The quick brown fox
B. brown quick The fox
C. The quiet swine flu
A and B share 18 2-grams, while A and C share only 8, out of 20 possible.
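Below is a minimal Python sketch of the n-gram similarity described above. It assumes that each sentence is padded with a distinct start marker and end marker (`^` and `$` here, an illustrative choice that must not occur in the text), which gives 20 2-grams for each 19-character sentence. It also assumes that "shared" means a multiset intersection. Under those assumptions, the sketch reproduces the 18, 8 and 20 figures above. The function names are hypothetical.

```python
from collections import Counter

def ngrams(text, n=2, start="^", end="$"):
    # Character n-grams of `text`, padded with n-1 start and end markers.
    # The markers are an assumption and must not appear in the text.
    padded = start * (n - 1) + text + end * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def shared_ngrams(a, b, n=2):
    # Multiset intersection: a repeated n-gram only counts as often
    # as it occurs in both strings.
    return sum((ngrams(a, n) & ngrams(b, n)).values())

def ngram_similarity(a, b, n=2):
    # Normalize by the number of n-grams in the longer string.
    total = max(sum(ngrams(a, n).values()), sum(ngrams(b, n).values()))
    return shared_ngrams(a, b, n) / total if total else 1.0

a = "The quick brown fox"
b = "brown quick The fox"
c = "The quiet swine flu"
print(shared_ngrams(a, b), ngram_similarity(a, b))  # 18 0.9
print(shared_ngrams(a, c), ngram_similarity(a, c))  # 8 0.4
```

Swapping "The" and "brown" only breaks the 2-grams at the edges of the moved words. That is why A and B stay close, while C, which shares only its opening characters with A, scores low.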
This has been discussed in more detail in the Gravano et al. paper.
A less trivial alternative, but one grounded in information theory, would be to use term frequency–inverse document frequency (tf-idf) to weight the tokens, construct sentence vectors, and then use cosine similarity as the similarity metric.
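The algorithm is:
1. Split each sentence into tokens (for example, the 2-grams described above).
2. Compute each token's term frequency (tf) in each sentence.
3. Compute each token's inverse document frequency (idf): the logarithm of the number of sentences divided by the number of sentences that contain the token. A token that appears in every sentence gets an idf of 0 and carries no information.
4. Multiply tf by idf to get a weight vector for each sentence.
5. Compute the cosine similarity of each pair of vectors: their dot product divided by the product of their lengths, ranging from 0 (nothing in common) to 1 (same direction).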
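Here is a minimal Python sketch of tf-idf weighting followed by cosine similarity. It makes several assumptions: the tokens are the same padded character 2-grams (`^`/`$` markers), tf is the raw count, idf is log(N / df), and the three example sentences serve as the whole corpus. The function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(text, n=2, start="^", end="$"):
    # Padded character n-grams; the markers are an illustrative assumption.
    padded = start * (n - 1) + text + end * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def tfidf_vectors(sentences, n=2):
    # One {token: tf * idf} dict per sentence.
    tfs = [ngrams(s, n) for s in sentences]
    # Document frequency: the number of sentences each token occurs in.
    df = Counter(token for tf in tfs for token in tf)
    idf = {token: math.log(len(sentences) / count) for token, count in df.items()}
    return [{t: c * idf[t] for t, c in tf.items()} for tf in tfs]

def cosine(u, v):
    # Dot product divided by the product of the vector lengths.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

sentences = ["The quick brown fox", "brown quick The fox", "The quiet swine flu"]
a, b, c = tfidf_vectors(sentences)
print(cosine(a, b) > cosine(a, c))  # True
```

With only three sentences, every 2-gram that all of them share (such as `Th` or ` q`) gets an idf of 0. The remaining weight then comes from tokens that A and B share but C lacks, so A and B come out far more similar than A and C.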
Regarding the other answers: the Damerau–Levenshtein modification supports only the transposition of two adjacent characters, and Metaphone was designed to match words that sound alike, not to measure general string similarity.
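To illustrate the adjacent-transposition limit, here is a minimal Python sketch of the restricted Damerau–Levenshtein variant, often called optimal string alignment distance. The function name is hypothetical. An adjacent swap such as `ab` to `ba` costs 1. Swapping the two-character blocks of `abcd` to get `cdab` still costs 4, which is the same as plain Levenshtein.

```python
def osa_distance(a, b):
    # Restricted Damerau-Levenshtein (optimal string alignment) distance.
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Only a swap of two *adjacent* characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

print(osa_distance("ab", "ba"))      # 1
print(osa_distance("abcd", "cdab"))  # 4
```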