PHP: word proximity script?

臣服心动 2021-01-27 05:21

Okay - so, I've spent ages searching in Google, and even went through a few specific searches at hotscripts etc., several PHP forums and this place ... nothing (not of use anyw

3 Answers
  • 2021-01-27 05:57

    Your example searches for Word1 ... Word2. Should Word2 ... Word1 also be matched? A simple solution is to use a regular expression:

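    For example:

    1. Match with the regex \bWord1\b(.*)\bWord2\b (use the lazy form .*? if you want the nearest Word2 after Word1 rather than the last one).
    2. Take the first capture group, split it on spaces (or whatever boundary you use) into an array, and count the elements.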
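
    The two steps above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the function name wordsBetween and the sample sentence are made up for the example, and it uses the lazy .*? form, so it measures from the first Word1 to the nearest Word2 after it (which is not always the closest pair in the text).

    ```php
    <?php
    // Hypothetical helper: count the words between the first $first and the
    // nearest $second that follows it. Returns null if the pair does not
    // occur in that order.
    function wordsBetween(string $text, string $first, string $second): ?int
    {
        $pattern = '/\b' . preg_quote($first, '/') . '\b(.*?)\b'
                 . preg_quote($second, '/') . '\b/is';
        if (!preg_match($pattern, $text, $m)) {
            return null;
        }
        // Split the captured gap on whitespace and drop empty pieces.
        $gap = preg_split('/\s+/', trim($m[1]), -1, PREG_SPLIT_NO_EMPTY);
        return count($gap);
    }

    // "brown" and "fox" sit between the two words.
    var_dump(wordsBetween('the quick brown fox jumps over the lazy dog', 'quick', 'jumps')); // int(2)
    ```

    To match Word2 ... Word1 as well, one option is to call the helper a second time with the arguments swapped.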

    This is the most straightforward method, but definitely not the best one (performance-wise, for instance). I think you need to clarify your needs if you want a more specific answer.
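
    If the regex turns out to be too slow or too rigid, one possible alternative is to tokenize the text once and track word positions. The sketch below is only an assumption about what "proximity" should mean here: it returns the smallest number of words separating the two terms, in either order. The name minWordGap is hypothetical, and the \W+ split assumes plain ASCII text.

    ```php
    <?php
    // Hypothetical helper: smallest number of words separating $a and $b,
    // in either order. Returns null if either word is missing.
    function minWordGap(string $text, string $a, string $b): ?int
    {
        $tokens = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        $a = strtolower($a);
        $b = strtolower($b);
        $lastA = null; // index of the most recent occurrence of $a
        $lastB = null; // index of the most recent occurrence of $b
        $best = null;
        foreach ($tokens as $i => $token) {
            if ($token === $a) { $lastA = $i; }
            if ($token === $b) { $lastB = $i; }
            if ($lastA !== null && $lastB !== null && $lastA !== $lastB) {
                $gap = abs($lastA - $lastB) - 1;
                if ($best === null || $gap < $best) { $best = $gap; }
            }
        }
        return $best;
    }

    // "saw the quick" separates "dog" and "fox", whichever is given first.
    var_dump(minWordGap('the lazy dog saw the quick fox', 'fox', 'dog')); // int(3)
    ```

    This makes a single pass over the tokens, so the cost grows linearly with the length of the text.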

    Update:

    After the two questions were merged, I see other answers mentioning Soundex, Levenshtein distance, Hamming distance and so on. I would suggest that theclueless1 CLARIFY the requirements so that people can give useful help. If this is an application related to searching or document clustering, I also suggest you take a look at mature full-text indexing and search solutions such as Sphinx or Lucene. I think either of them can be used with PHP.

  • 2021-01-27 05:58

    If you are talking about comparing specific words, you will want to look at the SOUNDEX function of MySQL (I will assume you may be using MySQL). When comparing two words, it gives you a key that describes how each one sounds:

    SELECT `word` FROM `list_of_words` WHERE SOUNDEX(`word`) = SOUNDEX('{TEST_WORD}');
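
    If the words are already in a PHP array rather than a table, the same idea can be tried with PHP's built-in soundex() function. The sketch below is illustrative only: soundsLike and the sample names are made up, and arrow functions need PHP 7.4 or later. Note that PHP's soundex() returns a four-character key while MySQL's SOUNDEX() can return a longer string, so keys from the two should not be compared with each other.

    ```php
    <?php
    // Hypothetical helper: keep only the candidates whose Soundex key
    // matches the key of the test word.
    function soundsLike(string $testWord, array $candidates): array
    {
        $key = soundex($testWord);
        return array_values(array_filter(
            $candidates,
            fn (string $w): bool => soundex($w) === $key
        ));
    }

    // "Robert", "Rupert" and "Roberta" all map to the key R163.
    print_r(soundsLike('Robert', ['Rupert', 'Rubin', 'Roberta', 'Albert']));
    ```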
    

    Then, once you have your list of words (most likely you will get quite a few), you can check the distance between each of them and your word to find the one that is CLOSEST (or the group of closest words, depending on how you write your code).

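    $word = '{WORD TO CHECK}';
    $threshold = 4; // the smaller the distance, the closer the word
    $similar_word = null; // stays null if no word is closer than the threshold
    // $word_results is assumed to hold the words returned by the query above
    foreach ($word_results as $comparison_word) {
       $distance = levenshtein($comparison_word, $word);
       if ($distance < $threshold) {
          $threshold = $distance;
          $similar_word = $comparison_word;
       }
    }
    echo $similar_word;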
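
    For experimenting without a database, the same loop can be wrapped in a function and fed a hard-coded list. This is just a sketch: closestWord, the $maxDistance parameter and the sample words are invented for the example and are not part of the original answer.

    ```php
    <?php
    // Hypothetical helper: return the candidate with the smallest Levenshtein
    // distance to $word, or null when none is within $maxDistance edits.
    function closestWord(string $word, array $candidates, int $maxDistance = 3): ?string
    {
        $best = null;
        $bestDistance = $maxDistance + 1;
        foreach ($candidates as $candidate) {
            $distance = levenshtein($candidate, $word);
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $best = $candidate;
            }
        }
        return $best;
    }

    // "proximity" is one insertion away from the misspelled input.
    echo closestWord('proximty', ['proximity', 'property', 'proxy']), PHP_EOL; // proximity
    ```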
    

    Hope that helps you find the direction you are looking for.

    Happy coding!

  • 2021-01-27 06:16

    I also thought of Hamming distance as Felix Kling commented. Maybe you can make some variant, where you encode your words into specific codewords and then check their distances through an array that holds your codewords.

    So if you have array[11, 02, 85, 37, 11], you can easily find that 11 has a maximum distance of 4 in this array.
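
    One way to read the array example is as a distance between positions: codeword 11 sits at index 0 and at index 4, so its occurrences are 4 apart. A rough sketch of that reading follows. The name maxGap is hypothetical, and the codewords are kept as strings so that 02 is not parsed as an octal number. Strictly speaking this is a positional distance, not the classic Hamming distance, which counts the positions at which two equal-length strings differ.

    ```php
    <?php
    // Hypothetical helper: largest index gap between two occurrences of
    // $code in $codes, or null if $code appears fewer than twice.
    function maxGap(array $codes, string $code): ?int
    {
        $positions = array_keys($codes, $code, true); // strict comparison
        if (count($positions) < 2) {
            return null;
        }
        return max($positions) - min($positions);
    }

    var_dump(maxGap(['11', '02', '85', '37', '11'], '11')); // int(4)
    ```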

    I don't know if this would work for you, but I think I would do it in a similar manner.
