Measure the pronounceability of a word?

闹比i 2020-12-05 04:33

I'm tinkering with a domain name finder and want to favour those words which are easy to pronounce.

Example: nameoic.com (bad) versus namelet.com (good).

Wa

3 Answers
  • 2020-12-05 04:55

    I think the problem could be boiled down to parsing the word into a candidate set of phonemes, then using a predetermined list of phoneme pairs to determine how pronounceable the word is.

    For example: "skill" phonetically is "/s/k/i/l/". "/s/k/", "/k/i/", "/i/l/" should all have high pronounceability scores, so the word should score highly.

    "skpit" phonetically is "/s/k/p/i/t/". "/k/p/" should have a low pronounceability score, so the word should score low.
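
    As a rough sketch of the pair-scoring step only: the snippet below assumes the word has already been split into phonemes (that parsing is the hard part and is not shown). The `pairScore` function and its score table are hypothetical, with made-up values chosen to mirror the two examples above.

    ```php
    <?php
    // Hypothetical helper: averages the scores of all adjacent phoneme pairs.
    // $pairScores maps "a/b" to a score in [0, 1]; unlisted pairs get $default.
    function pairScore(array $phonemes, array $pairScores, float $default = 0.5): float
    {
        $count = count($phonemes);
        if ($count < 2) {
            // A single phoneme has no pairs to penalise; an empty word scores 0.
            return $count === 1 ? 1.0 : 0.0;
        }

        $total = 0.0;
        for ($i = 0; $i < $count - 1; $i++) {
            $pair = $phonemes[$i] . '/' . $phonemes[$i + 1];
            $total += $pairScores[$pair] ?? $default;
        }

        // Divide by the number of pairs so long words are not favoured.
        return $total / ($count - 1);
    }

    // Made-up values for illustration; a real table would come from phoneme data.
    $pairScores = array(
        's/k' => 0.9, 'k/i' => 0.9, 'i/l' => 0.9,
        'k/p' => 0.1, 'p/i' => 0.8, 'i/t' => 0.9,
    );

    echo pairScore(array('s', 'k', 'i', 'l'), $pairScores) . "\n";      // skill
    echo pairScore(array('s', 'k', 'p', 'i', 't'), $pairScores) . "\n"; // skpit
    ```

    Averaging is just one choice here; taking the minimum pair score instead would punish a single awkward pair like "/k/p/" more heavily.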

  • 2020-12-05 04:57

    Here is a function which should work for most common words. It returns a score between 0 and 1, where 1 means perfect pronounceability according to its rules.

    The function is far from perfect (it doesn't quite like words such as Tsunami [0.857]), but it should be fairly easy to tweak for your needs.

    <?php
    // Score: 1
    echo pronounceability('namelet') . "\n";
    
    // Score: 0.71428571428571
    echo pronounceability('nameoic') . "\n";
    
    function pronounceability($word) {
        static $vowels = array('a', 'e', 'i', 'o', 'u', 'y');
    
        static $composites = array('mm', 'll', 'th', 'ing');
    
        if (!is_string($word)) return false;
    
        // Remove non letters and put in lowercase
        $word = preg_replace('/[^a-z]/i', '', $word);
        $word = strtolower($word);
    
        // Special case
        if ($word == 'a') return 1;
    
        $len = strlen($word);
    
        // Let's not parse an empty string
        if ($len == 0) return 0;
    
        $score = 0;
        $pos = 0;
    
        while ($pos < $len) {
            // Check if is allowed composites
            foreach ($composites as $comp) {
                $complen = strlen($comp);
    
                if (($pos + $complen) <= $len) { // <= so a composite can also end the word (e.g. "ing")
                    $check = substr($word, $pos, $complen);
    
                    if ($check == $comp) {
                        $score += $complen;
                        $pos += $complen;
                        continue 2;
                    }
                }
            }
    
            // Is it a vowel? If so, check if previous wasn't a vowel too.
            if (in_array($word[$pos], $vowels)) {
                if ($pos == 0 || !in_array($word[$pos - 1], $vowels)) { // a leading vowel has no previous vowel
                    $score += 1;
                    $pos += 1;
                    continue;
                }
            } else { // Not a vowel, check if next one is, or if is end of word
                if (($pos + 1) < $len && in_array($word[$pos + 1], $vowels)) {
                    $score += 2;
                    $pos += 2;
                    continue;
                } elseif (($pos + 1) == $len) {
                    $score += 1;
                    break;
                }
            }
    
            $pos += 1;
        }
    
        return $score / $len;
    }
    
  • 2020-12-05 05:00

    Use a Markov model (on letters, not words, of course). The probability of a word is a pretty good proxy for ease of pronunciation. You'll have to normalize for length, since longer words are inherently less probable.
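
    A minimal sketch of that idea, assuming a first-order (bigram) letter model with add-one smoothing. The function names and the ten-word training list are placeholders for illustration; a real model would be trained on a large dictionary. Length is normalized by averaging the log-probability over the number of transitions.

    ```php
    <?php
    // Count letter-to-letter transitions; '^' and '$' mark word start and end.
    function trainBigrams(array $words): array
    {
        $counts = array();
        foreach ($words as $word) {
            $chars = str_split('^' . strtolower($word) . '$');
            for ($i = 0; $i < count($chars) - 1; $i++) {
                $a = $chars[$i];
                $b = $chars[$i + 1];
                $counts[$a][$b] = ($counts[$a][$b] ?? 0) + 1;
            }
        }
        return $counts;
    }

    // Average log-probability per transition. Add-one smoothing over 27
    // possible next symbols (a-z plus '$') keeps unseen pairs above zero.
    function markovScore(string $word, array $counts): float
    {
        $chars = str_split('^' . strtolower($word) . '$');
        $steps = count($chars) - 1;
        $logProb = 0.0;
        for ($i = 0; $i < $steps; $i++) {
            $row = $counts[$chars[$i]] ?? array();
            $seen = $row[$chars[$i + 1]] ?? 0;
            $logProb += log(($seen + 1) / (array_sum($row) + 27));
        }
        // Dividing by the number of steps is the length normalization.
        return $logProb / $steps;
    }

    // Tiny hypothetical training list, far too small for real use.
    $model = trainBigrams(array(
        'name', 'let', 'tablet', 'planet', 'domain',
        'simple', 'letter', 'market', 'garden', 'camel',
    ));

    echo markovScore('namelet', $model) . "\n";
    echo markovScore('nameoic', $model) . "\n";
    ```

    Scores are negative and closer to 0 means more probable, so candidates can be ranked by this value directly.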
