Arranging letters in the most pronounceable way?

误落风尘 2021-02-05 13:22

I have a string of characters, and I'm looking for the arrangement of those characters that makes it as pronounceable as possible.

For example, if I have

2 Answers
  •  轻奢々 (OP)
    2021-02-05 14:03

    (For completeness, here's my original pure Python solution that inspired me to try machine learning.)

    I agree a reliable solution would require a sophisticated model of the English language, but maybe we can come up with a simple heuristic that's tolerably bad.

    I can think of two basic rules satisfied by most pronounceable words:

    1. contain a vowel sound
    2. no more than two consonant sounds in succession
    

    Writing c for a consonant sound and v for a vowel sound, this can be expressed as the regular expression c?c?(v+cc?)*v*
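A quick way to sanity-check the pattern is to run it on abstract skeletons, where each character already stands for one sound ('c' or 'v'). This is only an illustration of the regex itself; `skeleton_ok` is a hypothetical helper name. Note that, as written, the pattern also accepts vowel-less skeletons such as `cc` (and the empty string), so it enforces rule 2 but not, strictly, rule 1.

```python
import re

# 'c' stands for one consonant sound and 'v' for one vowel sound.
SKELETON = re.compile(r"c?c?(v+cc?)*v*")


def skeleton_ok(skeleton: str) -> bool:
    """True if the whole c/v skeleton matches c?c?(v+cc?)*v*."""
    return SKELETON.fullmatch(skeleton) is not None


for s in ["cvcv", "ccvcc", "cccv", "vv", "cc"]:
    print(s, skeleton_ok(s))
```

Here `cccv` is rejected because three consonant sounds occur in a row, while `cc` slips through because nothing in the pattern requires a `v`.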

    Now a simplistic attempt to identify sounds from spelling:

    vowels = "a e i o u y".split()
    consonants = "b bl br c ch cr chr cl ck d dr f fl g gl gr h j k l ll m n p ph pl pr q r s sc sch sh sl sp st t th thr tr v w wr x y z".split()
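Because some spellings overlap (`c`, `ch`, `chr`, and `y` appears in both lists), one word can be read as several different c/v skeletons, and an anchored regex built from these alternations accepts the word if any of those readings fits. The hypothetical helpers below (not part of the original answer) enumerate the readings by brute force, which is handy for seeing why a given word was accepted or rejected.

```python
from typing import List, Set, Tuple

VOWELS = "a e i o u y".split()
CONSONANTS = ("b bl br c ch cr chr cl ck d dr f fl g gl gr h j k l ll m n p "
              "ph pl pr q r s sc sch sh sl sp st t th thr tr v w wr x y z").split()

# Every known spelling unit, tagged 'v' or 'c' ("y" gets both tags)
UNITS = [(u, "v") for u in VOWELS] + [(u, "c") for u in CONSONANTS]


def segmentations(word: str) -> List[List[Tuple[str, str]]]:
    """All ways to split `word` into known units, each tagged with its class."""
    if not word:
        return [[]]
    results = []
    for unit, kind in UNITS:
        if word.startswith(unit):
            for rest in segmentations(word[len(unit):]):
                results.append([(unit, kind)] + rest)
    return results


def skeletons(word: str) -> Set[str]:
    """Distinct c/v skeletons that `word` can be read as."""
    return {"".join(kind for _, kind in seg) for seg in segmentations(word)}


print(skeletons("string"))
```

For "string" the letter-by-letter reading is `cccvcc`, but reading `st` or `tr` as a single sound gives `ccvcc`, which satisfies the two-consonant rule.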
    

    Then it's possible to express the rules as regular expressions:

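    import re

    # Wrap each inventory in an alternation group, e.g. "(a|e|i|o|u|y)"
    v = "({0})".format("|".join(vowels))
    c = "({0})".format("|".join(consonants))

    # c?c?(v+cc?)*v* over sounds, anchored so the whole word must match
    pattern = re.compile("^{1}?{1}?({0}+{1}{1}?)*{0}*$".format(v, c))

    def test(w):
        """True if some split of w into known sounds fits the pattern."""
        return pattern.match(w) is not None

    def predict(words):
        return ["word" if test(w) else "scrambled" for w in words]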
    

    This scores about 74% (average F1) on the word/scrambled test set.

                 precision    recall  f1-score   support
    
      scrambled       0.90      0.57      0.70     52403
           word       0.69      0.93      0.79     52940
    
    avg / total       0.79      0.75      0.74    105343
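The table appears to follow the layout of scikit-learn's `classification_report` (older versions label the support-weighted average row `avg / total`). The dependency-free sketch below shows how those columns are defined; `class_report` is a hypothetical name, and this is not the evaluation code that produced the numbers above.

```python
from typing import Dict, List


def class_report(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall, F1 and support, plus a support-weighted average."""
    labels = sorted(set(y_true) | set(y_pred))
    report: Dict[str, Dict[str, float]] = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)  # tp + false positives
        actual = sum(1 for t in y_true if t == label)     # tp + false negatives
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual}
    total = len(y_true)
    # "avg / total": each metric averaged with weights equal to class support
    avg = {key: sum(report[lab][key] * report[lab]["support"] for lab in labels) / total
           for key in ("precision", "recall", "f1")}
    avg["support"] = total
    report["avg / total"] = avg
    return report
```

Because the average is weighted by support, a class with many examples pulls the bottom row toward its own scores.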
    

    A tweaked version scored 80%.
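Coming back to the original question, the same test can act as a crude filter over every ordering of the letters. This sketch assumes that "passes the heuristic" is a good enough stand-in for "pronounceable"; the heuristic does not rank the survivors, and `passes` and `pronounceable_arrangements` are hypothetical names. The number of permutations grows factorially, so this is only practical for short strings.

```python
import re
from itertools import permutations
from typing import List

# Sound inventories copied from the answer above
VOWELS = "a e i o u y".split()
CONSONANTS = ("b bl br c ch cr chr cl ck d dr f fl g gl gr h j k l ll m n p "
              "ph pl pr q r s sc sch sh sl sp st t th thr tr v w wr x y z").split()

V = "({0})".format("|".join(VOWELS))
C = "({0})".format("|".join(CONSONANTS))
# c?c?(v+cc?)*v* over sounds, anchored at both ends
PATTERN = re.compile("^{1}?{1}?({0}+{1}{1}?)*{0}*$".format(V, C))


def passes(word: str) -> bool:
    """True if the word fits the two-rule heuristic."""
    return PATTERN.match(word) is not None


def pronounceable_arrangements(letters: str) -> List[str]:
    """Every distinct ordering of `letters` that passes the heuristic, sorted."""
    candidates = {"".join(p) for p in permutations(letters.lower())}
    return sorted(w for w in candidates if passes(w))


print(pronounceable_arrangements("rats"))
```

Orderings such as "tsra" are filtered out because "ts" and "sr" are not in the consonant list, so they count as separate sounds and leave three consonants in a row.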
