Arranging letters in the most pronounceable way?


误落风尘 2021-02-05 13:22

I have a string with some characters, and I'm looking for the arrangement of those characters that is as pronounceable as possible.

For example, if I have

2 Answers

轻奢々 (original poster)

2021-02-05 14:03

(For completeness, here's my original pure Python solution that inspired me to try machine learning.)

I agree a reliable solution would require a sophisticated model of the English language, but maybe we can come up with a simple heuristic that's tolerably bad.

I can think of two basic rules satisfied by most pronounceable words:

1. contain a vowel sound
2. no more than two consonant sounds in succession

As a regular expression, with c standing for a consonant sound and v for a vowel sound, this can be written c?c?(v+cc?)*v*
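To see what this expression accepts before any real spellings are involved, here is a minimal sketch that applies it to strings made only of the letters "c" and "v". The names SHAPE and shape_ok are made up for this illustration and are not part of the original solution.

```python
import re

# The abstract rule, applied to "shapes" written with the letters c and v only.
SHAPE = re.compile(r"c?c?(v+cc?)*v*")

def shape_ok(shape):
    """Return True if the whole shape string fits the abstract pattern."""
    return SHAPE.fullmatch(shape) is not None

print(shape_ok("cvc"))     # consonant, vowel, consonant
print(shape_ok("ccvvcc"))  # at most two consonants in a row
print(shape_ok("cccv"))    # three leading consonants
print(shape_ok("vcccv"))   # three consonants in the middle
```

One thing this makes visible: as written, the expression also accepts shapes with no vowel at all, such as "cc" or the empty string, so it enforces the second rule more strictly than the first.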

Now a simplistic attempt to identify sounds from spelling:

vowels = "a e i o u y".split()
consonants = "b bl br c ch cr chr cl ck d dr f fl g gl gr h j k l ll m n p ph pl pr q r s sc sch sh sl sp st t th thr tr v w wr x y z".split()

Then it's possible to express the rules with regular expressions:

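import re

# Alternations over the vowel and consonant spellings listed above
v = "({0})".format("|".join(vowels))
c = "({0})".format("|".join(consonants))

# c?c?(v+cc?)*v* with the spellings substituted in, anchored to the whole word
pattern = re.compile("^{1}?{1}?({0}+{1}{1}?)*{0}*$".format(v, c))

def test(w):
    # Returns a match object for words that fit the pattern, otherwise None
    return pattern.search(w)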

def predict(words):
    return ["word" if test(w) else "scrambled" for w in words]

This scores about 74% (average F1) on the word/scrambled test set.

             precision    recall  f1-score   support

  scrambled       0.90      0.57      0.70     52403
       word       0.69      0.93      0.79     52940

avg / total       0.79      0.75      0.74    105343

A tweaked version scored 80%.
