Convert numbered pinyin to pinyin with tone marks

春和景丽 2021-02-04 02:40

Are there any scripts, libraries, or programs using Python, or BASH tools (e.g. awk, perl, sed) which can correc

7 answers
  •  暖寄归人
    2021-02-04 02:43

    I wrote another Python function that does this. It is case-insensitive and preserves spaces, punctuation, and other text (barring false positives, of course):

    # -*- coding: utf-8 -*-
    import re
    
    pinyinToneMarks = {
        u'a': u'āáǎà', u'e': u'ēéěè', u'i': u'īíǐì',
        u'o': u'ōóǒò', u'u': u'ūúǔù', u'ü': u'ǖǘǚǜ',
        u'A': u'ĀÁǍÀ', u'E': u'ĒÉĚÈ', u'I': u'ĪÍǏÌ',
        u'O': u'ŌÓǑÒ', u'U': u'ŪÚǓÙ', u'Ü': u'ǕǗǙǛ'
    }
    
    def convertPinyinCallback(m):
        tone=int(m.group(3))%5
        r=m.group(1).replace(u'v', u'ü').replace(u'V', u'Ü')
        # for multiple vowels, use first one if it is a/e/o, otherwise use second one
        pos=0
        if len(r)>1 and r[0] not in 'aeoAEO':
            pos=1
        if tone != 0:
            r=r[0:pos]+pinyinToneMarks[r[pos]][tone-1]+r[pos+1:]
        return r+m.group(2)
    
    def convertPinyin(s):
        # the pattern has no backslashes, so a plain u'' literal works on both Python 2 and 3
        return re.sub(u'([aeiouüvÜ]{1,3})(n?g?r?)([012345])', convertPinyinCallback, s, flags=re.IGNORECASE)
    
    print(convertPinyin(u'Ni3 hao3 ma0?'))
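
    To make the tone-placement rule in the comment above easier to check by hand, here is a minimal, lowercase-only Python 3 sketch of the same idea. The names `TONE_MARKS`, `mark_vowels`, and `to_tone_marks` are hypothetical and chosen only for this illustration; the rule itself (mark the first vowel if it is a/e/o, otherwise the second) is the one the function above uses, not a complete treatment of pinyin orthography.

```python
import re

# Tone-marked forms of each vowel, indexed by tone number minus one (tones 1-4).
TONE_MARKS = {
    'a': 'āáǎà', 'e': 'ēéěè', 'i': 'īíǐì',
    'o': 'ōóǒò', 'u': 'ūúǔù', 'ü': 'ǖǘǚǜ',
}

def mark_vowels(vowels, tone):
    """Apply tone 1-4 to a lowercase vowel cluster; tone 0 or 5 leaves it unmarked."""
    if tone % 5 == 0:
        return vowels
    # Placement rule: mark the first vowel if it is a, e, or o;
    # otherwise mark the second vowel (when the cluster has one).
    pos = 1 if len(vowels) > 1 and vowels[0] not in 'aeo' else 0
    return vowels[:pos] + TONE_MARKS[vowels[pos]][tone - 1] + vowels[pos + 1:]

def to_tone_marks(text):
    """Lowercase-only sketch: rewrite numbered syllables such as 'hao3'."""
    def repl(m):
        vowels = m.group(1).replace('v', 'ü')  # 'v' is a common stand-in for 'ü'
        return mark_vowels(vowels, int(m.group(3))) + m.group(2)
    # vowel cluster, optional n/ng/r ending, then the tone digit
    return re.sub(r'([aeiouüv]{1,3})(n?g?r?)([0-5])', repl, text)

# Print a few syllables that exercise each branch of the rule.
for numbered in ['hao3', 'gui4', 'liu2', 'xiong2', 'lv4', 'nve4', 'zhuang1', 'ma5']:
    print(numbered, '->', to_tone_marks(numbered))
```

    With this rule, `hao3` is marked on the `a` and `gui4` on the `i`, while `ma5` comes back as plain `ma`. Like the function above, the pattern matches any vowel cluster followed by a digit, so the same false positives apply.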
    
