How to convert Chinese characters to Pinyin

渐次进展 2020-12-23 15:10

For sorting Chinese text, I want to convert Chinese characters to pinyin, correctly separating each character and grouping successive characters together.

8 answers
  •  礼貌的吻别
    2020-12-23 15:44

    While @JUST MY correct OPINION's answer describes some of the difficulties of converting characters into pinyin, the problem is not impossible to solve.

    I have written a library (pinyinify) that performs this conversion with decent accuracy. Even though characters do not map one-to-one to pinyin, my library can usually pick the correct pronunciation. For example, "我受不了了" correctly converts to "wǒ shòubùliǎo le", using two different pronunciations of 了.

    My approach to solving the problem is pretty simple:

    • First segment the text into words. For example, 我喜欢旅游 would be divided into three words: 我 喜欢 旅游. Segmentation is not simple either, but many libraries handle it; jieba is one of the more popular ones.
    • Use a dictionary to convert the words into pinyin.
    • If the word is not in the dictionary, fall back to converting the individual characters to pinyin using their most common pronunciation.
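
    As a rough illustration of the three steps above, here is a minimal Python sketch. It is not pinyinify's actual code: WORD_DICT and CHAR_DICT are tiny hypothetical tables, and a naive forward-maximum-matching segmenter stands in for a real segmenter such as jieba. The sort_key helper is also an assumption, added because the question's goal is sorting; it strips tone marks so that accented letters compare like plain ones.

```python
import unicodedata

# Hypothetical toy tables for illustration only; a real converter would load a
# full word dictionary and a per-character table of most common readings.
WORD_DICT = {
    "受不了": "shòubùliǎo",
    "喜欢": "xǐhuan",
    "旅游": "lǚyóu",
}
CHAR_DICT = {
    "我": "wǒ",
    "受": "shòu",
    "不": "bù",
    "了": "le",  # most common reading of 了
    "喜": "xǐ",
    "欢": "huān",
    "旅": "lǚ",
    "游": "yóu",
}
MAX_WORD_LEN = max(len(w) for w in WORD_DICT)


def segment(text):
    """Step 1: split text into words by forward maximum matching.

    A naive stand-in for a real segmenter such as jieba: at each position,
    take the longest WORD_DICT entry, else a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in WORD_DICT:
                words.append(candidate)
                i += length
                break
    return words


def word_to_pinyin(word):
    """Steps 2 and 3: whole-word lookup, else per-character fallback."""
    if word in WORD_DICT:
        return WORD_DICT[word]
    # Fallback: most common reading per character; unknown characters pass through.
    return "".join(CHAR_DICT.get(ch, ch) for ch in word)


def to_pinyin(text):
    """Convert text to pinyin with one space-separated token per word."""
    return " ".join(word_to_pinyin(w) for w in segment(text))


def sort_key(text):
    """Toneless pinyin for sorting: decompose accented letters, drop the marks.

    Note this also turns ü into u, so lü and lu compare equal.
    """
    decomposed = unicodedata.normalize("NFD", to_pinyin(text))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_pinyin("我受不了了"))  # wǒ shòubùliǎo le
print(sorted(["我喜欢旅游", "我受不了了"], key=sort_key))  # ['我受不了了', '我喜欢旅游']
```

    Because 受不了 is found as a whole word, its 了 is read liǎo, while the trailing 了 falls back to le, mirroring the example above. Greedy longest matching can mis-segment ambiguous text, which is one reason dedicated segmenters like jieba are worth using.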
