Convert Chinese characters to Hanyu Pinyin

死守一世寂寞 2021-02-10 02:30

How can I convert Chinese characters to Hanyu Pinyin?

E.g.

你 --> Nǐ

马 --> Mǎ


3 Answers
  • 2021-02-10 03:14

    In Python, try cjklib:

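    from cjklib.characterlookup import CharacterLookup

    # 'C' selects the Simplified Chinese character locale
    cjk = CharacterLookup('C')
    # Returns every known Pinyin reading for the character
    cjk.getReadingForCharacter(u'北', 'Pinyin')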
    

    You would get

    ['běi', 'bèi']
    

    Disclaimer: I'm the author of that library.

  • 2021-02-10 03:23

    For Java, I'd try the pinyin4j library.

  • 2021-02-10 03:27

    Converting hanzi to pinyin is a fairly difficult problem. Many hanzi have multiple pinyin readings, depending on context. Compare 长大 (pinyin: zhǎng dà) with 长城 (pinyin: cháng chéng). For this reason, single-character conversion is often of little use unless your system outputs multiple candidates. Word segmentation is a further issue, since where the word boundaries fall can also affect the pinyin. You may already know all this, but I thought it was important to say.
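
    To make the context problem concrete, here is a minimal sketch of word-level lookup with greedy longest-match segmentation. It is not how any of the libraries mentioned here work internally; the `WORD_READINGS` table and the `to_pinyin` function are hypothetical names, and the lexicon is a toy one covering only the two example words.

```python
# Toy lexicon (an assumption for illustration): multi-character words carry
# their own reading, so 长 resolves differently in 长大 and in 长城.
WORD_READINGS = {
    "长大": "zhǎng dà",
    "长城": "cháng chéng",
    "长": "cháng",
    "大": "dà",
    "城": "chéng",
}


def to_pinyin(text, lexicon=WORD_READINGS):
    """Segment text by greedy longest match, then look up each segment."""
    max_len = max(len(word) for word in lexicon)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible segment first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in lexicon:
                out.append(lexicon[chunk])
                i += size
                break
        else:
            # Unknown character: pass it through unchanged.
            out.append(text[i])
            i += 1
    return " ".join(out)


print(to_pinyin("长大"))
print(to_pinyin("长城"))
```

    Greedy longest match is only one segmentation strategy; it is used here because it is short, not because it is the most accurate.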

    That said, the Adso Package contains both a segmenter and a probabilistic pinyin annotator, based on the excellent Adso library. It takes a while to get used to, though, and may be much larger than what you are looking for (I have found in the past that it was too bulky for my needs). Additionally, there doesn't appear to be a public API anywhere, and its C++ ...

    For a recent project, because I was working with place names, I simply used the Google Translate API (specifically, the unofficial Java port), which, for common nouns at least, usually does a good job of translating to pinyin. The problem is commonly used alternative transliterations, such as "HongKong" for what should be "XiangGang". Given all of this, Google Translate is pretty limited, but it offers a start. I hadn't heard of pinyin4j before, but after playing with it just now, I found it less than optimal: while it outputs a list of candidate pinyin romanizations, it makes no attempt to statistically determine their likelihood. There is a method that returns a single representation, but it will soon be phased out, as it currently returns only the first romanization, not the most likely one. Where the library seems to do well is conversion between romanization systems and general configurability.
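
    As a rough illustration of what ranking candidates by likelihood could look like, the sketch below counts readings in a tiny hand-annotated word list and orders a candidate list by those counts. Everything here is an assumption for illustration: `ANNOTATED_WORDS`, `reading_counts` and `rank_candidates` are hypothetical names, the three-word sample is far too small to be meaningful, and this is not the method used by Adso or pinyin4j.

```python
from collections import Counter, defaultdict

# Tiny hand-annotated sample (illustrative only, not a real corpus).
ANNOTATED_WORDS = [
    ("长城", ["cháng", "chéng"]),
    ("长江", ["cháng", "jiāng"]),
    ("长大", ["zhǎng", "dà"]),
]


def reading_counts(annotated):
    """Count how often each character appears with each reading."""
    counts = defaultdict(Counter)
    for word, readings in annotated:
        for char, reading in zip(word, readings):
            counts[char][reading] += 1
    return counts


def rank_candidates(char, candidates, counts):
    """Order candidates by observed frequency, most frequent first.

    sorted() is stable, so unseen or tied readings keep their input order.
    """
    seen = counts.get(char, Counter())
    return sorted(candidates, key=lambda reading: -seen[reading])


counts = reading_counts(ANNOTATED_WORDS)
print(rank_candidates("长", ["zhǎng", "cháng"], counts))
```

    A context-free ranking like this still gets 长大 wrong whenever cháng is the more frequent reading overall, which is why word-level context matters.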

    In short, then, the answer may be any one of these, depending on what you need. Idiosyncratic proper nouns? Google Translate. In need of statistics? Adso. Willing to accept candidate lists without context information? pinyin4j.
