What is the replacement for the Language Analysis framework's deprecated morpheme analysis APIs?

a 夏天 posted on 2020-01-07 04:20:32

Question


The Language Analysis framework is deprecated, and it is not even available in 64-bit. The documentation says to use CFStringTokenizer instead, but the tokenizer doesn't provide all the functionality that the Language Analysis framework offered.

What is the replacement for the morpheme analysis APIs that the Language Analysis framework provided?

EDIT: Pantong's reply helped, but it doesn't work in all cases; e.g. for words with 3-4 kanji characters it returns an incorrect result. (By incorrect I mean it is not the same as what the Language Analysis framework API returned for the same string.)

a) 現人神 is converted to 'gen ren shen' in Latin and 'げんじんしん' in hiragana, whereas it should be 'Arahitogami' in Latin and 'あらひとがみ' in hiragana.

b) 安本丹 is converted to 'an ben dan' in Latin and 'やすもとまこと' in hiragana, whereas it should be 'Yasumoto makoto' in Latin and 'あんぽんたん' in hiragana.


Answer 1:


One feature the deprecated morpheme analysis APIs have is getting ruby text (readings) for Japanese/Chinese text. If you are asking about a replacement for that particular feature, the following code is an example. However, I don't know of a replacement for the other features of the morpheme analysis APIs.

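#import <Foundation/Foundation.h>

CFStringRef testString = CFSTR("のちに検知されたトークンの範囲用として使用");

// "ja" is the ISO 639-1 language code for Japanese
CFLocaleRef locale = CFLocaleCreate(kCFAllocatorDefault, CFSTR("ja"));
CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault,
                                                         testString,
                                                         CFRangeMake(0, CFStringGetLength(testString)),
                                                         kCFStringTokenizerUnitWordBoundary,
                                                         locale);

// Walk the string one word token at a time until no tokens remain
while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
    CFStringRef originalToken = CFStringCreateWithSubstring(kCFAllocatorDefault,
                                                            testString,
                                                            CFStringTokenizerGetCurrentTokenRange(tokenizer));

    // Get Latin transcription from the Japanese text (NULL if none is available)
    CFStringRef latinToken = (CFStringRef)CFStringTokenizerCopyCurrentTokenAttribute(tokenizer,
                                                                            kCFStringTokenizerAttributeLatinTranscription);
    if (latinToken != NULL) {
        NSLog(@"token: %@ -> latin: %@", originalToken, latinToken);

        // Get kana from Latin transcription; transform a mutable copy,
        // because the returned attribute is not guaranteed to be mutable
        CFMutableStringRef kanaToken = CFStringCreateMutableCopy(kCFAllocatorDefault, 0, latinToken);
        CFStringTransform(kanaToken, NULL, kCFStringTransformLatinHiragana, false);
        NSLog(@"token: %@ -> hiragana: %@", originalToken, kanaToken);

        CFRelease(kanaToken);
        CFRelease(latinToken);
    }
    CFRelease(originalToken);
}

// Release the objects created above (Create/Copy rule)
CFRelease(tokenizer);
CFRelease(locale);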


Source: https://stackoverflow.com/questions/15339643/what-is-the-replacement-for-language-analysis-frameworks-morpheme-analysis-depr
