Python: find a series of Chinese characters within a string and apply a function

前端 未结 4 1965
遥遥无期
遥遥无期 2021-01-14 06:55

I\'ve got a series of text that is mostly English, but contains some phrases with Chinese characters. Here\'s two examples:

s1 = \"You say: 你好. I say: 再見\"
s         


        
4条回答
  •  傲寒
    傲寒 (楼主)
    2021-01-14 07:36

    A possible solution is to capture everything, but in different capture groups, so you can differentiate later if they're in Chinese or not.

    ret = re.findall(ur'([\u4e00-\u9fff]+)|([^\u4e00-\u9fff]+)', utf_line)
    result = []
    for match in ret:
        if match[0]:
            result.append(translate(match[0]))
        else:
            result.append(match[1])
    
    print(''.join(result))
    

提交回复
热议问题